Overview

Dataset statistics

Number of variables: 8
Number of observations: 980
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 61.4 KiB
Average record size in memory: 64.1 B

Variable types

Numeric: 6
Categorical: 2

Warnings

samplingrate has constant value "22050" (Constant)
speacker is highly correlated with samplingrate (High correlation)
samplingrate is highly correlated with speacker (High correlation)
speacker is uniformly distributed (Uniform)
silencePercent has 47 (4.8%) zeros (Zeros)

Reproduction

Analysis started: 2021-05-11 14:21:58.463371
Analysis finished: 2021-05-11 14:22:03.063370
Duration: 4.6 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

Distinct: 105
Distinct (%): 10.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.29183673
Minimum: 1
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
Median: 26
Q3: 47
95-th percentile: 83
Maximum: 105
Range: 104
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 25.43574361
Coefficient of variation (CV): 0.7876833955
Kurtosis: -0.07529870321
Mean: 32.29183673
Median Absolute Deviation (MAD): 16
Skewness: 0.8835091218
Sum: 31646
Variance: 646.9770528
Monotonicity: Not monotonic
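
Several of the figures above are derived from one another, so they can be cross-checked with plain arithmetic. The sketch below is only an illustration: the inputs are copied from the `id` statistics in this report, and the variable names are chosen for this example rather than taken from pandas-profiling.

```python
# Inputs copied from the report's statistics for the `id` column.
n = 980                    # Number of observations
total = 31646              # Sum
std = 25.43574361          # Standard deviation
q1, q3 = 12, 47            # Q1 and Q3
minimum, maximum = 1, 105  # Minimum and Maximum

mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"mean={mean:.8f} cv={cv:.10f} variance={variance:.7f}")
print(f"range={value_range} iqr={iqr}")
```

The recomputed values agree with the reported Mean, CV, Variance, Range, and IQR up to the rounding of the inputs.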
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
1                  22     2.2%
2                  22     2.2%
3                  22     2.2%
4                  22     2.2%
5                  22     2.2%
6                  22     2.2%
7                  22     2.2%
8                  22     2.2%
9                  22     2.2%
10                 22     2.2%
Other values (95)  760    77.6%

Value  Count  Frequency (%)
1      22     2.2%
2      22     2.2%
3      22     2.2%
4      22     2.2%
5      22     2.2%

Value  Count  Frequency (%)
105    2      0.2%
104    2      0.2%
103    2      0.2%
102    2      0.2%
101    2      0.2%

duration
Real number (ℝ≥0)

Distinct: 172
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.651428571
Minimum: 0.9
Maximum: 30
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 2.9
Q1: 4.9
Median: 6.9
Q3: 9.4
95-th percentile: 15.2
Maximum: 30
Range: 29.1
Interquartile range (IQR): 4.5

Descriptive statistics

Standard deviation: 4.016862821
Coefficient of variation (CV): 0.5249820715
Kurtosis: 4.597320147
Mean: 7.651428571
Median Absolute Deviation (MAD): 2.2
Skewness: 1.633594192
Sum: 7498.4
Variance: 16.13518693
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
5.4                 19     1.9%
5.7                 17     1.7%
6.7                 17     1.7%
3.8                 16     1.6%
6.5                 15     1.5%
8.8                 15     1.5%
5                   15     1.5%
4.9                 15     1.5%
6.4                 14     1.4%
7.2                 14     1.4%
Other values (162)  823    84.0%

Value  Count  Frequency (%)
0.9    1      0.1%
1      1      0.1%
1.2    1      0.1%
1.3    3      0.3%
1.4    1      0.1%

Value  Count  Frequency (%)
30     1      0.1%
29.9   1      0.1%
29.8   1      0.1%
26.5   1      0.1%
26.2   1      0.1%

loudness
Real number (ℝ)

Distinct: 59
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -21.28469388
Minimum: -28.2
Maximum: -17.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: -28.2
5-th percentile: -22.7
Q1: -21.8
Median: -21.2
Q3: -20.7
95-th percentile: -19.9
Maximum: -17.8
Range: 10.4
Interquartile range (IQR): 1.1

Descriptive statistics

Standard deviation: 0.9062096573
Coefficient of variation (CV): -0.04257564908
Kurtosis: 5.111762064
Mean: -21.28469388
Median Absolute Deviation (MAD): 0.5
Skewness: -0.8106417881
Sum: -20859
Variance: 0.821215943
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
-21                53     5.4%
-21.3              50     5.1%
-21.8              49     5.0%
-20.9              47     4.8%
-21.6              45     4.6%
-20.6              45     4.6%
-21.1              45     4.6%
-21.2              44     4.5%
-21.7              43     4.4%
-20.5              42     4.3%
Other values (49)  517    52.8%

Value  Count  Frequency (%)
-28.2  1      0.1%
-26.3  1      0.1%
-26    1      0.1%
-24.4  1      0.1%
-24.1  2      0.2%

Value  Count  Frequency (%)
-17.8  1      0.1%
-18.5  1      0.1%
-18.6  1      0.1%
-18.7  1      0.1%
-18.8  1      0.1%

minSilenceDB
Real number (ℝ)

Distinct: 54
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -52.39285714
Minimum: -69
Maximum: -10
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: -69
5-th percentile: -64
Q1: -60
Median: -57
Q3: -44
95-th percentile: -24
Maximum: -10
Range: 59
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.96691686
Coefficient of variation (CV): -0.2284074111
Kurtosis: 1.112553649
Mean: -52.39285714
Median Absolute Deviation (MAD): 4
Skewness: 1.372621475
Sum: -51345
Variance: 143.2070991
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
-59                83     8.5%
-58                82     8.4%
-57                74     7.6%
-56                66     6.7%
-60                60     6.1%
-61                59     6.0%
-55                56     5.7%
-62                56     5.7%
-63                41     4.2%
-54                36     3.7%
Other values (44)  367    37.4%

Value  Count  Frequency (%)
-69    1      0.1%
-68    1      0.1%
-67    8      0.8%
-66    10     1.0%
-65    21     2.1%

Value  Count  Frequency (%)
-10    2      0.2%
-13    2      0.2%
-14    2      0.2%
-17    5      0.5%
-18    3      0.3%

samplingrate
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 KiB

Value  Count
22050  980

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 4900
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
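
As a rough illustration of how such character statistics can be derived, the sketch below counts characters and Unicode general categories with Python's standard `unicodedata` module. It assumes, as the report states, that all 980 rows of `samplingrate` hold the string "22050"; the variable names are made up for this example. The standard library does not expose script or block names, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# Assumption taken from the report: every one of the 980 rows is "22050".
values = ["22050"] * 980

# Count each character across all rows.
char_counts = Counter(ch for value in values for ch in value)

# Count Unicode general categories; "Nd" is the code for "Decimal Number".
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total_chars:.1f}%")
print(dict(category_counts))
```

The counts match the character tables in this section: 1960 occurrences each of "2" and "0", 980 of "5", and 4900 characters in a single category.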

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 22050
2nd row: 22050
3rd row: 22050
4th row: 22050
5th row: 22050

Value  Count  Frequency (%)
22050  980    100.0%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
2      1960   40.0%
0      1960   40.0%
5      980    20.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4900   100.0%

Most frequent character per category

Value  Count  Frequency (%)
2      1960   40.0%
0      1960   40.0%
5      980    20.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4900   100.0%

Most frequent character per script

Value  Count  Frequency (%)
2      1960   40.0%
0      1960   40.0%
5      980    20.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4900   100.0%

Most frequent character per block

Value  Count  Frequency (%)
2      1960   40.0%
0      1960   40.0%
5      980    20.0%

silencePercent
Real number (ℝ≥0)

ZEROS

Distinct: 63
Distinct (%): 6.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.07857143
Minimum: 0
Maximum: 63
Zeros: 47
Zeros (%): 4.8%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 7
Median: 17
Q3: 26
95-th percentile: 49
Maximum: 63
Range: 63
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 14.38053353
Coefficient of variation (CV): 0.7537531615
Kurtosis: 0.1894858692
Mean: 19.07857143
Median Absolute Deviation (MAD): 9
Skewness: 0.7961537505
Sum: 18697
Variance: 206.7997446
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0                  47     4.8%
2                  45     4.6%
3                  43     4.4%
16                 43     4.4%
14                 40     4.1%
1                  36     3.7%
19                 36     3.7%
4                  35     3.6%
15                 33     3.4%
18                 31     3.2%
Other values (53)  591    60.3%

Value  Count  Frequency (%)
0      47     4.8%
1      36     3.7%
2      45     4.6%
3      43     4.4%
4      35     3.6%

Value  Count  Frequency (%)
63     1      0.1%
62     3      0.3%
61     1      0.1%
59     1      0.1%
58     5      0.5%

averageFrequency
Real number (ℝ≥0)

Distinct: 604
Distinct (%): 61.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2270.718367
Minimum: 1073
Maximum: 3167
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.8 KiB

Quantile statistics

Minimum: 1073
5-th percentile: 1837.9
Q1: 2112
Median: 2265
Q3: 2428.25
95-th percentile: 2690.05
Maximum: 3167
Range: 2094
Interquartile range (IQR): 316.25

Descriptive statistics

Standard deviation: 253.5593772
Coefficient of variation (CV): 0.1116648286
Kurtosis: 0.7945148303
Mean: 2270.718367
Median Absolute Deviation (MAD): 158
Skewness: -0.03430560005
Sum: 2225304
Variance: 64292.35778
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2128                8      0.8%
2221                7      0.7%
2214                5      0.5%
2223                5      0.5%
2090                5      0.5%
2100                5      0.5%
2138                5      0.5%
2047                5      0.5%
2369                5      0.5%
2134                5      0.5%
Other values (594)  925    94.4%

Value  Count  Frequency (%)
1073   1      0.1%
1378   1      0.1%
1576   1      0.1%
1579   1      0.1%
1597   1      0.1%

Value  Count  Frequency (%)
3167   1      0.1%
3113   1      0.1%
3097   1      0.1%
2998   1      0.1%
2973   1      0.1%

speacker
Categorical

HIGH CORRELATION
UNIFORM

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 KiB

Value  Count
clean  490
full   490

Length

Max length: 5
Median length: 4.5
Mean length: 4.5
Min length: 4

Characters and Unicode

Total characters: 4410
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: clean
2nd row: clean
3rd row: clean
4th row: clean
5th row: clean

Value  Count  Frequency (%)
clean  490    50.0%
full   490    50.0%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
l      1470   33.3%
c      490    11.1%
e      490    11.1%
a      490    11.1%
n      490    11.1%
f      490    11.1%
u      490    11.1%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  4410   100.0%

Most frequent character per category

Value  Count  Frequency (%)
l      1470   33.3%
c      490    11.1%
e      490    11.1%
a      490    11.1%
n      490    11.1%
f      490    11.1%
u      490    11.1%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4410   100.0%

Most frequent character per script

Value  Count  Frequency (%)
l      1470   33.3%
c      490    11.1%
e      490    11.1%
a      490    11.1%
n      490    11.1%
f      490    11.1%
u      490    11.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4410   100.0%

Most frequent character per block

Value  Count  Frequency (%)
l      1470   33.3%
c      490    11.1%
e      490    11.1%
a      490    11.1%
n      490    11.1%
f      490    11.1%
u      490    11.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the angle of the line to the x-axis does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
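
The calculation just described can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code pandas-profiling runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so sums of (cross-)deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)

# Toy data, not taken from the report.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
# Shifting and positively rescaling y leaves r unchanged.
r_rescaled = pearson_r(x, [3 * v + 5 for v in y])
print(r, r_rescaled)
```

A constant column such as `samplingrate` has zero standard deviation, which is why the sketch refuses to compute r in that case.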

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
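
A minimal sketch of this rank-based calculation follows. It assumes ties receive the average of the ranks they span, which is a common convention but is not stated in the report; the helper names `average_ranks` and `spearman_rho` are invented for this example.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (dev_x * dev_y)

# A nonlinear but strictly increasing relationship (toy data).
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the ranks of `x` and `y` coincide here, ρ reaches its maximum even though the relationship is not linear.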

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
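
The pair-counting definition can be sketched directly. The version below is the simple "tau-a" variant, which matches the wording above and treats tied pairs as neither concordant nor discordant; library implementations often use a tie-adjusted variant instead, so results may differ on data with ties. The function name `kendall_tau_a` is an assumption for this example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The quadratic loop over all pairs is fine for a sketch; for 980 observations it visits 479,710 pairs.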

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
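
For readers who want to see the correction spelled out, here is a minimal sketch of the bias-corrected estimator as it is commonly written following Bergsma (2013). It is not the report's own code: the function name `cramers_v_corrected` and the choice to return NaN for a constant variable are assumptions made for this example.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal sequences of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when a variable has a single level
    return math.sqrt(phi2_corr / denom)

# Toy data: perfectly associated, independent, and constant columns.
perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y"] * 50)
constant = cramers_v_corrected(["22050"] * 100, ["clean"] * 50 + ["full"] * 50)
print(perfect, independent, constant)
```

The last call mirrors the situation in this dataset, where `samplingrate` takes a single value: the denominator collapses to zero, so the sketch reports the association as undefined rather than as a number.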

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

     id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
0    1   2.1       -20.8     -61           22050         42              3167              clean
1    2   6.2       -20.9     -61           22050         46              2351              clean
2    3   4.1       -21.4     -59           22050         23              2010              clean
3    4   5.0       -18.9     -65           22050         25              2134              clean
4    5   3.1       -22.8     -61           22050         44              2221              clean
5    6   5.1       -20.8     -62           22050         41              2261              clean
6    7   6.5       -23.2     -53           22050         21              2132              clean
7    8   6.1       -22.5     -58           22050         17              2690              clean
8    9   2.4       -21.7     -59           22050         40              2260              clean
9    10  12.6      -21.1     -58           22050         15              2110              clean

Last rows

     id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
970  6   8.9       -21.6     -62           22050         22              2044              full
971  7   2.7       -20.4     -37           22050         7               2213              full
972  8   11.8      -21.0     -58           22050         57              2294              full
973  9   14.2      -20.3     -58           22050         16              2007              full
974  10  8.9       -20.7     -64           22050         19              2182              full
975  11  9.0       -22.0     -62           22050         11              2241              full
976  12  8.0       -24.1     -59           22050         25              2428              full
977  13  9.3       -21.2     -56           22050         33              2308              full
978  14  4.9       -21.6     -44           22050         6               2173              full
979  15  7.7       -20.6     -58           22050         13              2219              full