Overview

Dataset statistics

Number of variables: 8
Number of observations: 490
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.8 KiB
Average record size in memory: 64.3 B

Variable types

Numeric: 6
Categorical: 2

Warnings

samplingrate has constant value "22050" (Constant)
speacker has constant value "full" (Constant)
speacker is highly correlated with samplingrate (High correlation)
samplingrate is highly correlated with speacker (High correlation)
silencePercent has 47 (9.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-05-11 14:22:24.825059
Analysis finished: 2021-05-11 14:22:30.677655
Duration: 5.85 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

Distinct: 105
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.29183673
Minimum: 1
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
median: 26
Q3: 47
95-th percentile: 83
Maximum: 105
Range: 104
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 25.44874424
Coefficient of variation (CV): 0.7880859937
Kurtosis: -0.06951867365
Mean: 32.29183673
Median Absolute Deviation (MAD): 16
Skewness: 0.8848673208
Sum: 15823
Variance: 647.6385835
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                      11  2.2%
2                      11  2.2%
3                      11  2.2%
4                      11  2.2%
5                      11  2.2%
6                      11  2.2%
7                      11  2.2%
8                      11  2.2%
9                      11  2.2%
10                     11  2.2%
Other values (95)     380  77.6%

Value               Count  Frequency (%)
1                      11  2.2%
2                      11  2.2%
3                      11  2.2%
4                      11  2.2%
5                      11  2.2%

Value               Count  Frequency (%)
105                     1  0.2%
104                     1  0.2%
103                     1  0.2%
102                     1  0.2%
101                     1  0.2%
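
Several of the statistics listed for id are derived from others by simple arithmetic. The following is a minimal consistency check, not part of the report itself: it recomputes the range, IQR, mean, coefficient of variation and variance from the values printed above. The variable names are illustrative only.

```python
import math

# Values copied from the report for the `id` variable.
minimum, maximum = 1, 105
q1, q3 = 12, 47
mean, std = 32.29183673, 25.44874424
total, n_obs = 15823, 490

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
mean_from_sum = total / n_obs     # Mean = Sum / Number of observations
cv = std / mean                   # CV = Standard deviation / Mean
variance = std ** 2               # Variance = Standard deviation squared

print(value_range, iqr, mean_from_sum, cv, variance)
```

The recomputed values agree with the reported Range, IQR, Mean, CV and Variance up to the rounding of the printed inputs.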

duration
Real number (ℝ≥0)

Distinct: 155
Distinct (%): 31.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.212653061
Minimum: 0.9
Maximum: 30
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 2.945
Q1: 4.9
median: 7.2
Q3: 9.8
95-th percentile: 17.455
Maximum: 30
Range: 29.1
Interquartile range (IQR): 4.9

Descriptive statistics

Standard deviation: 4.648649462
Coefficient of variation (CV): 0.5660350471
Kurtosis: 3.12376341
Mean: 8.212653061
Median Absolute Deviation (MAD): 2.4
Skewness: 1.523021986
Sum: 4024.2
Variance: 21.60994182
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
4.9                    11  2.2%
6.5                    11  2.2%
8.8                    10  2.0%
7.2                     9  1.8%
3.5                     9  1.8%
5.4                     9  1.8%
6.8                     8  1.6%
4.4                     8  1.6%
3.8                     8  1.6%
5.3                     7  1.4%
Other values (145)    400  81.6%

Value               Count  Frequency (%)
0.9                     1  0.2%
1                       1  0.2%
1.3                     2  0.4%
1.4                     1  0.2%
1.9                     2  0.4%

Value               Count  Frequency (%)
30                      1  0.2%
29.9                    1  0.2%
26.5                    1  0.2%
26.2                    1  0.2%
24.8                    2  0.4%

loudness
Real number (ℝ)

Distinct: 48
Distinct (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -21.19795918
Minimum: -26.3
Maximum: -18.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: -26.3
5-th percentile: -22.5
Q1: -21.7
median: -21.1
Q3: -20.6
95-th percentile: -19.9
Maximum: -18.5
Range: 7.8
Interquartile range (IQR): 1.1

Descriptive statistics

Standard deviation: 0.8717304784
Coefficient of variation (CV): -0.04112332092
Kurtosis: 4.069988316
Mean: -21.19795918
Median Absolute Deviation (MAD): 0.5
Skewness: -0.8820313449
Sum: -10387
Variance: 0.759914027
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=48)
Value               Count  Frequency (%)
-21                    30  6.1%
-21.3                  28  5.7%
-21.1                  25  5.1%
-20.5                  25  5.1%
-20.9                  25  5.1%
-21.8                  24  4.9%
-20.8                  24  4.9%
-21.7                  23  4.7%
-21.5                  22  4.5%
-20.6                  22  4.5%
Other values (38)     242  49.4%

Value               Count  Frequency (%)
-26.3                   1  0.2%
-26                     1  0.2%
-24.1                   1  0.2%
-23.9                   1  0.2%
-23.8                   1  0.2%

Value               Count  Frequency (%)
-18.5                   1  0.2%
-18.9                   1  0.2%
-19.1                   2  0.4%
-19.3                   1  0.2%
-19.4                   2  0.4%

minSilenceDB
Real number (ℝ)

Distinct: 53
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -45.25510204
Minimum: -68
Maximum: -10
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: -68
5-th percentile: -62
Q1: -57
median: -44.5
Q3: -38.25
95-th percentile: -21
Maximum: -10
Range: 58
Interquartile range (IQR): 18.75

Descriptive statistics

Standard deviation: 13.13568581
Coefficient of variation (CV): -0.2902586719
Kurtosis: -0.6053531101
Mean: -45.25510204
Median Absolute Deviation (MAD): 11.5
Skewness: 0.5491350529
Sum: -22175
Variance: 172.5462418
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
-55                    29  5.9%
-57                    28  5.7%
-56                    26  5.3%
-58                    26  5.3%
-42                    26  5.3%
-59                    25  5.1%
-44                    22  4.5%
-43                    21  4.3%
-54                    19  3.9%
-39                    19  3.9%
Other values (43)     249  50.8%

Value               Count  Frequency (%)
-68                     1  0.2%
-67                     1  0.2%
-66                     1  0.2%
-65                     3  0.6%
-64                     5  1.0%

Value               Count  Frequency (%)
-10                     2  0.4%
-13                     2  0.4%
-14                     2  0.4%
-17                     5  1.0%
-18                     3  0.6%

samplingrate
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
22050: 490

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 2450
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 22050
2nd row: 22050
3rd row: 22050
4th row: 22050
5th row: 22050

Value               Count  Frequency (%)
22050                 490  100.0%
Histogram of lengths of the category

Most occurring characters

Value               Count  Frequency (%)
2                     980  40.0%
0                     980  40.0%
5                     490  20.0%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number       2450  100.0%

Most frequent character per category

Value               Count  Frequency (%)
2                     980  40.0%
0                     980  40.0%
5                     490  20.0%

Most occurring scripts

Value               Count  Frequency (%)
Common               2450  100.0%

Most frequent character per script

Value               Count  Frequency (%)
2                     980  40.0%
0                     980  40.0%
5                     490  20.0%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                2450  100.0%

Most frequent character per block

Value               Count  Frequency (%)
2                     980  40.0%
0                     980  40.0%
5                     490  20.0%

silencePercent
Real number (ℝ≥0)

ZEROS

Distinct: 61
Distinct (%): 12.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.26530612
Minimum: 0
Maximum: 63
Zeros: 47
Zeros (%): 9.6%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 7
Q3: 25
95-th percentile: 51
Maximum: 63
Range: 63
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 16.73793618
Coefficient of variation (CV): 1.096469081
Kurtosis: 0.2254685567
Mean: 15.26530612
Median Absolute Deviation (MAD): 6
Skewness: 1.150986741
Sum: 7480
Variance: 280.1585076
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                      47  9.6%
2                      45  9.2%
3                      43  8.8%
1                      36  7.3%
4                      35  7.1%
5                      20  4.1%
6                      14  2.9%
11                     11  2.2%
12                     11  2.2%
16                     10  2.0%
Other values (51)     218  44.5%

Value               Count  Frequency (%)
0                      47  9.6%
1                      36  7.3%
2                      45  9.2%
3                      43  8.8%
4                      35  7.1%

Value               Count  Frequency (%)
63                      1  0.2%
62                      2  0.4%
61                      1  0.2%
59                      1  0.2%
58                      4  0.8%

averageFrequency
Real number (ℝ≥0)

Distinct: 380
Distinct (%): 77.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2298.32449
Minimum: 1073
Maximum: 3113
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1073
5-th percentile: 1834.15
Q1: 2143.25
median: 2293
Q3: 2458.75
95-th percentile: 2727.85
Maximum: 3113
Range: 2040
Interquartile range (IQR): 315.5

Descriptive statistics

Standard deviation: 264.2301756
Coefficient of variation (CV): 0.1149664361
Kurtosis: 1.229345818
Mean: 2298.32449
Median Absolute Deviation (MAD): 159
Skewness: -0.2440008458
Sum: 1126179
Variance: 69817.5857
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2128                    6  1.2%
2565                    4  0.8%
2394                    4  0.8%
2369                    4  0.8%
2263                    3  0.6%
2100                    3  0.6%
2223                    3  0.6%
2361                    3  0.6%
2221                    3  0.6%
2284                    3  0.6%
Other values (370)    454  92.7%

Value               Count  Frequency (%)
1073                    1  0.2%
1378                    1  0.2%
1597                    1  0.2%
1601                    1  0.2%
1615                    1  0.2%

Value               Count  Frequency (%)
3113                    1  0.2%
3097                    1  0.2%
2998                    1  0.2%
2973                    1  0.2%
2942                    1  0.2%

speacker
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
full: 490

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1960
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: full
2nd row: full
3rd row: full
4th row: full
5th row: full

Value               Count  Frequency (%)
full                  490  100.0%
Histogram of lengths of the category

Most occurring characters

Value               Count  Frequency (%)
l                     980  50.0%
f                     490  25.0%
u                     490  25.0%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter     1960  100.0%

Most frequent character per category

Value               Count  Frequency (%)
l                     980  50.0%
f                     490  25.0%
u                     490  25.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin                1960  100.0%

Most frequent character per script

Value               Count  Frequency (%)
l                     980  50.0%
f                     490  25.0%
u                     490  25.0%

Most occurring blocks

Value               Count  Frequency (%)
ASCII                1960  100.0%

Most frequent character per block

Value               Count  Frequency (%)
l                     980  50.0%
f                     490  25.0%
u                     490  25.0%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that, for a linear function, the angle of the line to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
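
As an illustration of this definition, here is a minimal pure-Python sketch. The function name `pearson_r` and the toy data are assumptions made for the example; they are not taken from the report or from pandas-profiling.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical toy data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling each variable separately (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

Note that a constant column, such as samplingrate or speacker in this dataset, has zero standard deviation, so r is undefined for it; this sketch would raise a ZeroDivisionError in that case.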

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
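
The rank-based calculation can be sketched as follows. This is a minimal pure-Python illustration under the assumption that ties receive the average of their rank positions; `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names, and the data are made up.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while Pearson's r stays below 1, which is the difference between the two coefficients described above.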

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
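
The pair-counting formula as stated corresponds to the variant usually called tau-a. The sketch below implements exactly that formula in pure Python; `kendall_tau_a` is a hypothetical name and the data are made up. Statistical libraries often apply a tie correction (tau-b) instead, so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six possible pairs.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

In this example five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.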

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. See the Phik project documentation for details.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
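
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013), is shown below. It is an illustration under that assumption, not necessarily the exact code used to produce this report; the function name, the list-of-rows table format and the choice to return NaN for degenerate tables are all assumptions of the example.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r = len(table)       # number of row categories
    k = len(table[0])    # number of column categories
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # Degenerate case: a variable with a single category.
        return float("nan")
    return math.sqrt(phi2_corr / denom)

# Hypothetical contingency tables.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
# Both categorical variables in this dataset are constant, i.e. a 1x1 table.
v_constant = cramers_v_corrected([[490]])
print(v_independent, v_perfect, v_constant)
```

With two constant columns such as samplingrate and speacker, the contingency table has a single cell and the coefficient is undefined, which this sketch signals by returning NaN.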

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

       id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
0       1       2.6     -26.0           -29         22050              20              2271  full
1       2       5.5     -21.9           -61         22050              43              2612  full
2       3       5.1     -21.6           -42         22050               6              2014  full
3       4       9.3     -19.5           -58         22050              58              2221  full
4       5       4.6     -22.3           -54         22050              63              2495  full
5       6       4.8     -20.4           -57         22050              34              2436  full
6       7       7.6     -21.8           -24         22050               1              1904  full
7       8       6.5     -19.5           -42         22050               3              2643  full
8       9       3.0     -22.9           -44         22050               5              1946  full
9      10      13.2     -21.4           -61         22050              15              2229  full

Last rows

       id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
480     6       8.9     -21.6           -62         22050              22              2044  full
481     7       2.7     -20.4           -37         22050               7              2213  full
482     8      11.8     -21.0           -58         22050              57              2294  full
483     9      14.2     -20.3           -58         22050              16              2007  full
484    10       8.9     -20.7           -64         22050              19              2182  full
485    11       9.0     -22.0           -62         22050              11              2241  full
486    12       8.0     -24.1           -59         22050              25              2428  full
487    13       9.3     -21.2           -56         22050              33              2308  full
488    14       4.9     -21.6           -44         22050               6              2173  full
489    15       7.7     -20.6           -58         22050              13              2219  full