Overview

Dataset statistics

Number of variables: 8
Number of observations: 490
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.8 KiB
Average record size in memory: 64.3 B

Variable types

Numeric: 6
Categorical: 2

Warnings

samplingrate has constant value "22050" (Constant)
speacker has constant value "clean" (Constant)
speacker is highly correlated with samplingrate (High correlation)
samplingrate is highly correlated with speacker (High correlation)

Reproduction

Analysis started: 2021-05-11 14:22:10.736420
Analysis finished: 2021-05-11 14:22:17.462310
Duration: 6.73 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
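
A report of this kind is typically produced with the `ProfileReport` class of pandas-profiling. The sketch below is only an illustration: the helper name `build_report`, the report title, and the output path are assumptions, and the settings stored in config.yaml are not reproduced here.

```python
def build_report(df, output_path="report.html"):
    """Profile a pandas DataFrame and write the HTML report to output_path."""
    # Imported lazily so the sketch can be read without the package installed.
    from pandas_profiling import ProfileReport

    # The title is a placeholder; the original report's title is not known.
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(output_path)
    return output_path


# Example call (the CSV file name is hypothetical):
#   import pandas as pd
#   build_report(pd.read_csv("metadata.csv"))
```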

Variables

id
Real number (ℝ≥0)

Distinct: 105
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.29183673
Minimum: 1
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
Median: 26
Q3: 47
95-th percentile: 83
Maximum: 105
Range: 104
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 25.44874424
Coefficient of variation (CV): 0.7880859937
Kurtosis: -0.06951867365
Mean: 32.29183673
Median Absolute Deviation (MAD): 16
Skewness: 0.8848673208
Sum: 15823
Variance: 647.6385835
Monotonicity: Not monotonic
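
Several of these figures are tied together by simple identities, which makes the summary easy to sanity-check. The snippet below is a minimal sketch that recomputes the mean, variance, coefficient of variation, range, and IQR for id from the other reported values; the variable names are illustrative only.

```python
# Values copied from the "id" variable summary in this report.
n = 490             # number of observations
total = 15823       # Sum
std = 25.44874424   # Standard deviation
q1, q3 = 12, 47     # Q1 and Q3
minimum, maximum = 1, 105

mean = total / n                  # arithmetic mean = sum / count
variance = std ** 2               # variance is the squared standard deviation
cv = std / mean                   # coefficient of variation = std / mean
value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range = Q3 - Q1

print(f"mean={mean:.8f} variance={variance:.4f} cv={cv:.8f}")
print(f"range={value_range} iqr={iqr}")
```

The recomputed values agree with the reported ones up to the rounding of the printed standard deviation.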
Histogram with fixed size bins (bins=50)
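
As a rough illustration of fixed-size binning, the sketch below splits the interval from the minimum to the maximum into equal-width bins and counts the values in each. The function name is hypothetical, and this is not necessarily how the report's plotting code computes its bins.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: all values are identical, so use a single bin.
        return [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is assigned to the last bin (right edge inclusive).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# The ten id values shown in the "First rows" sample of this report.
sample_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
counts = fixed_width_histogram(sample_ids, bins=3)
print(counts)
```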
Common values

Value              Count  Frequency (%)
1                  11     2.2%
2                  11     2.2%
3                  11     2.2%
4                  11     2.2%
5                  11     2.2%
6                  11     2.2%
7                  11     2.2%
8                  11     2.2%
9                  11     2.2%
10                 11     2.2%
Other values (95)  380    77.6%

Minimum 5 values

Value  Count  Frequency (%)
1      11     2.2%
2      11     2.2%
3      11     2.2%
4      11     2.2%
5      11     2.2%

Maximum 5 values

Value  Count  Frequency (%)
105    1      0.2%
104    1      0.2%
103    1      0.2%
102    1      0.2%
101    1      0.2%

duration
Real number (ℝ≥0)

Distinct: 123
Distinct (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.090204082
Minimum: 1.2
Maximum: 29.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1.2
5-th percentile: 2.945
Q1: 4.8
Median: 6.7
Q3: 8.9
95-th percentile: 12.8
Maximum: 29.8
Range: 28.6
Interquartile range (IQR): 4.1

Descriptive statistics

Standard deviation: 3.172096189
Coefficient of variation (CV): 0.4473913801
Kurtosis: 4.921752254
Mean: 7.090204082
Median Absolute Deviation (MAD): 2.1
Skewness: 1.249630479
Sum: 3474.2
Variance: 10.06219423
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
5                   11     2.2%
5.7                 11     2.2%
6.7                 11     2.2%
5.4                 10     2.0%
4.2                 9      1.8%
5.9                 9      1.8%
6                   8      1.6%
3.8                 8      1.6%
6.4                 8      1.6%
8                   8      1.6%
Other values (113)  397    81.0%

Minimum 5 values

Value  Count  Frequency (%)
1.2    1      0.2%
1.3    1      0.2%
1.5    1      0.2%
1.8    1      0.2%
2.1    2      0.4%

Maximum 5 values

Value  Count  Frequency (%)
29.8   1      0.2%
17.5   1      0.2%
17     1      0.2%
16.3   1      0.2%
15.7   3      0.6%

loudness
Real number (ℝ)

Distinct: 54
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -21.37142857
Minimum: -28.2
Maximum: -17.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: -28.2
5-th percentile: -22.8
Q1: -21.9
Median: -21.3
Q3: -20.8
95-th percentile: -20
Maximum: -17.8
Range: 10.4
Interquartile range (IQR): 1.1

Descriptive statistics

Standard deviation: 0.9322664074
Coefficient of variation (CV): -0.04362209126
Kurtosis: 6.062766555
Mean: -21.37142857
Median Absolute Deviation (MAD): 0.6
Skewness: -0.7373414343
Sum: -10472
Variance: 0.8691206544
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
-20.7              26     5.3%
-21.6              26     5.3%
-21.8              25     5.1%
-21.2              23     4.7%
-21.4              23     4.7%
-21                23     4.7%
-20.6              23     4.7%
-21.3              22     4.5%
-20.9              22     4.5%
-21.1              20     4.1%
Other values (44)  257    52.4%

Minimum 5 values

Value  Count  Frequency (%)
-28.2  1      0.2%
-24.4  1      0.2%
-24.1  1      0.2%
-23.8  1      0.2%
-23.7  1      0.2%

Maximum 5 values

Value  Count  Frequency (%)
-17.8  1      0.2%
-18.6  1      0.2%
-18.7  1      0.2%
-18.8  1      0.2%
-18.9  1      0.2%

minSilenceDB
Real number (ℝ)

Distinct: 19
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -59.53061224
Minimum: -69
Maximum: -39
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: -69
5-th percentile: -65
Q1: -62
Median: -59
Q3: -57
95-th percentile: -54.45
Maximum: -39
Range: 30
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.472368526
Coefficient of variation (CV): -0.05832912506
Kurtosis: 3.149184166
Mean: -59.53061224
Median Absolute Deviation (MAD): 2
Skewness: 0.5688538134
Sum: -29170
Variance: 12.05734318
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)

Common values

Value             Count  Frequency (%)
-59               58     11.8%
-58               56     11.4%
-60               48     9.8%
-57               46     9.4%
-61               46     9.4%
-62               43     8.8%
-56               40     8.2%
-63               37     7.6%
-64               29     5.9%
-55               27     5.5%
Other values (9)  60     12.2%

Minimum 5 values

Value  Count  Frequency (%)
-69    1      0.2%
-67    7      1.4%
-66    9      1.8%
-65    18     3.7%
-64    29     5.9%

Maximum 5 values

Value  Count  Frequency (%)
-39    1      0.2%
-41    1      0.2%
-49    1      0.2%
-53    5      1.0%
-54    17     3.5%

samplingrate
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
22050  490

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 2450
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
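
As a rough illustration of these character statistics, the sketch below recomputes the total characters, distinct characters, and general categories for a column of 490 copies of "22050" using Python's standard `unicodedata` module. Script and block lookups are not available in the standard library, so they are left out; the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# The samplingrate column as described in this report: 490 rows, all "22050".
values = ["22050"] * 490
text = "".join(values)

# Frequency of each character across the whole column.
char_counts = Counter(text)

# Unicode general category of each character ("Nd" means Decimal Number).
category_counts = Counter(unicodedata.category(ch) for ch in text)

print(len(text), dict(char_counts), dict(category_counts))
```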

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 22050
2nd row: 22050
3rd row: 22050
4th row: 22050
5th row: 22050

Value  Count  Frequency (%)
22050  490    100.0%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
2      980    40.0%
0      980    40.0%
5      490    20.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  2450   100.0%

Most frequent character per category

Value  Count  Frequency (%)
2      980    40.0%
0      980    40.0%
5      490    20.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2450   100.0%

Most frequent character per script

Value  Count  Frequency (%)
2      980    40.0%
0      980    40.0%
5      490    20.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2450   100.0%

Most frequent character per block

Value  Count  Frequency (%)
2      980    40.0%
0      980    40.0%
5      490    20.0%

silencePercent
Real number (ℝ≥0)

Distinct: 49
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.89183673
Minimum: 8
Maximum: 62
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 8
5-th percentile: 11.45
Q1: 16
Median: 20
Q3: 27
95-th percentile: 44
Maximum: 62
Range: 54
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 10.23339768
Coefficient of variation (CV): 0.4470326169
Kurtosis: 1.402915108
Mean: 22.89183673
Median Absolute Deviation (MAD): 5
Skewness: 1.276319318
Sum: 11217
Variance: 104.7224281
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Common values

Value              Count  Frequency (%)
16                 33     6.7%
19                 32     6.5%
14                 31     6.3%
18                 25     5.1%
17                 23     4.7%
15                 23     4.7%
23                 23     4.7%
13                 23     4.7%
20                 22     4.5%
24                 20     4.1%
Other values (39)  235    48.0%

Minimum 5 values

Value  Count  Frequency (%)
8      4      0.8%
9      2      0.4%
10     7      1.4%
11     12     2.4%
12     18     3.7%

Maximum 5 values

Value  Count  Frequency (%)
62     1      0.2%
58     1      0.2%
57     1      0.2%
56     3      0.6%
55     2      0.4%

averageFrequency
Real number (ℝ≥0)

Distinct: 376
Distinct (%): 76.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2243.112245
Minimum: 1576
Maximum: 3167
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1576
5-th percentile: 1850.2
Q1: 2089.25
Median: 2234.5
Q3: 2393
95-th percentile: 2643.25
Maximum: 3167
Range: 1591
Interquartile range (IQR): 303.75

Descriptive statistics

Standard deviation: 239.5230555
Coefficient of variation (CV): 0.1067815737
Kurtosis: 0.365312195
Mean: 2243.112245
Median Absolute Deviation (MAD): 149
Skewness: 0.1706018303
Sum: 1099125
Variance: 57371.29412
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
2090                5      1.0%
2142                4      0.8%
2221                4      0.8%
2368                3      0.6%
2136                3      0.6%
2134                3      0.6%
2196                3      0.6%
2214                3      0.6%
2125                3      0.6%
2233                3      0.6%
Other values (366)  456    93.1%

Minimum 5 values

Value  Count  Frequency (%)
1576   1      0.2%
1579   1      0.2%
1648   1      0.2%
1659   1      0.2%
1668   1      0.2%

Maximum 5 values

Value  Count  Frequency (%)
3167   1      0.2%
2971   1      0.2%
2930   1      0.2%
2926   1      0.2%
2843   1      0.2%

speacker
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
clean  490

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 2450
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: clean
2nd row: clean
3rd row: clean
4th row: clean
5th row: clean

Value  Count  Frequency (%)
clean  490    100.0%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
c      490    20.0%
l      490    20.0%
e      490    20.0%
a      490    20.0%
n      490    20.0%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  2450   100.0%

Most frequent character per category

Value  Count  Frequency (%)
c      490    20.0%
l      490    20.0%
e      490    20.0%
a      490    20.0%
n      490    20.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  2450   100.0%

Most frequent character per script

Value  Count  Frequency (%)
c      490    20.0%
l      490    20.0%
e      490    20.0%
a      490    20.0%
n      490    20.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2450   100.0%

Most frequent character per block

Value  Count  Frequency (%)
c      490    20.0%
l      490    20.0%
e      490    20.0%
a      490    20.0%
n      490    20.0%

Interactions

[30 pairwise interaction plots; images not included]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
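
The definition above can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, and the input values are the duration and silencePercent columns as read from the ten "First rows" of the sample; this is a minimal sketch, not the report's own implementation.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# duration and silencePercent from the ten "First rows" of the sample.
duration = [2.1, 6.2, 4.1, 5.0, 3.1, 5.1, 6.5, 6.1, 2.4, 12.6]
silence = [42, 46, 23, 25, 44, 41, 21, 17, 40, 15]

r = pearson_r(duration, silence)
# Shifting and rescaling a variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * d + 3 for d in duration], silence)
print(round(r, 4), round(r_rescaled, 4))
```

Ten rows are far too few to say anything about the full dataset; the example only demonstrates the calculation and the invariance property.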

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
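
A minimal sketch of this rank-based calculation follows. The function names are illustrative, and assigning tied values the mean of their ranks is an assumption (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho is 1 while r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```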

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
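
The pair-counting formula stated above can be sketched directly. Note the assumption: this is the plain form (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries commonly report a tie-adjusted variant (tau-b), which can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs, with no adjustment for ties."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts as neither.
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```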

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
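
The bias correction can be sketched from a contingency table of counts as shown below. This is a minimal illustration of the bias-corrected formula, not the report's own code; the function name is hypothetical, and returning NaN when a variable has a single category (as samplingrate and speacker do in this dataset) is an assumption about how to treat the undefined case.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, clipped at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        # A variable with one category carries no information; V is undefined.
        return float("nan")
    return math.sqrt(phi2_corr / denom)


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
print(cramers_v_corrected([[490]]))               # two constant variables
```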

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
0     1       2.1     -20.8           -61         22050              42              3167     clean
1     2       6.2     -20.9           -61         22050              46              2351     clean
2     3       4.1     -21.4           -59         22050              23              2010     clean
3     4       5.0     -18.9           -65         22050              25              2134     clean
4     5       3.1     -22.8           -61         22050              44              2221     clean
5     6       5.1     -20.8           -62         22050              41              2261     clean
6     7       6.5     -23.2           -53         22050              21              2132     clean
7     8       6.1     -22.5           -58         22050              17              2690     clean
8     9       2.4     -21.7           -59         22050              40              2260     clean
9    10      12.6     -21.1           -58         22050              15              2110     clean

Last rows

     id  duration  loudness  minSilenceDB  samplingrate  silencePercent  averageFrequency  speacker
480   6       7.7     -21.4           -65         22050              11              1961     clean
481   7       2.2     -22.3           -55         22050              24              2134     clean
482   8       5.8     -21.7           -61         22050              18              2106     clean
483   9      17.0     -21.8           -60         22050              18              2142     clean
484  10       8.5     -21.8           -58         22050              16              2134     clean
485  11       9.5     -22.3           -56         22050              22              2044     clean
486  12       9.4     -21.5           -58         22050              35              2309     clean
487  13       4.5     -21.6           -61         22050              22              2049     clean
488  14       4.8     -22.1           -61         22050              28              2125     clean
489  15       7.4     -22.8           -59         22050              14              2005     clean