Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 490 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 30.8 KiB |
Average record size in memory | 64.3 B |
Variable types
Numeric | 6 |
---|---|
Categorical | 2 |
Warnings
samplingrate has constant value "22050" | Constant |
---|---|
speacker has constant value "clean" | Constant |
speacker is highly correlated with samplingrate | High correlation |
samplingrate is highly correlated with speacker | High correlation |
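The overview counts (missing cells, duplicate rows) and the constant-value warnings can be checked on a small table by hand. Below is a minimal sketch. It assumes rows are given as a list of Python dicts, a hypothetical stand-in for the DataFrame that pandas-profiling actually receives, and it treats `None` as a missing cell. Duplicates are counted as every repeat after the first occurrence. This is one reasonable convention and is not necessarily the exact rule pandas-profiling applies. The helper name `overview` is illustrative.

```python
from typing import Any


def overview(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute a few overview statistics for a list of row dicts.

    Hypothetical helper for illustration; not the pandas-profiling API.
    """
    columns = list(rows[0].keys()) if rows else []
    n_cells = len(rows) * len(columns)

    # A cell counts as missing when its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row is a duplicate if an identical row appeared earlier.
    seen: set[tuple[Any, ...]] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # A column is constant when it holds exactly one distinct value.
    constant = [col for col in columns if len({row.get(col) for row in rows}) == 1]

    return {
        "n_variables": len(columns),
        "n_observations": len(rows),
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "constant_columns": constant,
    }
```

Consider three rows shaped like this dataset where two rows are identical. The sketch reports one duplicate row and flags `samplingrate` as constant, which mirrors the "Constant" warnings above.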
Reproduction
Analysis started | 2021-05-11 14:22:10.736420 |
---|---|
Analysis finished | 2021-05-11 14:22:17.462310 |
Duration | 6.73 seconds |
Software version | pandas-profiling v2.11.0 |
Download configuration | config.yaml |
id
Real number (ℝ≥0)
Distinct | 105 |
---|---|
Distinct (%) | 21.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 32.29183673 |
---|---|
Minimum | 1 |
Maximum | 105 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5th percentile | 3 |
Q1 | 12 |
median | 26 |
Q3 | 47 |
95th percentile | 83 |
Maximum | 105 |
Range | 104 |
Interquartile range (IQR) | 35 |
Descriptive statistics
Standard deviation | 25.44874424 |
---|---|
Coefficient of variation (CV) | 0.7880859937 |
Kurtosis | -0.06951867365 |
Mean | 32.29183673 |
Median Absolute Deviation (MAD) | 16 |
Skewness | 0.8848673208 |
Sum | 15823 |
Variance | 647.6385835 |
Monotonicity | Not monotonic |
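The descriptive statistics above follow standard definitions. The standard-library sketch below assumes the default conventions of pandas' own `Series` methods:

- sample variance with divisor n - 1;
- quantiles by linear interpolation between order statistics;
- bias-adjusted skewness;
- excess kurtosis.

MAD is taken as the median of absolute deviations from the median, matching the report's label. pandas-profiling's actual code may differ in edge cases. The names `quantile` and `describe` are hypothetical.

```python
import math
import statistics


def quantile(sorted_values: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values: list[float]) -> dict[str, float]:
    """Report-style descriptive statistics (hypothetical helper).

    Needs at least 4 values and a non-constant column; otherwise the
    skewness and kurtosis formulas divide by zero.
    """
    n = len(values)
    s = sorted(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    std = math.sqrt(variance)
    median = statistics.median(s)

    # Biased central moments, then the bias-adjusted estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # excess kurtosis

    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,  # coefficient of variation
        "min": s[0],
        "p5": quantile(s, 0.05),
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": quantile(s, 0.95),
        "max": s[-1],
        "range": s[-1] - s[0],
        "iqr": q3 - q1,
        "mad": statistics.median(abs(x - median) for x in values),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "sum": math.fsum(values),
    }
```

The coefficient of variation is simply `std / mean`. For `id` this gives 25.44874424 / 32.29183673 ≈ 0.788, as in the table. The same formula explains why the negative-valued `loudness` and `minSilenceDB` columns show a negative CV.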
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%) |
---|---|---|
1 | 11 | 2.2% |
2 | 11 | 2.2% |
3 | 11 | 2.2% |
4 | 11 | 2.2% |
5 | 11 | 2.2% |
6 | 11 | 2.2% |
7 | 11 | 2.2% |
8 | 11 | 2.2% |
9 | 11 | 2.2% |
10 | 11 | 2.2% |
Other values (95) | 380 | 77.6% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
1 | 11 | 2.2% |
2 | 11 | 2.2% |
3 | 11 | 2.2% |
4 | 11 | 2.2% |
5 | 11 | 2.2% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
105 | 1 | 0.2% |
104 | 1 | 0.2% |
103 | 1 | 0.2% |
102 | 1 | 0.2% |
101 | 1 | 0.2% |
duration
Real number (ℝ≥0)
Distinct | 123 |
---|---|
Distinct (%) | 25.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 7.090204082 |
---|---|
Minimum | 1.2 |
Maximum | 29.8 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1.2 |
---|---|
5th percentile | 2.945 |
Q1 | 4.8 |
median | 6.7 |
Q3 | 8.9 |
95th percentile | 12.8 |
Maximum | 29.8 |
Range | 28.6 |
Interquartile range (IQR) | 4.1 |
Descriptive statistics
Standard deviation | 3.172096189 |
---|---|
Coefficient of variation (CV) | 0.4473913801 |
Kurtosis | 4.921752254 |
Mean | 7.090204082 |
Median Absolute Deviation (MAD) | 2.1 |
Skewness | 1.249630479 |
Sum | 3474.2 |
Variance | 10.06219423 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%) |
---|---|---|
5 | 11 | 2.2% |
5.7 | 11 | 2.2% |
6.7 | 11 | 2.2% |
5.4 | 10 | 2.0% |
4.2 | 9 | 1.8% |
5.9 | 9 | 1.8% |
6 | 8 | 1.6% |
3.8 | 8 | 1.6% |
6.4 | 8 | 1.6% |
8 | 8 | 1.6% |
Other values (113) | 397 | 81.0% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
1.2 | 1 | 0.2% |
1.3 | 1 | 0.2% |
1.5 | 1 | 0.2% |
1.8 | 1 | 0.2% |
2.1 | 2 | 0.4% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
29.8 | 1 | 0.2% |
17.5 | 1 | 0.2% |
17 | 1 | 0.2% |
16.3 | 1 | 0.2% |
15.7 | 3 | 0.6% |
loudness
Real number (ℝ)
Distinct | 54 |
---|---|
Distinct (%) | 11.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -21.37142857 |
---|---|
Minimum | -28.2 |
Maximum | -17.8 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | -28.2 |
---|---|
5th percentile | -22.8 |
Q1 | -21.9 |
median | -21.3 |
Q3 | -20.8 |
95th percentile | -20 |
Maximum | -17.8 |
Range | 10.4 |
Interquartile range (IQR) | 1.1 |
Descriptive statistics
Standard deviation | 0.9322664074 |
---|---|
Coefficient of variation (CV) | -0.04362209126 |
Kurtosis | 6.062766555 |
Mean | -21.37142857 |
Median Absolute Deviation (MAD) | 0.6 |
Skewness | -0.7373414343 |
Sum | -10472 |
Variance | 0.8691206544 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%) |
---|---|---|
-20.7 | 26 | 5.3% |
-21.6 | 26 | 5.3% |
-21.8 | 25 | 5.1% |
-21.2 | 23 | 4.7% |
-21.4 | 23 | 4.7% |
-21 | 23 | 4.7% |
-20.6 | 23 | 4.7% |
-21.3 | 22 | 4.5% |
-20.9 | 22 | 4.5% |
-21.1 | 20 | 4.1% |
Other values (44) | 257 | 52.4% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
-28.2 | 1 | 0.2% |
-24.4 | 1 | 0.2% |
-24.1 | 1 | 0.2% |
-23.8 | 1 | 0.2% |
-23.7 | 1 | 0.2% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
-17.8 | 1 | 0.2% |
-18.6 | 1 | 0.2% |
-18.7 | 1 | 0.2% |
-18.8 | 1 | 0.2% |
-18.9 | 1 | 0.2% |
minSilenceDB
Real number (ℝ)
Distinct | 19 |
---|---|
Distinct (%) | 3.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -59.53061224 |
---|---|
Minimum | -69 |
Maximum | -39 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | -69 |
---|---|
5th percentile | -65 |
Q1 | -62 |
median | -59 |
Q3 | -57 |
95th percentile | -54.45 |
Maximum | -39 |
Range | 30 |
Interquartile range (IQR) | 5 |
Descriptive statistics
Standard deviation | 3.472368526 |
---|---|
Coefficient of variation (CV) | -0.05832912506 |
Kurtosis | 3.149184166 |
Mean | -59.53061224 |
Median Absolute Deviation (MAD) | 2 |
Skewness | 0.5688538134 |
Sum | -29170 |
Variance | 12.05734318 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=19)
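The bin count here is 19 rather than 50, matching the 19 distinct values of minSilenceDB. silencePercent likewise uses bins=49 for its 49 distinct values. This is consistent with the bin count being capped at the number of distinct values, i.e. `min(50, n_distinct)`. Treat that rule as an inference from the report rather than documented behaviour.

The sketch below shows such equal-width binning using only the standard library. Like `numpy.histogram`, it treats every bin as half-open except the last, which is closed. The function name and the cap rule are assumptions.

```python
def fixed_size_histogram(
    values: list[float], max_bins: int = 50
) -> tuple[list[int], list[float]]:
    """Equal-width histogram with bins = min(max_bins, number of distinct values).

    Hypothetical helper; the capping rule is inferred from the report.
    """
    n_bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value falls into a single bin.
        return [len(values)], [lo, hi]
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    for x in values:
        # Bins are [left, right) except the last, which also takes the maximum.
        # Values lying exactly on an inner edge may shift by float rounding.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, edges
```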
Common values
Value | Count | Frequency (%) |
---|---|---|
-59 | 58 | 11.8% |
-58 | 56 | 11.4% |
-60 | 48 | 9.8% |
-57 | 46 | 9.4% |
-61 | 46 | 9.4% |
-62 | 43 | 8.8% |
-56 | 40 | 8.2% |
-63 | 37 | 7.6% |
-64 | 29 | 5.9% |
-55 | 27 | 5.5% |
Other values (9) | 60 | 12.2% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
-69 | 1 | 0.2% |
-67 | 7 | 1.4% |
-66 | 9 | 1.8% |
-65 | 18 | 3.7% |
-64 | 29 | 5.9% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
-39 | 1 | 0.2% |
-41 | 1 | 0.2% |
-49 | 1 | 0.2% |
-53 | 5 | 1.0% |
-54 | 17 | 3.5% |
samplingrate
Categorical
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Value | Count |
---|---|
22050 | 490 |
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 5 |
Min length | 5 |
Characters and Unicode
Total characters | 2450 |
---|---|
Distinct characters | 3 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
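Python's standard `unicodedata` module exposes the general category of each code point. That is enough to reproduce the character, category, and length figures in this section. Below is a minimal sketch. The table mapping two-letter category codes to names is written out by hand and covers only a few codes. Scripts and blocks are omitted because the standard library does not provide them. The helper name `character_summary` is hypothetical.

```python
import unicodedata
from collections import Counter

# Hand-written names for a few Unicode general-category codes (partial list).
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def character_summary(values: list[str]) -> dict[str, object]:
    """Summarise the characters of a text column (hypothetical helper)."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, count in chars.items():
        code = unicodedata.category(ch)  # e.g. "Nd" for the digit "2"
        categories[CATEGORY_NAMES.get(code, code)] += count
    lengths = [len(value) for value in values]
    return {
        "total_characters": sum(chars.values()),
        "distinct_characters": len(chars),
        # Ties keep first-seen order, since Counter preserves insertion order.
        "most_common": chars.most_common(),
        "categories": dict(categories),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": sum(lengths) / len(lengths),
    }
```

For `["22050"] * 490`, the sketch gives 2450 total characters in 3 distinct characters, ordered `2`, `0`, `5`, all in the "Decimal Number" category, as tabulated below.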
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 22050 |
---|---|
2nd row | 22050 |
3rd row | 22050 |
4th row | 22050 |
5th row | 22050 |
Common values
Value | Count | Frequency (%) |
---|---|---|
22050 | 490 | 100.0% |
Histogram of lengths of the category
Most occurring characters
Value | Count | Frequency (%) |
---|---|---|
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 2450 | 100.0% |
Most frequent character per category
Value | Count | Frequency (%) |
---|---|---|
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 2450 | 100.0% |
Most frequent character per script
Value | Count | Frequency (%) |
---|---|---|
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 2450 | 100.0% |
Most frequent character per block
Value | Count | Frequency (%) |
---|---|---|
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
silencePercent
Real number (ℝ≥0)
Distinct | 49 |
---|---|
Distinct (%) | 10.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 22.89183673 |
---|---|
Minimum | 8 |
Maximum | 62 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 8 |
---|---|
5th percentile | 11.45 |
Q1 | 16 |
median | 20 |
Q3 | 27 |
95th percentile | 44 |
Maximum | 62 |
Range | 54 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 10.23339768 |
---|---|
Coefficient of variation (CV) | 0.4470326169 |
Kurtosis | 1.402915108 |
Mean | 22.89183673 |
Median Absolute Deviation (MAD) | 5 |
Skewness | 1.276319318 |
Sum | 11217 |
Variance | 104.7224281 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=49)
Common values
Value | Count | Frequency (%) |
---|---|---|
16 | 33 | 6.7% |
19 | 32 | 6.5% |
14 | 31 | 6.3% |
18 | 25 | 5.1% |
17 | 23 | 4.7% |
15 | 23 | 4.7% |
23 | 23 | 4.7% |
13 | 23 | 4.7% |
20 | 22 | 4.5% |
24 | 20 | 4.1% |
Other values (39) | 235 | 48.0% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
8 | 4 | 0.8% |
9 | 2 | 0.4% |
10 | 7 | 1.4% |
11 | 12 | 2.4% |
12 | 18 | 3.7% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
62 | 1 | 0.2% |
58 | 1 | 0.2% |
57 | 1 | 0.2% |
56 | 3 | 0.6% |
55 | 2 | 0.4% |
averageFrequency
Real number (ℝ≥0)
Distinct | 376 |
---|---|
Distinct (%) | 76.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2243.112245 |
---|---|
Minimum | 1576 |
Maximum | 3167 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1576 |
---|---|
5th percentile | 1850.2 |
Q1 | 2089.25 |
median | 2234.5 |
Q3 | 2393 |
95th percentile | 2643.25 |
Maximum | 3167 |
Range | 1591 |
Interquartile range (IQR) | 303.75 |
Descriptive statistics
Standard deviation | 239.5230555 |
---|---|
Coefficient of variation (CV) | 0.1067815737 |
Kurtosis | 0.365312195 |
Mean | 2243.112245 |
Median Absolute Deviation (MAD) | 149 |
Skewness | 0.1706018303 |
Sum | 1099125 |
Variance | 57371.29412 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%) |
---|---|---|
2090 | 5 | 1.0% |
2142 | 4 | 0.8% |
2221 | 4 | 0.8% |
2368 | 3 | 0.6% |
2136 | 3 | 0.6% |
2134 | 3 | 0.6% |
2196 | 3 | 0.6% |
2214 | 3 | 0.6% |
2125 | 3 | 0.6% |
2233 | 3 | 0.6% |
Other values (366) | 456 | 93.1% |
Minimum 5 values
Value | Count | Frequency (%) |
---|---|---|
1576 | 1 | 0.2% |
1579 | 1 | 0.2% |
1648 | 1 | 0.2% |
1659 | 1 | 0.2% |
1668 | 1 | 0.2% |
Maximum 5 values
Value | Count | Frequency (%) |
---|---|---|
3167 | 1 | 0.2% |
2971 | 1 | 0.2% |
2930 | 1 | 0.2% |
2926 | 1 | 0.2% |
2843 | 1 | 0.2% |
speacker
Categorical
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Value | Count |
---|---|
clean | 490 |
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 5 |
Min length | 5 |
Characters and Unicode
Total characters | 2450 |
---|---|
Distinct characters | 5 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | clean |
---|---|
2nd row | clean |
3rd row | clean |
4th row | clean |
5th row | clean |
Common values
Value | Count | Frequency (%) |
---|---|---|
clean | 490 | 100.0% |
Histogram of lengths of the category
Most occurring characters
Value | Count | Frequency (%) |
---|---|---|
c | 490 | 20.0% |
l | 490 | 20.0% |
e | 490 | 20.0% |
a | 490 | 20.0% |
n | 490 | 20.0% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Lowercase Letter | 2450 | 100.0% |
Most frequent character per category
Value | Count | Frequency (%) |
---|---|---|
c | 490 | 20.0% |
l | 490 | 20.0% |
e | 490 | 20.0% |
a | 490 | 20.0% |
n | 490 | 20.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Latin | 2450 | 100.0% |
Most frequent character per script
Value | Count | Frequency (%) |
---|---|---|
c | 490 | 20.0% |
l | 490 | 20.0% |
e | 490 | 20.0% |
a | 490 | 20.0% |
n | 490 | 20.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 2450 | 100.0% |
Most frequent character per block
Value | Count | Frequency (%) |
---|---|---|
c | 490 | 20.0% |
l | 490 | 20.0% |
e | 490 | 20.0% |
a | 490 | 20.0% |
n | 490 | 20.0% |
Correlations
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts in location and positive changes in scale of the two variables. For an exact linear relationship with non-zero slope, the slope (the angle to the x-axis) therefore does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
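That definition translates directly into Python. The 1/(n - 1) factors are dropped because they cancel between the numerator and denominator. The function name `pearson_r` is illustrative. On Python 3.10+, the standard library's `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n - 1) factors of covariance and variances cancel, so they are omitted.
    # A constant input (zero variance) raises ZeroDivisionError: r is undefined.
    return cov / math.sqrt(var_x * var_y)
```

`pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])` gives 0.8. Rescaling x by 10 and y by 0.5, or shifting either, leaves the result unchanged, illustrating the invariance described above. A constant column such as `samplingrate` has zero variance, so r is undefined for it.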
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
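Spearman's ρ is Pearson's r applied to ranks. The sketch below gives tied values the average of the ranks they span, which is the usual convention. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Take x = [1, 2, 3, 4, 5] and y = x cubed. The relationship is monotonic but not linear, so `spearman_rho` returns 1.0 while `pearson_r` on the raw values stays below 1.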
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
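The counting rule above is Kendall's τ-a. A pair (i, j) is concordant when x and y move in the same direction between the two observations and discordant when they move in opposite directions. Pairs tied in either variable count as neither. This is a minimal sketch of that formula only. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables move the same way between i and j.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For x = [1, 2, 3] and y = [1, 3, 2], two of the three pairs are concordant and one is discordant, so τ = (2 - 1) / 3 ≈ 0.333.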
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
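Both missing-value displays reduce to a boolean "is present" mask over the table. The text-only sketch below assumes rows are dicts and that `None` marks a missing cell. It produces per-column non-null counts (the data behind the count plot) and one character per cell (a crude nullity matrix). This report has no missing cells, so every column would show the full 490. The function names are hypothetical.

```python
from typing import Any


def nullity_counts(rows: list[dict[str, Any]], columns: list[str]) -> dict[str, int]:
    """Number of non-missing cells per column (the data behind a count bar chart)."""
    return {col: sum(1 for row in rows if row.get(col) is not None) for col in columns}


def nullity_matrix(rows: list[dict[str, Any]], columns: list[str]) -> list[str]:
    """One text line per row: '#' where a value is present, '.' where it is missing."""
    return [
        "".join("#" if row.get(col) is not None else "." for col in columns)
        for row in rows
    ]
```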
First rows
| | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2.1 | -20.8 | -61 | 22050 | 42 | 3167 | clean |
1 | 2 | 6.2 | -20.9 | -61 | 22050 | 46 | 2351 | clean |
2 | 3 | 4.1 | -21.4 | -59 | 22050 | 23 | 2010 | clean |
3 | 4 | 5.0 | -18.9 | -65 | 22050 | 25 | 2134 | clean |
4 | 5 | 3.1 | -22.8 | -61 | 22050 | 44 | 2221 | clean |
5 | 6 | 5.1 | -20.8 | -62 | 22050 | 41 | 2261 | clean |
6 | 7 | 6.5 | -23.2 | -53 | 22050 | 21 | 2132 | clean |
7 | 8 | 6.1 | -22.5 | -58 | 22050 | 17 | 2690 | clean |
8 | 9 | 2.4 | -21.7 | -59 | 22050 | 40 | 2260 | clean |
9 | 10 | 12.6 | -21.1 | -58 | 22050 | 15 | 2110 | clean |
Last rows
| | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
480 | 6 | 7.7 | -21.4 | -65 | 22050 | 11 | 1961 | clean |
481 | 7 | 2.2 | -22.3 | -55 | 22050 | 24 | 2134 | clean |
482 | 8 | 5.8 | -21.7 | -61 | 22050 | 18 | 2106 | clean |
483 | 9 | 17.0 | -21.8 | -60 | 22050 | 18 | 2142 | clean |
484 | 10 | 8.5 | -21.8 | -58 | 22050 | 16 | 2134 | clean |
485 | 11 | 9.5 | -22.3 | -56 | 22050 | 22 | 2044 | clean |
486 | 12 | 9.4 | -21.5 | -58 | 22050 | 35 | 2309 | clean |
487 | 13 | 4.5 | -21.6 | -61 | 22050 | 22 | 2049 | clean |
488 | 14 | 4.8 | -22.1 | -61 | 22050 | 28 | 2125 | clean |
489 | 15 | 7.4 | -22.8 | -59 | 22050 | 14 | 2005 | clean |