Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 980 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 61.4 KiB |
Average record size in memory | 64.1 B |
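The two memory figures above are consistent with each other. The sketch below reproduces them under two assumptions that the report does not state: every cell occupies 8 bytes, and the row index adds a fixed 128 bytes of overhead.

```python
# Assumptions (not stated in the report): 8 bytes per cell and
# 128 bytes of fixed overhead for the row index.
N_ROWS = 980
N_COLS = 8
BYTES_PER_CELL = 8
INDEX_OVERHEAD = 128

total_bytes = N_ROWS * N_COLS * BYTES_PER_CELL + INDEX_OVERHEAD
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / N_ROWS

print(f"total size: {total_kib:.1f} KiB")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under the same assumptions a single column takes (980 × 8 + 128) / 1024 ≈ 7.8 KiB, which agrees with the per-variable "Memory size" rows.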
Variable types
Numeric | 6 |
---|---|
Categorical | 2 |
Warnings
samplingrate has constant value "22050" | Constant |
speacker is highly correlated with samplingrate | High correlation |
samplingrate is highly correlated with speacker | High correlation |
speacker is uniformly distributed | Uniform |
silencePercent has 47 (4.8%) zeros | Zeros |
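Two of the flags listed above can be checked by hand from counts reported elsewhere in this document. The sketch below is an illustration only: the chi-square statistic is one plausible way to test uniformity, and the exact rule and threshold the profiling tool applies are not reproduced here.

```python
N_ROWS = 980

# "silencePercent has 47 (4.8%) zeros"
zero_count = 47
zero_pct = 100 * zero_count / N_ROWS

# "speacker is uniformly distributed": counts from the speacker frequency table.
observed = {"clean": 490, "full": 490}
expected = N_ROWS / len(observed)  # count per category under a uniform distribution
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

print(f"zeros: {zero_pct:.1f}%")
print(f"chi-square statistic against a uniform distribution: {chi_square:.1f}")
```

A chi-square statistic of zero means the observed counts match a uniform distribution exactly, which is the case for two categories of 490 rows each.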
Reproduction
Analysis started | 2021-05-11 14:22:38.399523 |
---|---|
Analysis finished | 2021-05-11 14:22:44.221281 |
Duration | 5.82 seconds |
Software version | pandas-profiling v2.11.0 |
Download configuration | config.yaml |
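The reported duration follows directly from the two timestamps. A minimal check, using only the values in the table above:

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-05-11 14:22:38.399523", TIMESTAMP_FORMAT)
finished = datetime.strptime("2021-05-11 14:22:44.221281", TIMESTAMP_FORMAT)

# Difference between the two timestamps, in seconds.
elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")
```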
id
Real number (ℝ≥0)
Distinct | 105 |
---|---|
Distinct (%) | 10.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 32.29183673 |
---|---|
Minimum | 1 |
Maximum | 105 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 3 |
Q1 | 12 |
median | 26 |
Q3 | 47 |
95-th percentile | 83 |
Maximum | 105 |
Range | 104 |
Interquartile range (IQR) | 35 |
Descriptive statistics
Standard deviation | 25.43574361 |
---|---|
Coefficient of variation (CV) | 0.7876833955 |
Kurtosis | -0.07529870321 |
Mean | 32.29183673 |
Median Absolute Deviation (MAD) | 16 |
Skewness | 0.8835091218 |
Sum | 31646 |
Variance | 646.9770528 |
Monotonicity | Not monotonic |
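Several rows in the quantile and descriptive tables are derived from other rows. The sketch below recomputes them for id from the reported mean, standard deviation, and quantiles. The variable names are illustrative and the inputs are the rounded values printed above, so the last digits may differ slightly.

```python
# Reported values for the id variable.
mean = 32.29183673
std = 25.43574361
minimum, q1, q3, maximum = 1, 12, 47, 105

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"CV = {cv:.6f}, variance = {variance:.4f}")
```

The same relations hold for the other numeric variables. Because CV is the standard deviation divided by the mean, it is negative for variables with a negative mean, such as loudness and minSilenceDB.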
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
1 | 22 | 2.2% |
2 | 22 | 2.2% |
3 | 22 | 2.2% |
4 | 22 | 2.2% |
5 | 22 | 2.2% |
6 | 22 | 2.2% |
7 | 22 | 2.2% |
8 | 22 | 2.2% |
9 | 22 | 2.2% |
10 | 22 | 2.2% |
Other values (95) | 760 |
Value | Count | Frequency (%) |
1 | 22 | |
2 | 22 | |
3 | 22 | |
4 | 22 | |
5 | 22 |
Value | Count | Frequency (%) |
105 | 2 | |
104 | 2 | |
103 | 2 | |
102 | 2 | |
101 | 2 |
duration
Real number (ℝ≥0)
Distinct | 172 |
---|---|
Distinct (%) | 17.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 7.651428571 |
---|---|
Minimum | 0.9 |
Maximum | 30 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | 0.9 |
---|---|
5-th percentile | 2.9 |
Q1 | 4.9 |
median | 6.9 |
Q3 | 9.4 |
95-th percentile | 15.2 |
Maximum | 30 |
Range | 29.1 |
Interquartile range (IQR) | 4.5 |
Descriptive statistics
Standard deviation | 4.016862821 |
---|---|
Coefficient of variation (CV) | 0.5249820715 |
Kurtosis | 4.597320147 |
Mean | 7.651428571 |
Median Absolute Deviation (MAD) | 2.2 |
Skewness | 1.633594192 |
Sum | 7498.4 |
Variance | 16.13518693 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
5.4 | 19 | 1.9% |
5.7 | 17 | 1.7% |
6.7 | 17 | 1.7% |
3.8 | 16 | 1.6% |
6.5 | 15 | 1.5% |
8.8 | 15 | 1.5% |
5 | 15 | 1.5% |
4.9 | 15 | 1.5% |
6.4 | 14 | 1.4% |
7.2 | 14 | 1.4% |
Other values (162) | 823 |
Value | Count | Frequency (%) |
0.9 | 1 | 0.1% |
1 | 1 | 0.1% |
1.2 | 1 | 0.1% |
1.3 | 3 | |
1.4 | 1 | 0.1% |
Value | Count | Frequency (%) |
30 | 1 | |
29.9 | 1 | |
29.8 | 1 | |
26.5 | 1 | |
26.2 | 1 |
loudness
Real number (ℝ)
Distinct | 59 |
---|---|
Distinct (%) | 6.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -21.28469388 |
---|---|
Minimum | -28.2 |
Maximum | -17.8 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | -28.2 |
---|---|
5-th percentile | -22.7 |
Q1 | -21.8 |
median | -21.2 |
Q3 | -20.7 |
95-th percentile | -19.9 |
Maximum | -17.8 |
Range | 10.4 |
Interquartile range (IQR) | 1.1 |
Descriptive statistics
Standard deviation | 0.9062096573 |
---|---|
Coefficient of variation (CV) | -0.04257564908 |
Kurtosis | 5.111762064 |
Mean | -21.28469388 |
Median Absolute Deviation (MAD) | 0.5 |
Skewness | -0.8106417881 |
Sum | -20859 |
Variance | 0.821215943 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
-21 | 53 | 5.4% |
-21.3 | 50 | 5.1% |
-21.8 | 49 | 5.0% |
-20.9 | 47 | 4.8% |
-21.6 | 45 | 4.6% |
-20.6 | 45 | 4.6% |
-21.1 | 45 | 4.6% |
-21.2 | 44 | 4.5% |
-21.7 | 43 | 4.4% |
-20.5 | 42 | 4.3% |
Other values (49) | 517 |
Value | Count | Frequency (%) |
-28.2 | 1 | |
-26.3 | 1 | |
-26 | 1 | |
-24.4 | 1 | |
-24.1 | 2 |
Value | Count | Frequency (%) |
-17.8 | 1 | |
-18.5 | 1 | |
-18.6 | 1 | |
-18.7 | 1 | |
-18.8 | 1 |
minSilenceDB
Real number (ℝ)
Distinct | 54 |
---|---|
Distinct (%) | 5.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -52.39285714 |
---|---|
Minimum | -69 |
Maximum | -10 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | -69 |
---|---|
5-th percentile | -64 |
Q1 | -60 |
median | -57 |
Q3 | -44 |
95-th percentile | -24 |
Maximum | -10 |
Range | 59 |
Interquartile range (IQR) | 16 |
Descriptive statistics
Standard deviation | 11.96691686 |
---|---|
Coefficient of variation (CV) | -0.2284074111 |
Kurtosis | 1.112553649 |
Mean | -52.39285714 |
Median Absolute Deviation (MAD) | 4 |
Skewness | 1.372621475 |
Sum | -51345 |
Variance | 143.2070991 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
-59 | 83 | 8.5% |
-58 | 82 | 8.4% |
-57 | 74 | 7.6% |
-56 | 66 | 6.7% |
-60 | 60 | 6.1% |
-61 | 59 | 6.0% |
-55 | 56 | 5.7% |
-62 | 56 | 5.7% |
-63 | 41 | 4.2% |
-54 | 36 | 3.7% |
Other values (44) | 367 |
Value | Count | Frequency (%) |
-69 | 1 | 0.1% |
-68 | 1 | 0.1% |
-67 | 8 | 0.8% |
-66 | 10 | |
-65 | 21 |
Value | Count | Frequency (%) |
-10 | 2 | 0.2% |
-13 | 2 | 0.2% |
-14 | 2 | 0.2% |
-17 | 5 | |
-18 | 3 |
samplingrate
Categorical
Distinct | 1 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 7.8 KiB |
22050 |
---|
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 5 |
Min length | 5 |
Characters and Unicode
Total characters | 4900 |
---|---|
Distinct characters | 3 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 22050 |
---|---|
2nd row | 22050 |
3rd row | 22050 |
4th row | 22050 |
5th row | 22050 |
Value | Count | Frequency (%) |
22050 | 980 |
Histogram of lengths of the category
Value | Count | Frequency (%) |
22050 | 980 |
Most occurring characters
Value | Count | Frequency (%) |
2 | 1960 | |
0 | 1960 | |
5 | 980 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 4900 |
Most frequent character per category
Value | Count | Frequency (%) |
2 | 1960 | |
0 | 1960 | |
5 | 980 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 4900 |
Most frequent character per script
Value | Count | Frequency (%) |
2 | 1960 | |
0 | 1960 | |
5 | 980 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4900 |
Most frequent character per block
Value | Count | Frequency (%) |
2 | 1960 | |
0 | 1960 | |
5 | 980 |
silencePercent
Real number (ℝ≥0)
Distinct | 63 |
---|---|
Distinct (%) | 6.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 19.07857143 |
---|---|
Minimum | 0 |
Maximum | 63 |
Zeros | 47 |
Zeros (%) | 4.8% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 7 |
median | 17 |
Q3 | 26 |
95-th percentile | 49 |
Maximum | 63 |
Range | 63 |
Interquartile range (IQR) | 19 |
Descriptive statistics
Standard deviation | 14.38053353 |
---|---|
Coefficient of variation (CV) | 0.7537531615 |
Kurtosis | 0.1894858692 |
Mean | 19.07857143 |
Median Absolute Deviation (MAD) | 9 |
Skewness | 0.7961537505 |
Sum | 18697 |
Variance | 206.7997446 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 47 | 4.8% |
2 | 45 | 4.6% |
3 | 43 | 4.4% |
16 | 43 | 4.4% |
14 | 40 | 4.1% |
1 | 36 | 3.7% |
19 | 36 | 3.7% |
4 | 35 | 3.6% |
15 | 33 | 3.4% |
18 | 31 | 3.2% |
Other values (53) | 591 |
Value | Count | Frequency (%) |
0 | 47 | |
1 | 36 | |
2 | 45 | |
3 | 43 | |
4 | 35 |
Value | Count | Frequency (%) |
63 | 1 | 0.1% |
62 | 3 | |
61 | 1 | 0.1% |
59 | 1 | 0.1% |
58 | 5 |
averageFrequency
Real number (ℝ≥0)
Distinct | 604 |
---|---|
Distinct (%) | 61.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2270.718367 |
---|---|
Minimum | 1073 |
Maximum | 3167 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 7.8 KiB |
Quantile statistics
Minimum | 1073 |
---|---|
5-th percentile | 1837.9 |
Q1 | 2112 |
median | 2265 |
Q3 | 2428.25 |
95-th percentile | 2690.05 |
Maximum | 3167 |
Range | 2094 |
Interquartile range (IQR) | 316.25 |
Descriptive statistics
Standard deviation | 253.5593772 |
---|---|
Coefficient of variation (CV) | 0.1116648286 |
Kurtosis | 0.7945148303 |
Mean | 2270.718367 |
Median Absolute Deviation (MAD) | 158 |
Skewness | -0.03430560005 |
Sum | 2225304 |
Variance | 64292.35778 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
2128 | 8 | 0.8% |
2221 | 7 | 0.7% |
2214 | 5 | 0.5% |
2223 | 5 | 0.5% |
2090 | 5 | 0.5% |
2100 | 5 | 0.5% |
2138 | 5 | 0.5% |
2047 | 5 | 0.5% |
2369 | 5 | 0.5% |
2134 | 5 | 0.5% |
Other values (594) | 925 |
Value | Count | Frequency (%) |
1073 | 1 | |
1378 | 1 | |
1576 | 1 | |
1579 | 1 | |
1597 | 1 |
Value | Count | Frequency (%) |
3167 | 1 | |
3113 | 1 | |
3097 | 1 | |
2998 | 1 | |
2973 | 1 |
speacker
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 7.8 KiB |
clean | |
---|---|
full |
Length
Max length | 5 |
---|---|
Median length | 4.5 |
Mean length | 4.5 |
Min length | 4 |
Characters and Unicode
Total characters | 4410 |
---|---|
Distinct characters | 7 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | clean |
---|---|
2nd row | clean |
3rd row | clean |
4th row | clean |
5th row | clean |
Value | Count | Frequency (%) |
clean | 490 | |
full | 490 |
Histogram of lengths of the category
Value | Count | Frequency (%) |
clean | 490 | |
full | 490 |
Most occurring characters
Value | Count | Frequency (%) |
l | 1470 | |
c | 490 | 11.1% |
e | 490 | 11.1% |
a | 490 | 11.1% |
n | 490 | 11.1% |
f | 490 | 11.1% |
u | 490 | 11.1% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 4410 |
Most frequent character per category
Value | Count | Frequency (%) |
l | 1470 | |
c | 490 | 11.1% |
e | 490 | 11.1% |
a | 490 | 11.1% |
n | 490 | 11.1% |
f | 490 | 11.1% |
u | 490 | 11.1% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 4410 |
Most frequent character per script
Value | Count | Frequency (%) |
l | 1470 | |
c | 490 | 11.1% |
e | 490 | 11.1% |
a | 490 | 11.1% |
n | 490 | 11.1% |
f | 490 | 11.1% |
u | 490 | 11.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4410 |
Most frequent character per block
Value | Count | Frequency (%) |
l | 1470 | |
c | 490 | 11.1% |
e | 490 | 11.1% |
a | 490 | 11.1% |
n | 490 | 11.1% |
f | 490 | 11.1% |
u | 490 | 11.1% |
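The length and character figures for this two-valued variable can be rebuilt from the category counts alone (490 rows of "clean" and 490 rows of "full"). The sketch below does so with the standard library. It is a check on the reported numbers, not the profiling tool's own code.

```python
from collections import Counter

# Category counts taken from the frequency table above.
values = ["clean"] * 490 + ["full"] * 490

lengths = [len(v) for v in values]
char_counts = Counter("".join(values))
total_chars = sum(char_counts.values())
mean_length = total_chars / len(values)

print(f"total characters: {total_chars}, distinct: {len(char_counts)}")
print(f"min/mean/max length: {min(lengths)}/{mean_length}/{max(lengths)}")
for char, count in char_counts.most_common():
    print(f"{char} | {count} | {100 * count / total_chars:.1f}%")
```

The letter "l" appears once in "clean" and twice in "full", which gives the 1470 occurrences reported above.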
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
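The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative. The sample data are the duration and silencePercent columns of the ten "First rows" shown later in this report, so the result describes only those ten rows, not the full dataset.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of deviations are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# duration and silencePercent from the ten "First rows" of this report.
duration = [2.1, 6.2, 4.1, 5.0, 3.1, 5.1, 6.5, 6.1, 2.4, 12.6]
silence_percent = [42, 46, 23, 25, 44, 41, 21, 17, 40, 15]

r = pearson_r(duration, silence_percent)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * d + 3 for d in duration], silence_percent)
print(round(r, 4), round(r_rescaled, 4))
```

A constant column such as samplingrate has zero standard deviation, so r is undefined for it and this sketch raises `ZeroDivisionError`.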
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
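As a sketch of that definition, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values the average of their ranks is an assumption here, and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relationship: rho is 1, Pearson's r is not.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), round(pearson_r(x, y), 4))
```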
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
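The pair-counting rule above corresponds to the variant usually called τ-a. The sketch below implements exactly that rule. Statistical libraries often apply a tie-corrected variant instead, so their results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```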
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
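Returning to the bias-corrected Cramér's V described above, the sketch below computes it from a contingency table of counts, following the correction published by Bergsma (2013). The function name and the example tables are illustrative and are not taken from this dataset.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    n_rows = len(table)
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Pearson chi-square statistic against the independence model.
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (n_cols - 1) * (n_rows - 1) / (n - 1))
    rows_corr = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corr = n_cols - (n_cols - 1) ** 2 / (n - 1)
    denominator = min(rows_corr - 1, cols_corr - 1)
    if denominator <= 0:
        return 0.0  # degenerate table, e.g. a constant variable
    return math.sqrt(phi2_corr / denominator)


independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

Returning 0.0 for a degenerate table is a design choice of this sketch. Reporting the value as undefined would be equally defensible.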
First rows
index | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2.1 | -20.8 | -61 | 22050 | 42 | 3167 | clean |
1 | 2 | 6.2 | -20.9 | -61 | 22050 | 46 | 2351 | clean |
2 | 3 | 4.1 | -21.4 | -59 | 22050 | 23 | 2010 | clean |
3 | 4 | 5.0 | -18.9 | -65 | 22050 | 25 | 2134 | clean |
4 | 5 | 3.1 | -22.8 | -61 | 22050 | 44 | 2221 | clean |
5 | 6 | 5.1 | -20.8 | -62 | 22050 | 41 | 2261 | clean |
6 | 7 | 6.5 | -23.2 | -53 | 22050 | 21 | 2132 | clean |
7 | 8 | 6.1 | -22.5 | -58 | 22050 | 17 | 2690 | clean |
8 | 9 | 2.4 | -21.7 | -59 | 22050 | 40 | 2260 | clean |
9 | 10 | 12.6 | -21.1 | -58 | 22050 | 15 | 2110 | clean |
Last rows
index | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
970 | 6 | 8.9 | -21.6 | -62 | 22050 | 22 | 2044 | full |
971 | 7 | 2.7 | -20.4 | -37 | 22050 | 7 | 2213 | full |
972 | 8 | 11.8 | -21.0 | -58 | 22050 | 57 | 2294 | full |
973 | 9 | 14.2 | -20.3 | -58 | 22050 | 16 | 2007 | full |
974 | 10 | 8.9 | -20.7 | -64 | 22050 | 19 | 2182 | full |
975 | 11 | 9.0 | -22.0 | -62 | 22050 | 11 | 2241 | full |
976 | 12 | 8.0 | -24.1 | -59 | 22050 | 25 | 2428 | full |
977 | 13 | 9.3 | -21.2 | -56 | 22050 | 33 | 2308 | full |
978 | 14 | 4.9 | -21.6 | -44 | 22050 | 6 | 2173 | full |
979 | 15 | 7.7 | -20.6 | -58 | 22050 | 13 | 2219 | full |