Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 490 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 30.8 KiB |
Average record size in memory | 64.3 B |
Variable types
Numeric | 6 |
---|---|
Categorical | 2 |
samplingrate has constant value "22050" | Constant |
speacker has constant value "full" | Constant |
speacker is highly correlated with samplingrate | High correlation |
samplingrate is highly correlated with speacker | High correlation |
silencePercent has 47 (9.6%) zeros | Zeros |
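The constant-value and zeros alerts above can be reproduced in spirit with a few lines of plain Python. This is a minimal sketch, not the profiler's own implementation: the helper names `constant_columns` and `zero_share` are hypothetical, and the sample values are the first ten rows shown in the "First rows" table of this report.

```python
# First ten rows of three columns, copied from the "First rows" table.
columns = {
    "samplingrate": ["22050"] * 10,
    "silencePercent": [20, 43, 6, 58, 63, 34, 1, 3, 5, 15],
    "speacker": ["full"] * 10,
}


def constant_columns(cols):
    """Return the names of columns whose values are all identical."""
    return [name for name, values in cols.items() if len(set(values)) == 1]


def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


print(constant_columns(columns))
print(zero_share(columns["silencePercent"]))

# Share behind the reported alert: 47 zeros out of 490 observations.
print(f"{47 / 490:.1%}")
```

The ten sample rows happen to contain no zeros, so the zeros alert only shows up on the full 490 rows.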
Reproduction
Analysis started | 2021-05-11 14:22:24.825059 |
---|---|
Analysis finished | 2021-05-11 14:22:30.677655 |
Duration | 5.85 seconds |
Software version | pandas-profiling v2.11.0 |
Download configuration | config.yaml |
id
Real number (ℝ≥0)
Distinct | 105 |
---|---|
Distinct (%) | 21.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 32.29183673 |
---|---|
Minimum | 1 |
Maximum | 105 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 3 |
Q1 | 12 |
median | 26 |
Q3 | 47 |
95-th percentile | 83 |
Maximum | 105 |
Range | 104 |
Interquartile range (IQR) | 35 |
Descriptive statistics
Standard deviation | 25.44874424 |
---|---|
Coefficient of variation (CV) | 0.7880859937 |
Kurtosis | -0.06951867365 |
Mean | 32.29183673 |
Median Absolute Deviation (MAD) | 16 |
Skewness | 0.8848673208 |
Sum | 15823 |
Variance | 647.6385835 |
Monotonicity | Not monotonic |
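Several of the figures reported for `id` are derived from others in the same tables. The sketch below recomputes them from the reported sum, standard deviation, quartiles and extremes as a consistency check. It assumes the usual definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared), and its inputs are the rounded values printed above, so agreement is only expected to a few significant digits.

```python
import math

# Values copied from the report for the variable "id".
n = 490
total = 15823
std = 25.44874424
q1, q3 = 12, 47
minimum, maximum = 1, 105

mean = total / n               # reported: 32.29183673
cv = std / mean                # reported: 0.7880859937
variance = std ** 2            # reported: 647.6385835
value_range = maximum - minimum  # reported: 104
iqr = q3 - q1                  # reported: 35

print(mean, cv, variance, value_range, iqr)
print(math.isclose(cv, 0.7880859937, rel_tol=1e-6))
```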
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
1 | 11 | 2.2% |
2 | 11 | 2.2% |
3 | 11 | 2.2% |
4 | 11 | 2.2% |
5 | 11 | 2.2% |
6 | 11 | 2.2% |
7 | 11 | 2.2% |
8 | 11 | 2.2% |
9 | 11 | 2.2% |
10 | 11 | 2.2% |
Other values (95) | 380 | 77.6% |
Minimum 5 values
Value | Count | Frequency (%) |
1 | 11 | 2.2% |
2 | 11 | 2.2% |
3 | 11 | 2.2% |
4 | 11 | 2.2% |
5 | 11 | 2.2% |
Maximum 5 values
Value | Count | Frequency (%) |
105 | 1 | 0.2% |
104 | 1 | 0.2% |
103 | 1 | 0.2% |
102 | 1 | 0.2% |
101 | 1 | 0.2% |
duration
Real number (ℝ≥0)
Distinct | 155 |
---|---|
Distinct (%) | 31.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 8.212653061 |
---|---|
Minimum | 0.9 |
Maximum | 30 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 0.9 |
---|---|
5-th percentile | 2.945 |
Q1 | 4.9 |
median | 7.2 |
Q3 | 9.8 |
95-th percentile | 17.455 |
Maximum | 30 |
Range | 29.1 |
Interquartile range (IQR) | 4.9 |
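Percentiles such as 2.945 and 17.455 do not occur as observed durations (all listed values have one decimal place), which is consistent with interpolating between neighbouring observations. The sketch below shows one common way to do that, linear interpolation between order statistics. The function name `quantile` is hypothetical, the sample is the `duration` column of the first ten rows of this report, and the exact method used by the profiler is an assumption here.

```python
def quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation
    between the two nearest order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)      # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# "duration" values from the first ten rows of the report.
durations = [2.6, 5.5, 5.1, 9.3, 4.6, 4.8, 7.6, 6.5, 3.0, 13.2]

q1 = quantile(durations, 0.25)
median = quantile(durations, 0.5)
q3 = quantile(durations, 0.75)
print(q1, median, q3, q3 - q1)  # last value is the interquartile range
```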
Descriptive statistics
Standard deviation | 4.648649462 |
---|---|
Coefficient of variation (CV) | 0.5660350471 |
Kurtosis | 3.12376341 |
Mean | 8.212653061 |
Median Absolute Deviation (MAD) | 2.4 |
Skewness | 1.523021986 |
Sum | 4024.2 |
Variance | 21.60994182 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
4.9 | 11 | 2.2% |
6.5 | 11 | 2.2% |
8.8 | 10 | 2.0% |
7.2 | 9 | 1.8% |
3.5 | 9 | 1.8% |
5.4 | 9 | 1.8% |
6.8 | 8 | 1.6% |
4.4 | 8 | 1.6% |
3.8 | 8 | 1.6% |
5.3 | 7 | 1.4% |
Other values (145) | 400 | 81.6% |
Minimum 5 values
Value | Count | Frequency (%) |
0.9 | 1 | 0.2% |
1 | 1 | 0.2% |
1.3 | 2 | 0.4% |
1.4 | 1 | 0.2% |
1.9 | 2 | 0.4% |
Maximum 5 values
Value | Count | Frequency (%) |
30 | 1 | 0.2% |
29.9 | 1 | 0.2% |
26.5 | 1 | 0.2% |
26.2 | 1 | 0.2% |
24.8 | 2 | 0.4% |
loudness
Real number (ℝ)
Distinct | 48 |
---|---|
Distinct (%) | 9.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -21.19795918 |
---|---|
Minimum | -26.3 |
Maximum | -18.5 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | -26.3 |
---|---|
5-th percentile | -22.5 |
Q1 | -21.7 |
median | -21.1 |
Q3 | -20.6 |
95-th percentile | -19.9 |
Maximum | -18.5 |
Range | 7.8 |
Interquartile range (IQR) | 1.1 |
Descriptive statistics
Standard deviation | 0.8717304784 |
---|---|
Coefficient of variation (CV) | -0.04112332092 |
Kurtosis | 4.069988316 |
Mean | -21.19795918 |
Median Absolute Deviation (MAD) | 0.5 |
Skewness | -0.8820313449 |
Sum | -10387 |
Variance | 0.759914027 |
Monotonicity | Not monotonic |
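The Median Absolute Deviation (MAD) and Skewness rows above can be illustrated with a short sketch. It assumes the textbook definitions: MAD as the median of absolute deviations from the median, and skewness as the third central moment divided by the cubed population standard deviation. The profiler may apply a small-sample bias correction to skewness, so this sketch is not expected to match the reported figure exactly. The function names are hypothetical, and the sample is the `loudness` column of the first ten rows of this report.

```python
from statistics import fmean, median


def mad(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def skewness(values):
    """Population skewness: third central moment over the cubed std. deviation."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


# "loudness" values from the first ten rows of the report.
loudness = [-26.0, -21.9, -21.6, -19.5, -22.3, -20.4, -21.8, -19.5, -22.9, -21.4]
print(mad(loudness), skewness(loudness))
```

A negative skewness, as reported for `loudness`, indicates a longer tail toward low (quieter) values.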
Histogram with fixed size bins (bins=48)
Value | Count | Frequency (%) |
-21 | 30 | 6.1% |
-21.3 | 28 | 5.7% |
-21.1 | 25 | 5.1% |
-20.5 | 25 | 5.1% |
-20.9 | 25 | 5.1% |
-21.8 | 24 | 4.9% |
-20.8 | 24 | 4.9% |
-21.7 | 23 | 4.7% |
-21.5 | 22 | 4.5% |
-20.6 | 22 | 4.5% |
Other values (38) | 242 | 49.4% |
Minimum 5 values
Value | Count | Frequency (%) |
-26.3 | 1 | 0.2% |
-26 | 1 | 0.2% |
-24.1 | 1 | 0.2% |
-23.9 | 1 | 0.2% |
-23.8 | 1 | 0.2% |
Maximum 5 values
Value | Count | Frequency (%) |
-18.5 | 1 | 0.2% |
-18.9 | 1 | 0.2% |
-19.1 | 2 | 0.4% |
-19.3 | 1 | 0.2% |
-19.4 | 2 | 0.4% |
minSilenceDB
Real number (ℝ)
Distinct | 53 |
---|---|
Distinct (%) | 10.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -45.25510204 |
---|---|
Minimum | -68 |
Maximum | -10 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | -68 |
---|---|
5-th percentile | -62 |
Q1 | -57 |
median | -44.5 |
Q3 | -38.25 |
95-th percentile | -21 |
Maximum | -10 |
Range | 58 |
Interquartile range (IQR) | 18.75 |
Descriptive statistics
Standard deviation | 13.13568581 |
---|---|
Coefficient of variation (CV) | -0.2902586719 |
Kurtosis | -0.6053531101 |
Mean | -45.25510204 |
Median Absolute Deviation (MAD) | 11.5 |
Skewness | 0.5491350529 |
Sum | -22175 |
Variance | 172.5462418 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
-55 | 29 | 5.9% |
-57 | 28 | 5.7% |
-56 | 26 | 5.3% |
-58 | 26 | 5.3% |
-42 | 26 | 5.3% |
-59 | 25 | 5.1% |
-44 | 22 | 4.5% |
-43 | 21 | 4.3% |
-54 | 19 | 3.9% |
-39 | 19 | 3.9% |
Other values (43) | 249 | 50.8% |
Minimum 5 values
Value | Count | Frequency (%) |
-68 | 1 | 0.2% |
-67 | 1 | 0.2% |
-66 | 1 | 0.2% |
-65 | 3 | 0.6% |
-64 | 5 | 1.0% |
Maximum 5 values
Value | Count | Frequency (%) |
-10 | 2 | 0.4% |
-13 | 2 | 0.4% |
-14 | 2 | 0.4% |
-17 | 5 | 1.0% |
-18 | 3 | 0.6% |
samplingrate
Categorical
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
22050 |
---|
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 5 |
Min length | 5 |
Characters and Unicode
Total characters | 2450 |
---|---|
Distinct characters | 3 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 22050 |
---|---|
2nd row | 22050 |
3rd row | 22050 |
4th row | 22050 |
5th row | 22050 |
Value | Count | Frequency (%) |
22050 | 490 | 100.0% |
Histogram of lengths of the category
Value | Count | Frequency (%) |
22050 | 490 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 2450 | 100.0% |
Most frequent character per category
Value | Count | Frequency (%) |
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 2450 | 100.0% |
Most frequent character per script
Value | Count | Frequency (%) |
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 2450 | 100.0% |
Most frequent character per block
Value | Count | Frequency (%) |
2 | 980 | 40.0% |
0 | 980 | 40.0% |
5 | 490 | 20.0% |
silencePercent
Real number (ℝ≥0)
Distinct | 61 |
---|---|
Distinct (%) | 12.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 15.26530612 |
---|---|
Minimum | 0 |
Maximum | 63 |
Zeros | 47 |
Zeros (%) | 9.6% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 2 |
median | 7 |
Q3 | 25 |
95-th percentile | 51 |
Maximum | 63 |
Range | 63 |
Interquartile range (IQR) | 23 |
Descriptive statistics
Standard deviation | 16.73793618 |
---|---|
Coefficient of variation (CV) | 1.096469081 |
Kurtosis | 0.2254685567 |
Mean | 15.26530612 |
Median Absolute Deviation (MAD) | 6 |
Skewness | 1.150986741 |
Sum | 7480 |
Variance | 280.1585076 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 47 | 9.6% |
2 | 45 | 9.2% |
3 | 43 | 8.8% |
1 | 36 | 7.3% |
4 | 35 | 7.1% |
5 | 20 | 4.1% |
6 | 14 | 2.9% |
11 | 11 | 2.2% |
12 | 11 | 2.2% |
16 | 10 | 2.0% |
Other values (51) | 218 | 44.5% |
Minimum 5 values
Value | Count | Frequency (%) |
0 | 47 | 9.6% |
1 | 36 | 7.3% |
2 | 45 | 9.2% |
3 | 43 | 8.8% |
4 | 35 | 7.1% |
Maximum 5 values
Value | Count | Frequency (%) |
63 | 1 | 0.2% |
62 | 2 | 0.4% |
61 | 1 | 0.2% |
59 | 1 | 0.2% |
58 | 4 | 0.8% |
averageFrequency
Real number (ℝ≥0)
Distinct | 380 |
---|---|
Distinct (%) | 77.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2298.32449 |
---|---|
Minimum | 1073 |
Maximum | 3113 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1073 |
---|---|
5-th percentile | 1834.15 |
Q1 | 2143.25 |
median | 2293 |
Q3 | 2458.75 |
95-th percentile | 2727.85 |
Maximum | 3113 |
Range | 2040 |
Interquartile range (IQR) | 315.5 |
Descriptive statistics
Standard deviation | 264.2301756 |
---|---|
Coefficient of variation (CV) | 0.1149664361 |
Kurtosis | 1.229345818 |
Mean | 2298.32449 |
Median Absolute Deviation (MAD) | 159 |
Skewness | -0.2440008458 |
Sum | 1126179 |
Variance | 69817.5857 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
2128 | 6 | 1.2% |
2565 | 4 | 0.8% |
2394 | 4 | 0.8% |
2369 | 4 | 0.8% |
2263 | 3 | 0.6% |
2100 | 3 | 0.6% |
2223 | 3 | 0.6% |
2361 | 3 | 0.6% |
2221 | 3 | 0.6% |
2284 | 3 | 0.6% |
Other values (370) | 454 | 92.7% |
Minimum 5 values
Value | Count | Frequency (%) |
1073 | 1 | 0.2% |
1378 | 1 | 0.2% |
1597 | 1 | 0.2% |
1601 | 1 | 0.2% |
1615 | 1 | 0.2% |
Maximum 5 values
Value | Count | Frequency (%) |
3113 | 1 | 0.2% |
3097 | 1 | 0.2% |
2998 | 1 | 0.2% |
2973 | 1 | 0.2% |
2942 | 1 | 0.2% |
speacker
Categorical
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
full |
---|
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Characters and Unicode
Total characters | 1960 |
---|---|
Distinct characters | 3 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | full |
---|---|
2nd row | full |
3rd row | full |
4th row | full |
5th row | full |
Value | Count | Frequency (%) |
full | 490 | 100.0% |
Histogram of lengths of the category
Value | Count | Frequency (%) |
full | 490 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
l | 980 | 50.0% |
f | 490 | 25.0% |
u | 490 | 25.0% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1960 | 100.0% |
Most frequent character per category
Value | Count | Frequency (%) |
l | 980 | 50.0% |
f | 490 | 25.0% |
u | 490 | 25.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1960 | 100.0% |
Most frequent character per script
Value | Count | Frequency (%) |
l | 980 | 50.0% |
f | 490 | 25.0% |
u | 490 | 25.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1960 | 100.0% |
Most frequent character per block
Value | Count | Frequency (%) |
l | 980 | 50.0% |
f | 490 | 25.0% |
u | 490 | 25.0% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
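The calculation described above (covariance divided by the product of the standard deviations) can be sketched directly. This is an illustrative implementation, not the code the report was generated with. The function name `pearson_r` is hypothetical, and the sample uses the `duration` and `silencePercent` columns from the first ten rows of this report.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their
    standard deviations (the 1/n factors cancel, so sums are used)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns from the first ten rows of the report.
duration = [2.6, 5.5, 5.1, 9.3, 4.6, 4.8, 7.6, 6.5, 3.0, 13.2]
silence_percent = [20, 43, 6, 58, 63, 34, 1, 3, 5, 15]

print(round(pearson_r(duration, silence_percent), 3))

# Invariance under location and scale: rescaling one variable leaves r unchanged.
rescaled = [3 * d + 7 for d in duration]
print(math.isclose(pearson_r(duration, silence_percent),
                   pearson_r(rescaled, silence_percent)))
```

For a constant column such as `samplingrate` the variance is zero, so this sketch would raise a `ZeroDivisionError`. In that case r is undefined.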
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
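As described above, ρ is Pearson's r computed on ranks. A minimal sketch follows, assuming that tied values share the average of their ranks (a common convention, but the report does not say which tie rule was used). The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A nonlinear but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```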
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
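The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition. Statistical libraries often default to the tie-adjusted τ-b instead, so results can differ when the data contain ties. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.
    Pairs tied in either variable count as neither concordant nor discordant."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct version compares every pair and therefore runs in quadratic time, which is fine for 490 observations.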
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
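Returning to the correlation measures, the bias-corrected Cramér's V mentioned in the Cramér's V description can be sketched as follows. This is an illustrative implementation of the correction as it is commonly stated (shrinking φ² and the row and column counts by terms of order 1/(n-1)), not necessarily the exact code behind this report. The function name `cramers_v_corrected` is hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(phi2_corr / denom)


# Perfect association between two binary variables.
labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))

# Two constant columns, like samplingrate and speacker in this dataset.
print(cramers_v_corrected(["22050"] * 10, ["full"] * 10))
```

With two constant columns the statistic has no meaningful value, so this sketch returns NaN. How the profiler itself treats that case is not something this sketch asserts.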
First rows
(index) | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2.6 | -26.0 | -29 | 22050 | 20 | 2271 | full |
1 | 2 | 5.5 | -21.9 | -61 | 22050 | 43 | 2612 | full |
2 | 3 | 5.1 | -21.6 | -42 | 22050 | 6 | 2014 | full |
3 | 4 | 9.3 | -19.5 | -58 | 22050 | 58 | 2221 | full |
4 | 5 | 4.6 | -22.3 | -54 | 22050 | 63 | 2495 | full |
5 | 6 | 4.8 | -20.4 | -57 | 22050 | 34 | 2436 | full |
6 | 7 | 7.6 | -21.8 | -24 | 22050 | 1 | 1904 | full |
7 | 8 | 6.5 | -19.5 | -42 | 22050 | 3 | 2643 | full |
8 | 9 | 3.0 | -22.9 | -44 | 22050 | 5 | 1946 | full |
9 | 10 | 13.2 | -21.4 | -61 | 22050 | 15 | 2229 | full |
Last rows
(index) | id | duration | loudness | minSilenceDB | samplingrate | silencePercent | averageFrequency | speacker |
---|---|---|---|---|---|---|---|---|
480 | 6 | 8.9 | -21.6 | -62 | 22050 | 22 | 2044 | full |
481 | 7 | 2.7 | -20.4 | -37 | 22050 | 7 | 2213 | full |
482 | 8 | 11.8 | -21.0 | -58 | 22050 | 57 | 2294 | full |
483 | 9 | 14.2 | -20.3 | -58 | 22050 | 16 | 2007 | full |
484 | 10 | 8.9 | -20.7 | -64 | 22050 | 19 | 2182 | full |
485 | 11 | 9.0 | -22.0 | -62 | 22050 | 11 | 2241 | full |
486 | 12 | 8.0 | -24.1 | -59 | 22050 | 25 | 2428 | full |
487 | 13 | 9.3 | -21.2 | -56 | 22050 | 33 | 2308 | full |
488 | 14 | 4.9 | -21.6 | -44 | 22050 | 6 | 2173 | full |
489 | 15 | 7.7 | -20.6 | -58 | 22050 | 13 | 2219 | full |