pandas_profiling

Overview

Dataset statistics

Number of variables	5
Number of observations	10
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	528.0 B
Average record size in memory	52.8 B
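The overview counts can be recomputed from the raw rows. Below is a minimal sketch. It assumes that the 10 rows in the report's sample table are the whole dataset and that `None` or a float NaN counts as a missing cell. The helper names are hypothetical. The memory figures are left out because they depend on pandas' internal storage.

```python
import math

# Rows copied from the report's sample table (the whole 10-row dataset).
COLUMNS = ["姓名", "语文", "数学", "英语", "考试类型"]
ROWS = [
    ("吕傲文", 96, 74, 100, "期中"),
    ("张香秀", 96, 91, 83, "期中"),
    ("麻寒", 76, 61, 75, "期中"),
    ("廉凡", 68, 86, 100, "期中"),
    ("冯乐萱", 80, 60, 72, "期中"),
    ("吕傲文", 97, 81, 94, "期末"),
    ("张香秀", 67, 89, 72, "期末"),
    ("麻寒", 95, 63, 99, "期末"),
    ("廉凡", 73, 98, 66, "期末"),
    ("冯乐萱", 59, 88, 74, "期末"),
]


def is_missing(cell):
    """Assumption: None or a float NaN counts as a missing cell."""
    return cell is None or (isinstance(cell, float) and math.isnan(cell))


def dataset_statistics(columns, rows):
    """Recompute the overview counts (hypothetical helper)."""
    n_vars, n_obs = len(columns), len(rows)
    missing = sum(is_missing(cell) for row in rows for cell in row)
    seen, duplicates = set(), 0
    for row in rows:
        # A row is a duplicate if an identical row appeared earlier.
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": 100.0 * missing / (n_vars * n_obs),
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100.0 * duplicates / n_obs,
    }


for label, value in dataset_statistics(COLUMNS, ROWS).items():
    print(f"{label}\t{value}")
```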

Variable types

NUM	3
CAT	2

Warnings

`姓名` (name) is uniformly distributed	Uniform
`考试类型` (exam type) is uniformly distributed	Uniform
`数学` (math) has unique values	Unique
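The two kinds of warning can be approximated as follows. This is a hedged sketch, not pandas-profiling's own code. We assume that UNIQUE means every value occurs exactly once. We also assume that UNIFORM comes from a chi-squared goodness-of-fit test against equal category counts, with the p-value required to exceed a threshold. That threshold is taken here as 0.999, which is our reading of the tool's chi-squared threshold setting and is not verified against v2.9.0. The helpers `chi2_sf`, `is_unique` and `is_uniform` are hypothetical, and the uniform check is applied only to the categorical columns.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def is_unique(values):
    """UNIQUE (assumed rule): every value occurs exactly once."""
    return len(set(values)) == len(values)


def is_uniform(values, threshold=0.999):
    """UNIFORM (assumed rule): chi-squared test against equal counts has a
    p-value above `threshold`."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return False
    expected = len(values) / k
    stat = sum((c - expected) ** 2 / expected for c in counts.values())
    return chi2_sf(stat, k - 1) > threshold


names = ["吕傲文", "张香秀", "麻寒", "廉凡", "冯乐萱"] * 2   # 姓名
exam_types = ["期中"] * 5 + ["期末"] * 5                   # 考试类型
math_scores = [74, 91, 61, 86, 60, 81, 89, 63, 98, 88]      # 数学

print(is_uniform(names), is_uniform(exam_types), is_unique(math_scores))
```

In both categorical columns every category has exactly the expected count, so the test statistic is 0 and the p-value is 1.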

Reproduction

Analysis started	2020-10-16 05:51:13.870829
Analysis finished	2020-10-16 05:51:15.345534
Duration	1.47 seconds
Software version	pandas-profiling v2.9.0

Variables

姓名 (name)
Categorical

UNIFORM

Distinct	5
Distinct (%)	50.0%
Missing	0
Missing (%)	0.0%
Memory size	80.0 B

张香秀	2
麻寒	2
吕傲文	2
冯乐萱	2
廉凡	2

Frequencies

Value	Count	Frequency (%)
张香秀	2	20.0%
麻寒	2	20.0%
吕傲文	2	20.0%
冯乐萱	2	20.0%
廉凡	2	20.0%

Unique

Unique	0 (values that occur exactly once)
Unique (%)	0.0%

[Figure: Histogram of lengths of the category]

Length

Max length	3
Median length	3
Mean length	2.6
Min length	2
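The figures in this section follow from simple counting. Below is a minimal sketch that uses the 姓名 values from the sample table. It assumes that length is measured in characters (Unicode code points), and `categorical_summary` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, median

# 姓名 (name) values in row order, from the sample table.
names = ["吕傲文", "张香秀", "麻寒", "廉凡", "冯乐萱"] * 2


def categorical_summary(values):
    """Distinct counts, frequency table and length statistics (hypothetical helper)."""
    n = len(values)
    counts = Counter(values)
    lengths = [len(v) for v in values]  # characters (code points) per value
    return {
        "Distinct": len(counts),
        "Distinct (%)": 100.0 * len(counts) / n,
        # (value, count, frequency %) in descending order of count
        "Frequencies": [(v, c, 100.0 * c / n) for v, c in counts.most_common()],
        # number of values that occur exactly once
        "Unique": sum(1 for c in counts.values() if c == 1),
        "Max length": max(lengths),
        "Median length": median(lengths),
        "Mean length": mean(lengths),
        "Min length": min(lengths),
    }


summary = categorical_summary(names)
for value, count, pct in summary["Frequencies"]:
    print(f"{value}\t{count}\t{pct:.1f}%")
print(summary["Mean length"], summary["Median length"])
```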

语文 (Chinese)
Real number (ℝ≥0)

Distinct	9
Distinct (%)	90.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	80.7
Minimum	59
Maximum	97
Zeros	0
Zeros (%)	0.0%
Memory size	80.0 B

Quantile statistics

Minimum	59
5th percentile	62.6
Q1	69.25
Median	78
Q3	95.75
95th percentile	96.55
Maximum	97
Range	38
Interquartile range (IQR)	26.5
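These quantiles are consistent with linear interpolation between order statistics, which is the default of pandas' `Series.quantile`. The q-quantile of n sorted values sits at position q·(n-1), counting from 0. Below is a minimal sketch that uses the 语文 scores from the sample table. The `quantile` and `quantile_statistics` helpers are hypothetical.

```python
import math

# 语文 (Chinese) scores, from the sample table.
chinese = [96, 96, 76, 68, 80, 97, 67, 95, 73, 59]


def quantile(values, q):
    """Linear interpolation between the order statistics around q*(n-1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_statistics(values):
    """The 'Quantile statistics' block (hypothetical helper)."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "Minimum": min(values),
        "5th percentile": quantile(values, 0.05),
        "Q1": q1,
        "Median": quantile(values, 0.5),
        "Q3": q3,
        "95th percentile": quantile(values, 0.95),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Interquartile range (IQR)": q3 - q1,
    }


for label, value in quantile_statistics(chinese).items():
    print(f"{label}\t{value:g}")
```

For example, Q1 sits at position 0.25 × 9 = 2.25, a quarter of the way from 68 to 73, giving 69.25.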

Descriptive statistics

Standard deviation	14.2987956
Coefficient of variation (CV)	0.1771845799
Kurtosis	-1.681802207
Mean	80.7
Median Absolute Deviation (MAD)	14
Skewness	-0.04658850064
Sum	807
Variance	204.4555556
Monotonicity	Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Common values

Value	Count	Frequency (%)
96	2	20.0%
95	1	10.0%
76	1	10.0%
80	1	10.0%
59	1	10.0%
73	1	10.0%
68	1	10.0%
67	1	10.0%
97	1	10.0%

Minimum 5 values

Value	Count	Frequency (%)
59	1	10.0%
67	1	10.0%
68	1	10.0%
73	1	10.0%
76	1	10.0%

Maximum 5 values

Value	Count	Frequency (%)
97	1	10.0%
96	2	20.0%
95	1	10.0%
80	1	10.0%
76	1	10.0%

数学 (math)
Real number (ℝ≥0)

UNIQUE

Distinct	10
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	79.1
Minimum	60
Maximum	98
Zeros	0
Zeros (%)	0.0%
Memory size	80.0 B

Quantile statistics

Minimum	60
5th percentile	60.45
Q1	65.75
Median	83.5
Q3	88.75
95th percentile	94.85
Maximum	98
Range	38
Interquartile range (IQR)	23

Descriptive statistics

Standard deviation	13.76347824
Coefficient of variation (CV)	0.1740009892
Kurtosis	-1.455830827
Mean	79.1
Median Absolute Deviation (MAD)	8.5
Skewness	-0.3599178593
Sum	791
Variance	189.4333333
Monotonicity	Not monotonic
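The descriptive statistics can be reproduced with the usual sample formulas. This sketch makes the following assumptions:

* Variance and standard deviation use n - 1 in the denominator.
* The coefficient of variation is the standard deviation divided by the mean.
* Skewness and kurtosis are the bias-adjusted sample estimators, G1 and excess kurtosis G2, as pandas' `skew()` and `kurt()` compute them.
* MAD is the median of the absolute deviations from the median.

The monotonicity check is simplified to non-strict increasing or decreasing. The data are the 数学 scores from the sample table, and `descriptive_statistics` is a hypothetical helper that needs at least 4 values.

```python
import math
from statistics import median

# 数学 (math) scores, from the sample table.
math_scores = [74, 91, 61, 86, 60, 81, 89, 63, 98, 88]


def descriptive_statistics(values):
    """Sample statistics as described above (hypothetical helper, n >= 4)."""
    n = len(values)
    mu = sum(values) / n
    dev = [x - mu for x in values]
    s2 = sum(d ** 2 for d in dev)
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    variance = s2 / (n - 1)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    m2, m3, m4 = s2 / n, s3 / n, s4 / n  # biased central moments
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    med = median(values)
    if all(a <= b for a, b in zip(values, values[1:])):
        monotonicity = "Increasing"
    elif all(a >= b for a, b in zip(values, values[1:])):
        monotonicity = "Decreasing"
    else:
        monotonicity = "Not monotonic"
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mu,
        # excess kurtosis with small-sample adjustment (G2)
        "Kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "Mean": mu,
        "Median Absolute Deviation (MAD)": median([abs(x - med) for x in values]),
        # adjusted Fisher-Pearson skewness (G1)
        "Skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "Sum": sum(values),
        "Variance": variance,
        "Monotonicity": monotonicity,
    }


for label, value in descriptive_statistics(math_scores).items():
    print(f"{label}\t{value}")
```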

[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value	Count	Frequency (%)
63	1	10.0%
61	1	10.0%
60	1	10.0%
91	1	10.0%
74	1	10.0%
89	1	10.0%
88	1	10.0%
86	1	10.0%
98	1	10.0%
81	1	10.0%

Minimum 5 values

Value	Count	Frequency (%)
60	1	10.0%
61	1	10.0%
63	1	10.0%
74	1	10.0%
81	1	10.0%

Maximum 5 values

Value	Count	Frequency (%)
98	1	10.0%
91	1	10.0%
89	1	10.0%
88	1	10.0%
86	1	10.0%

英语 (English)
Real number (ℝ≥0)

Distinct	8
Distinct (%)	80.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	83.5
Minimum	66
Maximum	100
Zeros	0
Zeros (%)	0.0%
Memory size	80.0 B

Quantile statistics

Minimum	66
5th percentile	68.7
Q1	72.5
Median	79
Q3	97.75
95th percentile	100
Maximum	100
Range	34
Interquartile range (IQR)	25.25

Descriptive statistics

Standard deviation	13.45155918
Coefficient of variation (CV)	0.1610965172
Kurtosis	-1.937826699
Mean	83.5
Median Absolute Deviation (MAD)	10
Skewness	0.2278499479
Sum	835
Variance	180.9444444
Monotonicity	Not monotonic

[Figure: Histogram with fixed size bins (bins=8)]

Common values

Value	Count	Frequency (%)
72	2	20.0%
100	2	20.0%
94	1	10.0%
99	1	10.0%
75	1	10.0%
74	1	10.0%
83	1	10.0%
66	1	10.0%

Minimum 5 values

Value	Count	Frequency (%)
66	1	10.0%
72	2	20.0%
74	1	10.0%
75	1	10.0%
83	1	10.0%

Maximum 5 values

Value	Count	Frequency (%)
100	2	20.0%
99	1	10.0%
94	1	10.0%
83	1	10.0%
75	1	10.0%
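The common-values and extreme-values tables are the same frequency counts ordered in two different ways. Below is a minimal sketch that uses the 英语 scores from the sample table. The helper names are hypothetical. Values with equal counts may appear in a different order than in the report, because the tie-breaking rule used here (first occurrence in the data) is an assumption.

```python
from collections import Counter

# 英语 (English) scores, from the sample table.
english = [100, 83, 75, 100, 72, 94, 72, 99, 66, 74]


def value_rows(counts, n, keys):
    """(value, count, frequency %) rows for the given keys."""
    return [(k, counts[k], 100.0 * counts[k] / n) for k in keys]


def value_tables(values, top=10, extremes=5):
    """Common values, minimum values and maximum values (hypothetical helper)."""
    n = len(values)
    counts = Counter(values)
    distinct = sorted(counts)
    common = value_rows(counts, n, [k for k, _ in counts.most_common(top)])
    minimum = value_rows(counts, n, distinct[:extremes])
    maximum = value_rows(counts, n, distinct[::-1][:extremes])
    return common, minimum, maximum


for title, table in zip(("Common values", "Minimum 5 values", "Maximum 5 values"),
                        value_tables(english)):
    print(title)
    for value, count, pct in table:
        print(f"{value}\t{count}\t{pct:.1f}%")
```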

考试类型 (exam type)
Categorical

UNIFORM

Distinct	2
Distinct (%)	20.0%
Missing	0
Missing (%)	0.0%
Memory size	80.0 B

期中 (midterm)	5
期末 (final)	5

Frequencies

Value	Count	Frequency (%)
期中	5	50.0%
期末	5	50.0%

Unique

Unique	0
Unique (%)	0.0%

[Figure: Histogram of lengths of the category]

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Correlations

Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a consequence, for points lying exactly on a non-horizontal line, r is +1 or -1 whatever the slope of the line; only the sign of the slope matters.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
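Here is a direct translation of this definition into Python, as a sketch. The name `pearson_r` is ours. The covariance and both standard deviations carry the same 1/(n-1) factor, so it cancels and plain sums of deviations suffice. The usage lines apply the function to the 语文 and 数学 columns from the sample table.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard
    deviations; the common 1/(n-1) factors cancel, so sums are used."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


chinese = [96, 96, 76, 68, 80, 97, 67, 95, 73, 59]       # 语文
math_scores = [74, 91, 61, 86, 60, 81, 89, 63, 98, 88]   # 数学
print(round(pearson_r(chinese, math_scores), 3))
```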

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
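The following sketch applies that recipe in two steps. It first ranks each variable, giving tied values the average of the ranks they occupy; this is the usual convention and is assumed here. It then applies Pearson's formula to the ranks. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values all get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end, 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson's r computed on the (average) ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # nonlinear but perfectly monotonic
print(spearman_rho(x, y), pearson_r(x, y))
```

The usage lines show the point made in the text: for a monotonic but nonlinear relationship, ρ reaches 1 while r stays below 1.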

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form. When there are tied values, the tie-adjusted τ-b is commonly used instead.
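The pair-counting definition can be sketched as below. To our understanding, pandas' `corr(method="kendall")` relies on SciPy's `kendalltau`, which by default computes the tie-adjusted τ-b, so the sketch implements τ-b. When there are no ties, it reduces to the τ-a formula. `kendall_tau_b` is a hypothetical name, and the inputs must be numeric.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """(concordant - discordant) / sqrt((n0 - n1) * (n0 - n2)), where
    n0 = n(n-1)/2 and n1, n2 count the pairs tied in x and in y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    n0 = n * (n - 1) // 2
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt((n0 - tied_x) * (n0 - tied_y))


chinese = [96, 96, 76, 68, 80, 97, 67, 95, 73, 59]       # 语文
math_scores = [74, 91, 61, 86, 60, 81, 89, 63, 98, 88]   # 数学
print(kendall_tau_b(chinese, math_scores))
```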

Phik (φk)

Phik (φk) is a recently proposed, practical correlation coefficient that works consistently across categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We therefore use the bias-corrected measure proposed by Bergsma (2013).
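The bias-corrected estimator can be sketched using the correction as it is usually stated for Bergsma (2013). The corrected squared coefficient is φ̃² = max(0, φ² - (k-1)(r-1)/(n-1)). The numbers of rows and columns are corrected to r̃ = r - (r-1)²/(n-1) and k̃ = k - (k-1)²/(n-1). The result is Ṽ = √(φ̃² / min(k̃-1, r̃-1)). This is our transcription, not pandas-profiling's code. `cramers_v_corrected` is a hypothetical name, and the usage lines pair 姓名 with 考试类型 from the sample table.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    cells = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)
    # Pearson's chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        raise ValueError("V is undefined for this table")
    return math.sqrt(phi2_corr / denom)


names = ["吕傲文", "张香秀", "麻寒", "廉凡", "冯乐萱"] * 2   # 姓名
exam_types = ["期中"] * 5 + ["期末"] * 5                   # 考试类型
print(cramers_v_corrected(names, exam_types))
```

Each name occurs once per exam type, so every observed cell count equals its expected count. The chi-squared statistic is therefore 0, and the sketch returns 0.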

Sample

First rows

	姓名	语文	数学	英语	考试类型
0	吕傲文	96	74	100	期中
1	张香秀	96	91	83	期中
2	麻寒	76	61	75	期中
3	廉凡	68	86	100	期中
4	冯乐萱	80	60	72	期中
5	吕傲文	97	81	94	期末
6	张香秀	67	89	72	期末
7	麻寒	95	63	99	期末
8	廉凡	73	98	66	期末
9	冯乐萱	59	88	74	期末

Last rows

Identical to the first rows, since the dataset has only 10 observations.
