32 Descriptive statistics
32.1 Descriptive statistics
Quantitatively describe or summarize variables
- `stat.desc` from the `pastecs` library
  - `basic` includes counts
  - `desc` includes descriptive stats
  - `norm` (default is `FALSE`) includes distribution stats
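As a minimal sketch of how such a summary might be produced, the snippet below calls `stat.desc` on a small made-up data frame. The `flights_sample` object and its values are hypothetical and only stand in for real data. The call is guarded so that it runs only when `pastecs` is installed.

```r
# Hypothetical toy data, standing in for a real flights dataset
flights_sample <- data.frame(
  dep_delay = c(-3, 0, 12, NA, 5),
  arr_delay = c(-10, 4, 0, NA, 7),
  distance  = c(200, 1400, 733, 96, 529)
)

if (requireNamespace("pastecs", quietly = TRUE)) {
  # basic and desc are TRUE by default, norm is FALSE by default;
  # they are spelled out here only to make the options visible
  flights_stats <- pastecs::stat.desc(
    flights_sample,
    basic = TRUE,
    desc  = TRUE,
    norm  = FALSE
  )
  print(flights_stats)
}
```

The result has one column per variable and one row per statistic, so a single value can be read with, for example, `flights_stats["mean", "distance"]`.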
32.2 stat.desc output
Statistic | dep_delay | arr_delay | distance |
---|---|---|---|
nbr.val | 1668.0000000 | 1667.000000 | 1.699000e+03 |
nbr.null | 58.0000000 | 35.000000 | 0.000000e+00 |
nbr.na | 31.0000000 | 32.000000 | 0.000000e+00 |
min | -17.0000000 | -63.000000 | 9.600000e+01 |
max | 193.0000000 | 191.000000 | 2.153000e+03 |
range | 210.0000000 | 254.000000 | 2.057000e+03 |
sum | 961.0000000 | -4450.000000 | 9.715580e+05 |
median | -4.0000000 | -7.000000 | 5.290000e+02 |
mean | 0.5761391 | -2.669466 | 5.718411e+02 |
SE.mean | 0.4084206 | 0.518816 | 1.464965e+01 |
CI.mean.0.95 | 0.8010713 | 1.017600 | 2.873327e+01 |
var | 278.2347513 | 448.706408 | 3.646264e+05 |
std.dev | 16.6803702 | 21.182691 | 6.038430e+02 |
coef.var | 28.9519850 | -7.935179 | 1.055963e+00 |
32.3 stat.desc: basic
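- `nbr.val`: number of non-missing values (values that are not `NA`) in the dataset
- `nbr.null`: number of null values, that is values equal to zero (not to be confused with R's `NULL` object, which is often returned by expressions and functions whose values are undefined)
- `nbr.na`: number of `NA`s, the missing value indicator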
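The three counts can be reproduced with base R, which may help to check how zeros and missing values are treated. This is a sketch on a hypothetical vector `x`, not the code used by `stat.desc` itself.

```r
# Hypothetical vector with one zero and two missing values
x <- c(4, 0, NA, 7, -2, NA)

nbr_val  <- sum(!is.na(x))             # values that are not NA
nbr_null <- sum(x == 0, na.rm = TRUE)  # values equal to zero
nbr_na   <- sum(is.na(x))              # missing values

c(nbr.val = nbr_val, nbr.null = nbr_null, nbr.na = nbr_na)
```

Note that `nbr_val + nbr_na` equals `length(x)`, which mirrors the table above, where `nbr.val` and `nbr.na` add up to the same total for every column.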
32.4 stat.desc: desc
- `min` (also `min()`): minimum value in the dataset
- `max` (also `max()`): maximum value in the dataset
- `range`: difference between `min` and `max` (different from `range()`, which returns the minimum and the maximum as a pair of values)
- `sum` (also `sum()`): sum of the values in the dataset
- `mean` (also `mean()`): arithmetic mean, that is `sum` over the number of values not `NA`
- `median` (also `median()`): median, that is the value separating the higher half from the lower half of the values
- mode: the value that appears most often in the values; it is not reported by `stat.desc`, and the base R function `mode()` returns the storage mode of an object rather than the statistical mode
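The base R counterparts of these statistics can be sketched as follows. The `delays` vector is hypothetical, and the mode is computed with `table()` as one possible approach, since base R has no dedicated function for it.

```r
# Hypothetical delays in minutes, including one missing value
delays <- c(-4, 0, 12, -4, 30, NA)

min_delay    <- min(delays, na.rm = TRUE)
max_delay    <- max(delays, na.rm = TRUE)
range_width  <- max_delay - min_delay            # single number, as in stat.desc
range_pair   <- range(delays, na.rm = TRUE)      # base R returns c(min, max)
sum_delay    <- sum(delays, na.rm = TRUE)
mean_delay   <- sum_delay / sum(!is.na(delays))  # same as mean(delays, na.rm = TRUE)
median_delay <- median(delays, na.rm = TRUE)

# Statistical mode: the most frequent value (the smallest one if there are ties)
counts     <- table(delays)                      # NA values are dropped by default
mode_delay <- as.numeric(names(counts)[which.max(counts)])
```

Without `na.rm = TRUE`, most of these functions return `NA` as soon as the input contains a missing value.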
32.5 Sample statistics
Assuming that the data in the dataset are a sample of a population
- `SE.mean`: standard error of the mean, an estimate of the variability of the mean calculated on different samples of the data (see also central limit theorem)
- `CI.mean.0.95`: half-width of the 95% confidence interval of the mean; the interval from the sample mean minus this value to the sample mean plus this value is built so that, over repeated samples, about 95% of such intervals contain the actual mean
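Both values can be approximated from other rows of the output table. The sketch below uses the `dep_delay` column and assumes that the interval is based on a Student's t quantile with `n - 1` degrees of freedom, an assumption that reproduces the reported figures to the shown precision.

```r
# Values taken from the dep_delay column of the stat.desc output table
n       <- 1668        # nbr.val
std_dev <- 16.6803702  # std.dev

# Standard error of the mean: standard deviation over the square root of n
se_mean <- std_dev / sqrt(n)

# Half-width of the 95% interval, assuming a Student's t quantile
ci_half_width <- qt(0.975, df = n - 1) * se_mean

round(c(SE.mean = se_mean, CI.mean.0.95 = ci_half_width), 4)
```

With the sample mean of 0.5761391, the resulting interval runs from roughly -0.22 to 1.38 minutes.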
32.6 Estimating variation
- `var`: variance (\(\sigma^2\)), which quantifies the amount of variation as the average of squared distances from the mean
\[\sigma^2 = \frac{1}{n} \sum_{i=1}^n (\mu-x_i)^2\]
- `std.dev`: standard deviation (\(\sigma\)), which quantifies the amount of variation as the square root of the variance
\[\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (\mu-x_i)^2}\]
- `coef.var`: variation coefficient, which quantifies the amount of variation as the standard deviation divided by the mean
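The formulas above can be written out directly in base R, as in the sketch below on a hypothetical vector `x`. One detail worth keeping in mind is that the base R functions `var()` and `sd()` divide by `n - 1` rather than `n` (the sample estimators), so their results differ slightly from the \(\frac{1}{n}\) formulas, especially on small datasets.

```r
# Hypothetical values with mean 5
x  <- c(2, 4, 4, 4, 5, 5, 7, 9)
n  <- length(x)
mu <- mean(x)

# Variance and standard deviation following the 1/n formulas
pop_var <- sum((mu - x)^2) / n
pop_sd  <- sqrt(pop_var)

# Base R estimators, which use n - 1 in the denominator
sample_var <- var(x)
sample_sd  <- sd(x)

# Variation coefficient: standard deviation divided by the mean
coef_var <- sample_sd / mu

c(pop_var = pop_var, sample_var = sample_var, coef_var = coef_var)
```

The same ratio can be checked against the output table: for `dep_delay`, 16.6803702 divided by 0.5761391 gives about 28.95, the reported `coef.var`.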