32 Descriptive statistics
32.1 Descriptive statistics
Quantitatively describe or summarize variables
- `stat.desc` from the `pastecs` library
  - `basic` includes counts
  - `desc` includes descriptive stats
  - `norm` (default is `FALSE`) includes distribution stats
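
A minimal sketch of how the three arguments change the output, assuming the `pastecs` package is installed. The `delays` data frame and its values are invented for illustration and are not the data summarised in these slides.

```r
library(pastecs)

# Hypothetical toy data, invented for this example
delays <- data.frame(
  dep_delay = c(-4, 0, 12, NA, -7, 3, 0, 25),
  arr_delay = c(-10, -2, 15, NA, -12, 0, 4, 30)
)

# Defaults: basic = TRUE, desc = TRUE, norm = FALSE
summary_default <- stat.desc(delays)

# Counts only: switch off the descriptive block
summary_counts <- stat.desc(delays, desc = FALSE)

# Add the distribution statistics to the default output
summary_norm <- stat.desc(delays, norm = TRUE)

# One column per variable, one row per statistic
print(round(summary_default, 2))
```

Each variable becomes a column and each statistic a row, so a single value can be picked out with, for example, `summary_default["mean", "dep_delay"]`.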
 
32.2 stat.desc output
| | dep_delay | arr_delay | distance |
|---|---|---|---|
| nbr.val | 1668.0000000 | 1667.000000 | 1.699000e+03 | 
| nbr.null | 58.0000000 | 35.000000 | 0.000000e+00 | 
| nbr.na | 31.0000000 | 32.000000 | 0.000000e+00 | 
| min | -17.0000000 | -63.000000 | 9.600000e+01 | 
| max | 193.0000000 | 191.000000 | 2.153000e+03 | 
| range | 210.0000000 | 254.000000 | 2.057000e+03 | 
| sum | 961.0000000 | -4450.000000 | 9.715580e+05 | 
| median | -4.0000000 | -7.000000 | 5.290000e+02 | 
| mean | 0.5761391 | -2.669466 | 5.718411e+02 | 
| SE.mean | 0.4084206 | 0.518816 | 1.464965e+01 | 
| CI.mean.0.95 | 0.8010713 | 1.017600 | 2.873327e+01 | 
| var | 278.2347513 | 448.706408 | 3.646264e+05 | 
| std.dev | 16.6803702 | 21.182691 | 6.038430e+02 | 
| coef.var | 28.9519850 | -7.935179 | 1.055963e+00 | 
32.3 stat.desc: basic
- `nbr.val`: number of values in the dataset that are not `NA`
- `nbr.null`: number of null values, that is, values equal to zero (not R's `NULL` object)
- `nbr.na`: number of `NA`s, the missing value indicator
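
The three counts can be reproduced in base R, which makes it clearer what each one measures. This is a sketch on a made-up vector, following the `pastecs` convention that a "null" value is a zero. The variable names are illustrative only.

```r
# Hypothetical vector with two zeros and two missing values
x <- c(0, 5, NA, -3, 0, 12, NA)

nbr_val  <- sum(!is.na(x))            # values that are not NA
nbr_null <- sum(x == 0, na.rm = TRUE) # values equal to zero
nbr_na   <- sum(is.na(x))             # missing values

c(nbr.val = nbr_val, nbr.null = nbr_null, nbr.na = nbr_na)
```

Note that `nbr.val` plus `nbr.na` gives the length of the vector, which matches the output table above, where `dep_delay` has 1668 values and 31 `NA`s and `distance` has 1699 values and none missing.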
32.4 stat.desc: desc
- `min` (also `min()`): minimum value in the dataset
- `max` (also `max()`): maximum value in the dataset
- `range`: difference between `max` and `min` (different from `range()`, which returns both the minimum and the maximum)
- `sum` (also `sum()`): sum of the values in the dataset
- `mean` (also `mean()`): arithmetic mean, that is, `sum` divided by the number of values that are not `NA`
- `median` (also `median()`): median, that is, the value separating the higher half from the lower half of the values
- mode: the value that appears most often in the values; `stat.desc` does not report it, and R's `mode()` function returns the storage mode of an object rather than the statistical mode
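
As a sketch, the same statistics can be computed with base R on a made-up vector. The `range` and mode lines are the two places where base R differs from what the list above describes, so both are spelled out. The approach to the mode (tabulating values and taking the most frequent) is one common workaround, not a built-in function.

```r
# Hypothetical vector with one missing value
x <- c(3, 7, 7, 2, 9, NA)

x_min    <- min(x, na.rm = TRUE)
x_max    <- max(x, na.rm = TRUE)
x_range  <- x_max - x_min          # single number, as in stat.desc
x_bounds <- range(x, na.rm = TRUE) # base R: vector c(min, max)
x_sum    <- sum(x, na.rm = TRUE)
x_mean   <- mean(x, na.rm = TRUE)  # sum divided by the count of non-NA values
x_median <- median(x, na.rm = TRUE)

# Statistical mode: tabulate the values (NA is dropped by default)
# and keep the most frequent one; ties resolve to the smallest value
counts <- table(x)
x_mode <- as.numeric(names(counts)[which.max(counts)])
```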
32.5 Sample statistics
Assuming that the data in the dataset are a sample of a population
- `SE.mean`: standard error of the mean, an estimate of how much the mean varies when calculated on different samples of the data (see also the central limit theorem)
- `CI.mean.0.95`: half-width of the 95% confidence interval of the mean, so the interval is the sample mean plus or minus this value; intervals built this way on repeated samples would contain the actual mean about 95% of the time
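
Both values can be checked by hand from the first numeric column (`dep_delay`) of the output table. The sketch below assumes the usual formulas: standard deviation over the square root of the number of values for the standard error, and a Student's t quantile times the standard error for the interval half-width. The variable names are illustrative.

```r
# Values copied from the dep_delay column of the stat.desc output
n         <- 1668        # nbr.val
std_dev   <- 16.6803702  # std.dev
mean_dep  <- 0.5761391   # mean

# Standard error of the mean
se_mean <- std_dev / sqrt(n)

# Half-width of the 95% confidence interval (t distribution, n - 1 df)
ci_half_width <- qt(0.975, df = n - 1) * se_mean

# The interval itself: mean plus or minus the half-width
ci <- c(lower = mean_dep - ci_half_width,
        upper = mean_dep + ci_half_width)

round(c(SE.mean = se_mean, CI.mean.0.95 = ci_half_width), 4)
```

The two results agree with the `SE.mean` and `CI.mean.0.95` rows of the table to the precision shown there, which supports reading `CI.mean.0.95` as a distance from the mean rather than as a pair of bounds.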
32.6 Estimating variation
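- `var`: variance (\(\sigma^2\)), which quantifies the amount of variation as the average of squared distances from the mean; the formula below is the population form, whereas `stat.desc` (like `var()`) reports the sample estimate, which divides by \(n-1\) instead of \(n\)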
\[\sigma^2 = \frac{1}{n} \sum_{i=1}^n (\mu-x_i)^2\]
- `std.dev`: standard deviation (\(\sigma\)), which quantifies the amount of variation as the square root of the variance
\[\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (\mu-x_i)^2}\]
- `coef.var`: coefficient of variation, which quantifies the amount of variation as the standard deviation divided by the mean
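
The following base R sketch, on a made-up vector, works through the three measures and shows the difference between dividing by \(n\), as in the formulas above, and dividing by \(n-1\), as `var()` and `sd()` do. The variable names are illustrative.

```r
# Hypothetical vector with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Population variance, as in the formula above (divide by n)
pop_var <- sum((mean(x) - x)^2) / n
pop_sd  <- sqrt(pop_var)

# Sample variance and standard deviation (divide by n - 1)
sample_var <- var(x)
sample_sd  <- sd(x)

# Coefficient of variation: standard deviation relative to the mean
coef_var <- sample_sd / mean(x)

c(pop_var = pop_var, sample_var = sample_var, coef_var = coef_var)
```

The gap between the two variances shrinks as \(n\) grows, so for a dataset with well over a thousand values, like the one in the output table, the choice makes little practical difference. The coefficient of variation becomes unstable when the mean is close to zero and changes sign with it, which is why `arr_delay` has a negative `coef.var` in that table.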