Measures of Spread

Getting Started

First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.

Then, be sure to load the mosaic package. Remember, you need to do this in each new Quarto/RMarkdown document or R session.
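A minimal sketch of these two steps, assuming mosaic is installed from CRAN. The requireNamespace() check is an addition for illustration; it simply skips the install when the package is already present.

```r
# Install once per machine (skipped if mosaic is already installed)
if (!requireNamespace("mosaic", quietly = TRUE)) {
  install.packages("mosaic")
}

# Load in every new Quarto/RMarkdown document or R session
library(mosaic)
```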

Data for Examples

As a reminder (see Overview of Summary Statistics), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins
species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Gentoo	Biscoe	43.3	13.4	209	4400	female	2007
Chinstrap	Dream	50.8	18.5	201	4450	male	2009
Adelie	Dream	43.2	18.5	192	4100	male	2008
Adelie	Biscoe	41.0	20.0	203	4725	male	2009
Adelie	Dream	33.1	16.1	178	2900	female	2008

Variance

Definition: VARIANCE

Approximately the average of the squared distances from each data point in the sample to the sample mean \(\bar{x}\): the squared distances are summed and divided by \(n-1\) rather than \(n\).

\[s^2=\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}{n-1}\]

The variance is one of the most important metrics in statistics. It is a measure of spread around the mean.
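To make the formula concrete, here is a small worked sketch in base R. The vector x is made up for illustration and is not part of the penguins data.

```r
# Made-up sample, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n      <- length(x)       # sample size: 8
x_bar  <- mean(x)         # sample mean: 5
sq_dev <- (x - x_bar)^2   # squared distances: 9 1 1 1 0 0 4 16

# Sum of squared distances divided by n - 1, i.e. 32 / 7
s2 <- sum(sq_dev) / (n - 1)
s2

# The hand calculation agrees with R's built-in variance function
all.equal(s2, stats::var(x))
```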

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a variance in R using the var() function from the mosaic package.

var(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
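A sketch of what that example might look like, assuming both mosaic and palmerpenguins are installed. The object name v is chosen here for illustration.

```r
library(mosaic)
library(palmerpenguins)

# Variance of bill length across all penguins.
# bill_length_mm contains missing values, so the result is NA.
v <- var(~bill_length_mm, data = penguins)
v
```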

Handling Missing Values

Notice that the returned value is NA. The variable contains missing values (NA), so the function needs an additional argument, na.rm = TRUE, that tells R to ignore them when calculating the variance. Add the argument to the code above.

var(~x, data = mydata, na.rm = TRUE)
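Applied to the penguins data, the call with na.rm = TRUE might look like the following sketch (the object name v is hypothetical). The result is in squared millimetres, since the variance is built from squared distances.

```r
library(mosaic)
library(palmerpenguins)

# Drop missing values before computing the variance of bill length
v <- var(~bill_length_mm, data = penguins, na.rm = TRUE)
v
```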

When we want to calculate the variance of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification to the single variance code.

var(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).
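One possible version of the grouped example, assuming mosaic and palmerpenguins are installed. The object name v_by_species is hypothetical, and na.rm = TRUE is included from the start so that groups containing missing values still return a number.

```r
library(mosaic)
library(palmerpenguins)

# One variance of bill length per species
v_by_species <- var(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
v_by_species
```

Dropping na.rm = TRUE from this call is a quick way to see which groups contain missing values.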

Remember to add na.rm = TRUE if any of the returned values is NA.

Standard Deviation

Definition: STANDARD DEVIATION

The standard deviation is the square root of the variance. Like the variance, it measures the spread about the mean and is never negative. It is used more often than the variance because it has the same units of measurement as the original observations.

\[s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}{n-1}}\]
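The relationship between the two measures can be checked by hand. This base R sketch uses only the five bill lengths shown in the data snippet above, so the numbers describe that small subset and not the full penguins data.

```r
# Bill lengths (mm) of the five penguins in the data snippet
bill <- c(43.3, 50.8, 43.2, 41.0, 33.1)

# Variance from the formula: units are squared millimetres
s2 <- sum((bill - mean(bill))^2) / (length(bill) - 1)

# Standard deviation: back in millimetres, like the data
s <- sqrt(s2)
s

# Agrees with R's built-in standard deviation function
all.equal(s, stats::sd(bill))
```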

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a standard deviation in R using the sd() function from the mosaic package.

sd(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
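A sketch of that example, assuming mosaic and palmerpenguins are installed. The object name s is chosen here for illustration.

```r
library(mosaic)
library(palmerpenguins)

# Standard deviation of bill length across all penguins.
# bill_length_mm contains missing values, so the result is NA.
s <- sd(~bill_length_mm, data = penguins)
s
```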

Handling Missing Values

Notice that the returned value is NA. The variable contains missing values (NA), so the function needs an additional argument, na.rm = TRUE, that tells R to ignore them when calculating the standard deviation. Add the argument to the code above.

sd(~x, data = mydata, na.rm = TRUE)
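With the penguins data, the call with na.rm = TRUE might look like this sketch (object names s and v are hypothetical). The last line checks that the result is the square root of the variance computed the same way.

```r
library(mosaic)
library(palmerpenguins)

# Standard deviation of bill length, ignoring missing values (in mm)
s <- sd(~bill_length_mm, data = penguins, na.rm = TRUE)
s

# Same variable, variance instead (in squared mm)
v <- var(~bill_length_mm, data = penguins, na.rm = TRUE)

# The standard deviation is the square root of the variance
all.equal(s, sqrt(v))
```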

Multiple Standard Deviations Across Groups

When we want to calculate the standard deviation of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification to the single standard deviation code.

sd(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).
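One possible version of the grouped example, assuming mosaic and palmerpenguins are installed. The object name s_by_species is hypothetical, and na.rm = TRUE is included so that groups containing missing values still return a number.

```r
library(mosaic)
library(palmerpenguins)

# One standard deviation of bill length per species (in mm)
s_by_species <- sd(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
s_by_species
```

Because the values share the units of the data, they can be compared directly with the bill lengths themselves.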

Remember to add na.rm = TRUE if any of the returned values is NA.
