Measures of Center

Getting Started

First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.

Then, be sure to load the mosaic package. Remember, you need to do this with each new Quarto/RMarkdown document or R session.
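As a minimal sketch of these two steps, assuming mosaic is installed from CRAN with the standard install.packages() function (the install line is commented out because it only needs to run once):

```r
# One-time setup (uncomment to run; requires an internet connection):
# install.packages("mosaic")

# Run at the top of every new document or R session:
library(mosaic)
```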

Data for Examples

As a reminder (see Overview of Summary Statistics), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Chinstrap Dream 50.7 19.7 203 4050 male 2009
Gentoo Biscoe 45.1 14.4 210 4400 female 2008
Adelie Torgersen 33.5 19.0 190 3600 female 2008
Gentoo Biscoe 49.1 14.8 220 5150 female 2008
Chinstrap Dream 45.2 16.6 191 3250 female 2009

Mean

Sometimes called “the average,” the mean is a summary of a quantitative variable observed within a sample. It measures the “fulcrum,” or balancing point, of the data values and uses every value in its calculation.

\[\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n} x_i\]
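To see the formula at work, here is a small sketch in base R. The five values in x are hypothetical and chosen only for illustration.

```r
# Hypothetical sample of five measurements (not from the penguins data)
x <- c(4, 8, 15, 16, 23)

# Apply the formula directly: sum the values, then divide by how many there are
x_bar <- sum(x) / length(x)
x_bar      # 13.2

# R's built-in mean() returns the same value
mean(x)    # 13.2
```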

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.

mean(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
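A sketch of that example, assuming both mosaic and palmerpenguins are installed (the name bill_mean is just an illustrative variable):

```r
library(mosaic)
library(palmerpenguins)

# Mean bill length across all penguins.
# bill_length_mm contains missing values, so the result is NA.
bill_mean <- mean(~bill_length_mm, data = penguins)
bill_mean
```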

Handling Missing Values

Notice the returned value of NA. To calculate the mean, the function needs another argument, na.rm = TRUE, which tells R to ignore missing values (NA). Add the argument to the code above and rerun it. The NA will be replaced by a numeric value (the mean).

mean(~x, data = mydata, na.rm = TRUE)
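Applied to the penguins data, the call might look like the following sketch (again assuming mosaic and palmerpenguins are installed; bill_mean is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# na.rm = TRUE drops the missing values before the mean is calculated
bill_mean <- mean(~bill_length_mm, data = penguins, na.rm = TRUE)
bill_mean
```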

Multiple Means Across Groups

When we want to calculate the mean of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification to the single mean code.

mean(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).

Remember to add na.rm = TRUE if you get NA values or a warning.
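A sketch of the grouped version on the penguins data, assuming mosaic and palmerpenguins are installed (species_means is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# One mean per species; missing values are dropped within each group
species_means <- mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
species_means
```

The result is a named vector with one mean for each level of species.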

Median

Also known as the 50th percentile or the 2nd quartile, the median is the middle value of the sorted data: roughly 50% of the data points have a value greater than it and roughly 50% have a value less than it.
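A quick sketch in base R illustrates the "middle value" idea. The numbers are hypothetical and chosen only for illustration.

```r
# Odd number of values: sorted they are 1, 3, 7, so the middle value is 3
odd_median <- median(c(7, 1, 3))
odd_median     # 3

# Even number of values: sorted they are 1, 3, 7, 9, so the median is
# the average of the two middle values, (3 + 7) / 2
even_median <- median(c(7, 1, 3, 9))
even_median    # 5
```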

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.

median(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
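A sketch of that example, assuming both mosaic and palmerpenguins are installed (the name bill_median is just an illustrative variable):

```r
library(mosaic)
library(palmerpenguins)

# Median bill length across all penguins.
# bill_length_mm contains missing values, so the result is NA.
bill_median <- median(~bill_length_mm, data = penguins)
bill_median
```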

Notice the returned value of NA. To calculate the median, the function needs another argument, na.rm = TRUE, which tells R to ignore missing values (NA). Add the argument to the code above and rerun it.

median(~x, data = mydata, na.rm = TRUE)
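Applied to the penguins data, the call might look like the following sketch (assuming mosaic and palmerpenguins are installed; bill_median is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# na.rm = TRUE drops the missing values before the median is calculated
bill_median <- median(~bill_length_mm, data = penguins, na.rm = TRUE)
bill_median
```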

Multiple Medians Across Groups

When we want to calculate the median of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification to the single median code.

median(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).

Remember to add na.rm = TRUE if you get NA values or a warning.
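A sketch of the grouped version on the penguins data, assuming mosaic and palmerpenguins are installed (species_medians is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# One median per species; missing values are dropped within each group
species_medians <- median(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
species_medians
```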