Measures of Center

Getting Started

First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.

Then, be sure to load the mosaic package. Remember, you need to do this in each new Quarto/RMarkdown document or R session.
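If you prefer to do both steps in code, here is a minimal sketch. The requireNamespace() check is our own addition, not part of the steps above: it skips the install when mosaic is already present, so re-running the chunk does not reinstall the package. It assumes an internet connection the first time.

```r
# Install mosaic only if it is not already on this machine (needed once per machine)
if (!requireNamespace("mosaic", quietly = TRUE)) {
  install.packages("mosaic")
}

# Load mosaic (needed in every new Quarto/RMarkdown document or R session)
library(mosaic)
```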

Data for Examples

As a reminder (see Overview of Data ation), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins
species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Gentoo     Biscoe     41.7            14.7           210                4700         female  2009
Gentoo     Biscoe     NA              NA             NA                 NA           NA      2009
Chinstrap  Dream      50.9            19.1           196                3550         male    2008
Adelie     Torgersen  NA              NA             NA                 NA           NA      2007
Gentoo     Biscoe     50.1            15.0           225                5000         male    2008

Mean

Definition: MEAN

Sometimes called “the average,” the mean is a summary of a quantitative variable that measures its center: add up all n values and divide by n.

\[\bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n}{x_i} \]
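As a quick check of the formula, the sketch below works it out by hand in base R (no packages needed) for the three non-missing body masses in the snippet above (4700, 3550, and 5000 g). Here n = 3, so the mean is 13250 / 3, or about 4416.67 g. The vector name body_mass is just an illustrative choice.

```r
# Non-missing body masses (g) from the five-row snippet above
body_mass <- c(4700, 3550, 5000)

n <- length(body_mass)          # n = 3 observations
x_bar <- sum(body_mass) / n     # (x_1 + x_2 + x_3) / n

x_bar                           # 4416.667
mean(body_mass)                 # R's mean() returns the same value
```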

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.

mean(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).

mean(~bill_length_mm, data = penguins)
[1] NA

Handling Missing Values

Notice that the returned value is NA: the penguins data contain missing values, and a single NA makes the mean unknown. Adding the argument na.rm = TRUE tells R to remove the missing values (NA) before calculating the mean. Add the argument to the code above:

mean(~x, data = mydata, na.rm = TRUE)
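To see why a single missing value matters, here is a small base-R sketch using the bill lengths from the five-row snippet above, two of which are NA. Base R's mean() accepts the same na.rm argument, so the behavior matches what you see with the penguins data.

```r
# Bill lengths (mm) from the five-row snippet above, including its two missing values
bill_length <- c(41.7, NA, 50.9, NA, 50.1)

mean(bill_length)                  # NA: R cannot average values it does not know
mean(bill_length, na.rm = TRUE)    # drops the NAs, then averages the 3 remaining values: 47.56667
```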

Multiple Means Across Groups

Sometimes we want to calculate the mean of a quantitative variable (y) separately for each value/group of a categorical variable (x).
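With mosaic loaded, this is usually written as a formula with the quantitative variable on the left and the categorical variable on the right, mean(y ~ x, data = mydata, na.rm = TRUE), for example mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE). To show what a grouped mean computes, the base-R sketch below rebuilds the five-row snippet by hand (the data frame name snippet is hypothetical) and uses tapply() to take one mean per species.

```r
# The five-row snippet above, keeping only species and bill length
snippet <- data.frame(
  species = c("Gentoo", "Gentoo", "Chinstrap", "Adelie", "Gentoo"),
  bill_length_mm = c(41.7, NA, 50.9, NA, 50.1)
)

# One mean of bill_length_mm (y) per species (x), ignoring missing values
group_means <- tapply(snippet$bill_length_mm, snippet$species, mean, na.rm = TRUE)
group_means
```

Gentoo averages its two recorded bill lengths, (41.7 + 50.1) / 2 = 45.9 mm, while Adelie comes out as NaN because its only bill length in the snippet is missing, leaving nothing to average.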

Median

Definition: MEDIAN

Also known as the 50th percentile or the 2nd quartile, the median is the value in the middle of the ordered data: roughly 50% of the other data points have values greater than it, and roughly 50% have values less than it. To find the median by hand:

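  1. Order the data from smallest to largest.
  2. If the number of values, n, is odd, the median is the middle value, in position (n + 1)/2.
  3. If n is even, the median is the mean of the two middle values, in positions n/2 and n/2 + 1.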
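The steps above can be sketched as a small base-R function. The name median_by_hand is hypothetical and only for illustration; in practice you would use median(). One difference to keep in mind: sort() silently drops NA values, whereas median() returns NA unless na.rm = TRUE.

```r
# Hypothetical helper that follows the three steps above; use median() in practice.
# Note: sort() drops NA values by default, so this helper silently ignores them.
median_by_hand <- function(x) {
  x <- sort(x)                       # step 1: order from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # step 2: odd n, take the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # step 3: even n, average the two middle values
  }
}

# Odd n: the three recorded bill lengths (mm) from the snippet, ordered 41.7, 50.1, 50.9
median_by_hand(c(41.7, 50.9, 50.1))    # 50.1

# Even n: the two recorded Gentoo bill lengths (mm) from the snippet
median_by_hand(c(41.7, 50.1))          # (41.7 + 50.1) / 2 = 45.9
```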

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.

median(~x, data = mydata)

Try an example using the quantitative variable bill_length_mm from the penguins data, median(~bill_length_mm, data = penguins). Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).

Handling Missing Values

Notice that the returned value is NA. As with the mean, adding the argument na.rm = TRUE tells R to remove the missing values (NA) before calculating the median. Add the argument to the code above:

median(~x, data = mydata, na.rm = TRUE)
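Here is the same idea as a base-R sketch with the snippet's bill lengths: without na.rm = TRUE the median is NA, and with it the median is the middle of the three ordered values 41.7, 50.1, and 50.9.

```r
# Bill lengths (mm) from the five-row snippet above, including its two missing values
bill_length <- c(41.7, NA, 50.9, NA, 50.1)

median(bill_length)                  # NA: with unknown values, R cannot tell which is in the middle
median(bill_length, na.rm = TRUE)    # middle of 41.7, 50.1, 50.9 -> 50.1
```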

Multiple Medians Across Groups

Similarly, we can calculate the median of a quantitative variable (y) separately for each value/group of a categorical variable (x).
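With mosaic loaded, this follows the same formula pattern as the mean: median(y ~ x, data = mydata, na.rm = TRUE), for example median(bill_length_mm ~ species, data = penguins, na.rm = TRUE). The base-R sketch below uses a hand-built copy of the five-row snippet (snippet is a hypothetical name) and tapply() to show what a grouped median computes.

```r
# The five-row snippet above, keeping only species and bill length
snippet <- data.frame(
  species = c("Gentoo", "Gentoo", "Chinstrap", "Adelie", "Gentoo"),
  bill_length_mm = c(41.7, NA, 50.9, NA, 50.1)
)

# One median of bill_length_mm (y) per species (x), ignoring missing values
group_medians <- tapply(snippet$bill_length_mm, snippet$species, median, na.rm = TRUE)
group_medians
```

Gentoo has two recorded values (an even n), so its median is their average, 45.9 mm; Chinstrap's single value, 50.9 mm, is its own median; and Adelie is NA because no recorded value is left after removing the missing one.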