Measures of Center

Getting Started

First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.

Then, be sure to load the mosaic package. Remember, you need to do this in each new Quarto/RMarkdown document or R session.
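A minimal sketch of both steps, assuming a working internet connection and a default CRAN mirror. The install line is commented out so that it is not re-run each time the document renders:

```r
# Install once per machine (uncomment the next line the first time only)
# install.packages("mosaic")

# Load in every new Quarto/RMarkdown document or R session
library(mosaic)
```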

Data for Examples

As a reminder (see Overview of Data ation), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins

species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Adelie   Biscoe     38.1            17.0           181                3175         female  2009
Gentoo   Biscoe     49.4            15.8           216                4925         male    2009
Gentoo   Biscoe     42.6            13.7           213                4950         female  2008
Adelie   Torgersen  41.8            19.4           198                4450         male    2008
Gentoo   Biscoe     46.3            15.8           215                5050         male    2007

Mean

Definition: MEAN

Sometimes called “the average”, the mean is a summary of a quantitative variable that measures its center: the sum of the observed values divided by the number of observations, n.

\[\bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n}{x_i} \]
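To connect the formula to code, here is a small sketch that applies it by hand to the five bill_length_mm values shown in the data snippet. The names x, n, and manual_mean are only for illustration, and plain R arithmetic is used so each step of the formula is visible:

```r
# Bill lengths (mm) from the five rows shown in the data snippet
x <- c(38.1, 49.4, 42.6, 41.8, 46.3)

n <- length(x)             # number of observations
manual_mean <- sum(x) / n  # add up the values, then divide by n

manual_mean                # 43.64
```

This is the same arithmetic that mean() carries out for us on a full column of data.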

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.

mean(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).

mean(~bill_length_mm, data = penguins)
[1] NA

Handling Missing Values

Notice the returned value of NA. Some penguins have no recorded bill length, and by default mean() returns NA whenever the data contain missing values. The function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA) when calculating the mean. Add the argument to the code above.

mean(~x, data = mydata, na.rm = TRUE)
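Applied to the penguins example, the call might look like the sketch below, which assumes mosaic and palmerpenguins are installed. The name mean_bill is only for illustration. The result should now be a number rather than NA, because the missing bill lengths are dropped before averaging:

```r
library(mosaic)
library(palmerpenguins)

# na.rm = TRUE removes missing values before the mean is computed
mean_bill <- mean(~bill_length_mm, data = penguins, na.rm = TRUE)
mean_bill
```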

Multiple Means Across Groups

When we want to calculate the mean of a quantitative variable (y) within each value/group of a categorical variable (x), we use a formula with y on the left of the ~ and x on the right.
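As a sketch with the penguins data, assuming mosaic and palmerpenguins are installed. Using species as the grouping variable is our choice for illustration, as is the name mean_by_species:

```r
library(mosaic)
library(palmerpenguins)

# General structure: mean(y ~ x, data = mydata, na.rm = TRUE)
# One mean of bill_length_mm is returned for each species
mean_by_species <- mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
mean_by_species
```

The output is a named set of values, one per group, so the groups can be compared side by side.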

Median

Definition: MEDIAN

Also known as the 50th percentile or the 2nd quartile, the median is the value such that roughly 50% of the data points are greater than it and roughly 50% are less than it.

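  1. Order the data from smallest to largest
  2. If the number of data points is odd, the median is the middle value; if it is even, the median is the average of the two middle values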
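The by-hand procedure can be sketched in R using the five bill_length_mm values from the data snippet. The names x, sorted, and med are illustrative only, and the even-length branch uses the usual convention of averaging the two middle values:

```r
# Bill lengths (mm) from the five rows shown in the data snippet
x <- c(38.1, 49.4, 42.6, 41.8, 46.3)

sorted <- sort(x)    # order from smallest to largest
n <- length(sorted)  # number of observations

if (n %% 2 == 1) {
  # Odd n: the median is the single middle value
  med <- sorted[(n + 1) / 2]
} else {
  # Even n: the median is the average of the two middle values
  med <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
}

med                  # 42.6
```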

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.

median(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
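The example call here would presumably mirror the one used for the mean. A sketch, assuming mosaic and palmerpenguins are installed, with median_bill as an illustrative name:

```r
library(mosaic)
library(palmerpenguins)

# Without na.rm = TRUE, the missing bill lengths make the result NA
median_bill <- median(~bill_length_mm, data = penguins)
median_bill
```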

Handling Missing Values

Notice the returned value of NA. The function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA) when calculating the median. Add the argument to the code above.

median(~x, data = mydata, na.rm = TRUE)
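With the penguins data, the completed call might look like this sketch, which assumes mosaic and palmerpenguins are installed and uses median_bill_clean as an illustrative name:

```r
library(mosaic)
library(palmerpenguins)

# na.rm = TRUE removes missing values before the median is computed
median_bill_clean <- median(~bill_length_mm, data = penguins, na.rm = TRUE)
median_bill_clean
```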

Multiple Medians Across Groups

When we want to calculate the median of a quantitative variable (y) within each value/group of a categorical variable (x), we use a formula with y on the left of the ~ and x on the right.
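As a sketch with the penguins data, assuming mosaic and palmerpenguins are installed. Grouping by species is our choice for illustration, as is the name median_by_species:

```r
library(mosaic)
library(palmerpenguins)

# General structure: median(y ~ x, data = mydata, na.rm = TRUE)
# One median of bill_length_mm is returned for each species
median_by_species <- median(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
median_by_species
```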