Measures of Center – How to do it in R

Getting Started

First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.

Then, be sure to load the mosaic package. Remember, you need to do this in each new Quarto/RMarkdown document or R session.
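As a minimal sketch of those two steps (the install line is commented out on the assumption that mosaic is already on your machine; uncomment it the first time):

```r
# Install once per machine (uncomment the first time you need it)
# install.packages("mosaic")

# Load in every new document or R session
library(mosaic)
```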

Data for Examples

As a reminder (see Overview of Summary Statistics), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins
species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Chinstrap	Dream	50.7	19.7	203	4050	male	2009
Gentoo	Biscoe	45.1	14.4	210	4400	female	2008
Adelie	Torgersen	33.5	19.0	190	3600	female	2008
Gentoo	Biscoe	49.1	14.8	220	5150	female	2008
Chinstrap	Dream	45.2	16.6	191	3250	female	2009

Mean

Definition: MEAN

Sometimes called “the average,” the mean is a summary of a quantitative variable observed within a sample. It measures the “fulcrum,” or balancing point, of the data values and uses every value in its calculation.

\[\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n} x_i\]
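To make the formula concrete, here is a small sketch that applies it by hand to the five bill_length_mm values from the data snippet above. The names x, n, and x_bar are illustrative only and simply mirror the symbols in the formula.

```r
# Bill lengths (mm) of the five penguins shown in the data snippet
x <- c(50.7, 45.1, 33.5, 49.1, 45.2)

n <- length(x)        # number of observations
x_bar <- sum(x) / n   # add up every value, then divide by n
x_bar                 # 44.72

# The built-in mean() gives the same answer
mean(x)
```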

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.

mean(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
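A sketch of that example, assuming both mosaic and palmerpenguins are installed (bill_mean is just an illustrative name for the stored result):

```r
library(mosaic)
library(palmerpenguins)

# Mean bill length using the mosaic formula interface.
# bill_length_mm contains missing values, so the result is NA.
bill_mean <- mean(~bill_length_mm, data = penguins)
bill_mean
```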

Handling Missing Values

Notice the returned value of NA. To calculate the mean, the function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA). Add the argument to the code above and rerun it. The NA will be replaced by a numeric value (the mean).

mean(~x, data = mydata, na.rm = TRUE)
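Applied to the penguins data, the pattern might look like the sketch below. Counting the missing values first is an optional extra step, added here only to show how many observations na.rm = TRUE drops; n_missing and bill_mean are illustrative names.

```r
library(mosaic)
library(palmerpenguins)

# How many bill lengths are missing?
n_missing <- sum(is.na(penguins$bill_length_mm))
n_missing

# Mean bill length computed from the non-missing values only
bill_mean <- mean(~bill_length_mm, data = penguins, na.rm = TRUE)
bill_mean
```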

Multiple Means Across Groups

When we want to calculate the mean of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification of the single-mean code: put y on the left side of the formula.

mean(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).

Remember to add na.rm = TRUE if any of the returned values is NA.
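A sketch of the grouped version for bill_length_mm by species, with na.rm = TRUE already included (species_means is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# One mean per species; missing bill lengths are ignored
species_means <- mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
species_means
```

The result is a named vector with one entry per species, so a single group can be pulled out by name, for example species_means["Gentoo"].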

Median

Definition: MEDIAN

Also known as the 50th percentile and the 2nd quartile, the median is the value such that roughly 50% of the data points are greater than it and roughly 50% are less than it. It is the middle value of the sorted data.
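To illustrate the definition, this sketch finds the median by hand for the five bill_length_mm values in the data snippet above. The names x, sorted, n, and middle are illustrative. With an odd number of values the median is the single middle value; with an even number it is the average of the two middle values.

```r
# Bill lengths (mm) of the five penguins shown in the data snippet
x <- c(50.7, 45.1, 33.5, 49.1, 45.2)

sorted <- sort(x)     # 33.5 45.1 45.2 49.1 50.7
n <- length(sorted)

# Pick the middle value (odd n) or average the two middle values (even n)
middle <- if (n %% 2 == 1) {
  sorted[(n + 1) / 2]
} else {
  mean(sorted[n / 2 + 0:1])
}
middle                # 45.2

# The built-in median() gives the same answer
median(x)
```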

Basic Code

For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.

median(~x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
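A sketch of that example, assuming both mosaic and palmerpenguins are installed (bill_median is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# Median bill length using the mosaic formula interface.
# The column contains missing values, so the result is NA.
bill_median <- median(~bill_length_mm, data = penguins)
bill_median
```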

Handling Missing Values

Notice the returned value of NA. To calculate the median, the function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA). Add the argument to the code above and rerun it. The NA will be replaced by a numeric value (the median).

median(~x, data = mydata, na.rm = TRUE)
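Applied to the penguins data, the call might look like this sketch (bill_median is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# Median bill length computed from the non-missing values only
bill_median <- median(~bill_length_mm, data = penguins, na.rm = TRUE)
bill_median
```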

Multiple Medians Across Groups

When we want to calculate the median of a quantitative variable (y) within each group of a categorical variable (x), it is a simple modification of the single-median code: put y on the left side of the formula.

median(y ~ x, data = mydata)

Run the code below to see an example using the quantitative variable bill_length_mm and the categorical variable species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or species with another categorical variable (e.g., sex).

Remember to add na.rm = TRUE if any of the returned values is NA.
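A sketch of the grouped median for bill_length_mm by species, with na.rm = TRUE already included (species_medians is an illustrative name):

```r
library(mosaic)
library(palmerpenguins)

# One median per species; missing bill lengths are ignored
species_medians <- median(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
species_medians
```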