Measures of Center
Getting Started
First, be sure you have installed mosaic. Remember, you only need to install each package once on your machine.
Then, be sure to load the mosaic package. Remember, you need to do this with each new Quarto/RMarkdown document or R session.
Data for Examples
As a reminder (see Overview of Summary Statistics), we will be using the penguins data from the palmerpenguins package:
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins

species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Chinstrap | Dream | 50.7 | 19.7 | 203 | 4050 | male | 2009 |
Gentoo | Biscoe | 45.1 | 14.4 | 210 | 4400 | female | 2008 |
Adelie | Torgersen | 33.5 | 19.0 | 190 | 3600 | female | 2008 |
Gentoo | Biscoe | 49.1 | 14.8 | 220 | 5150 | female | 2008 |
Chinstrap | Dream | 45.2 | 16.6 | 191 | 3250 | female | 2009 |
Mean
Sometimes called “the average,” the mean is a summary of a quantitative variable observed within a sample. It measures the “fulcrum,” or balancing point, of the data values and uses every value in its calculation.
\[\bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n}{x_i} \]
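To make the formula concrete, here is a minimal sketch that applies it by hand to the five bill_length_mm values shown in the data snippet above. It uses only base R; the names x, n, and x_bar are chosen here for illustration.

```r
# Bill lengths (mm) of the five penguins shown in the data snippet
x <- c(50.7, 45.1, 33.5, 49.1, 45.2)

n <- length(x)        # number of observations (5)
x_bar <- sum(x) / n   # add up every value, then divide by n

x_bar                      # 44.72
all.equal(x_bar, mean(x))  # TRUE: agrees with R's built-in mean()
```

Because every value enters the sum, changing any single observation changes the mean.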
Basic Code
For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.
mean(~x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
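As one concrete instance of the general structure, the call might look like the sketch below. It assumes the mosaic and palmerpenguins packages are installed.

```r
library(mosaic)          # provides the formula interface to mean()
library(palmerpenguins)  # provides the penguins data

# Mean bill length across all penguins.
# bill_length_mm contains missing values, so the result is NA.
mean(~bill_length_mm, data = penguins)
```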
Notice the returned value of NA. The function needs another argument, na.rm = TRUE, which tells R to ignore missing values (NA) when calculating the mean. Add the argument to the code above and rerun it. The NA will be replaced by a numeric value (the mean).
mean(~x, data = mydata, na.rm = TRUE)
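Applied to the penguins data, the pattern could be sketched as follows, again assuming mosaic and palmerpenguins are installed. The first line is an optional check, not part of the pattern itself, that shows how many values will be dropped.

```r
library(mosaic)
library(palmerpenguins)

# Optional check: how many bill lengths are missing?
sum(is.na(penguins$bill_length_mm))

# Mean of the non-missing bill lengths
mean(~bill_length_mm, data = penguins, na.rm = TRUE)
```

Keep in mind that na.rm = TRUE silently drops the missing rows, so the mean is computed from fewer observations than the data set contains.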
Multiple Means Across Groups
When we want to calculate the mean of a quantitative variable (y) measured across the values/groups of a categorical variable (x), it is a simple modification to the single mean code.
mean(y ~ x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm and species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or another categorical variable (e.g., sex).
Remember to add na.rm = TRUE if any group returns NA.
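A grouped call on the penguins data might look like the sketch below, assuming mosaic and palmerpenguins are installed. The name group_means is chosen here for illustration.

```r
library(mosaic)
library(palmerpenguins)

# Mean bill length for each species: the quantitative variable goes on
# the left of ~ and the grouping variable on the right
group_means <- mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE)

group_means  # one mean per species, labeled by species name
```

The result is a named numeric vector, so a single group can be pulled out by name, e.g. group_means["Gentoo"].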
Median
Also known as the 50th percentile and the 2nd quartile, the median is the value such that roughly 50% of the data points are greater than it and roughly 50% are less than it. It is the middle value of the sorted data.
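To illustrate the idea of a middle value, here is a minimal base R sketch using the five bill_length_mm values from the data snippet above. The names x, sorted_x, and middle are chosen here for illustration.

```r
# Bill lengths (mm) of the five penguins shown in the data snippet
x <- c(50.7, 45.1, 33.5, 49.1, 45.2)

sorted_x <- sort(x)             # 33.5 45.1 45.2 49.1 50.7
middle <- (length(x) + 1) / 2   # position 3 when there are 5 values

sorted_x[middle]   # 45.2: two values fall below it, two above
median(x)          # 45.2: R's built-in median() agrees

# With an even number of values there is no single middle value,
# so the median is the average of the two middle values
median(x[1:4])     # (45.1 + 49.1) / 2 = 47.1
```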
Basic Code
For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.
median(~x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm).
Notice the returned value of NA. The function needs another argument, na.rm = TRUE, which tells R to ignore missing values (NA) when calculating the median. Add the argument to the code above and rerun it.
median(~x, data = mydata, na.rm = TRUE)
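On the penguins data, the two versions of the call can be compared side by side. This sketch assumes mosaic and palmerpenguins are installed.

```r
library(mosaic)          # provides the formula interface to median()
library(palmerpenguins)  # provides the penguins data

# Without na.rm the missing bill lengths make the result NA
median(~bill_length_mm, data = penguins)

# With na.rm = TRUE the median is taken over the non-missing values
median(~bill_length_mm, data = penguins, na.rm = TRUE)
```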
Multiple Medians Across Groups
When we want to calculate the median of a quantitative variable (y) measured across the values/groups of a categorical variable (x), it is a simple modification to the single median code.
median(y ~ x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm and species from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g., bill_depth_mm) and/or another categorical variable (e.g., sex).
Remember to add na.rm = TRUE if any group returns NA.
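A grouped median on the penguins data might be sketched as follows, assuming mosaic and palmerpenguins are installed. The name group_medians is chosen here for illustration.

```r
library(mosaic)
library(palmerpenguins)

# Median bill length for each species
group_medians <- median(bill_length_mm ~ species, data = penguins, na.rm = TRUE)

group_medians  # one median per species, labeled by species name
```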