Measures of Center
Getting Started
First, be sure you have installed the mosaic package. Remember, you only need to install each package once on your machine.
Then, be sure to load the mosaic package. Remember, you need to do this in each new Quarto/RMarkdown document or R session.
Data for Examples
As a reminder (see Overview of Data ation), we will be using the penguins data from the palmerpenguins package:
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins

species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Gentoo | Biscoe | 49.6 | 15.0 | 216 | 4750 | male | 2008 |
Chinstrap | Dream | 47.0 | 17.3 | 185 | 3700 | female | 2007 |
Adelie | Dream | 38.1 | 17.6 | 187 | 3425 | female | 2009 |
Chinstrap | Dream | 49.6 | 18.2 | 193 | 3775 | male | 2009 |
Adelie | Dream | 37.8 | 18.1 | 193 | 3750 | male | 2009 |
Mean
Definition: MEAN
Sometimes called “the average,” the mean is a summary of a quantitative variable that measures the center of the data: the sum of all the values divided by the number of values.
\[\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i = 1}^{n}{x_i}\]
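As a worked example, the formula can be applied to the five bill_length_mm values in the data snippet above (so n = 5). These five rows are used only for illustration and are not the full data set:

\[\bar{x} = \frac{49.6 + 47.0 + 38.1 + 49.6 + 37.8}{5} = \frac{222.1}{5} = 44.42 \text{ mm}\]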
Basic Code
For a single quantitative variable, x, here is the general structure for calculating a mean in R using the mean() function from the mosaic package.
mean(~x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g. bill_depth_mm).
mean(~bill_length_mm, data = penguins)
[1] NA
Notice the returned value of NA. To calculate the mean, the function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA). Add the argument to the code above:
mean(~x, data = mydata, na.rm = TRUE)
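Applied to the penguins example, the call might look like the sketch below. It assumes mosaic and palmerpenguins are installed; the name avg_bill is introduced here only for illustration.

```r
library(mosaic)
library(palmerpenguins)

# na.rm = TRUE drops the missing bill lengths before averaging,
# so the result is a number rather than NA
avg_bill <- mean(~bill_length_mm, data = penguins, na.rm = TRUE)
avg_bill
```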
Multiple Means Across Groups
When we want to calculate the mean of a quantitative variable (y) within each value/group of a categorical variable (x), we put y on the left side of the formula and x on the right, as in mean(y ~ x, data = mydata, na.rm = TRUE).
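As a sketch with the penguins data, assuming species as the grouping variable (chosen here only for illustration), the grouped call places the quantitative variable to the left of ~ and the categorical variable to the right. The name means_by_species is likewise illustrative.

```r
library(mosaic)
library(palmerpenguins)

# y ~ x: quantitative variable on the left, grouping variable on the right.
# The result holds one mean per species.
means_by_species <- mean(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
means_by_species
```

Swapping species for another categorical variable, such as island, gives one mean per island instead.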
Median
Definition: MEDIAN
Also known as the 50th percentile or the 2nd quartile, the median is the value such that roughly 50% of the other data points are greater than it and the other 50% are less than it. To find it:
- Order the data from smallest to largest
- If the number of values is odd, the median is the middle value
- If the number of values is even, the median is the average of the two middle values
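The ordering procedure can be sketched in base R. This minimal illustration uses the five bill_length_mm values from the data snippet above; the names x, x_sorted, and by_hand are introduced here for illustration, and the odd/even rule is the usual convention for the median.

```r
# Five bill lengths (mm) taken from the data snippet shown earlier
x <- c(49.6, 47.0, 38.1, 49.6, 37.8)

# Step 1: order the data from smallest to largest
x_sorted <- sort(x)
n <- length(x_sorted)

# Step 2: take the middle value (odd n) or average the two middle values (even n)
if (n %% 2 == 1) {
  by_hand <- x_sorted[(n + 1) / 2]
} else {
  by_hand <- mean(x_sorted[c(n / 2, n / 2 + 1)])
}

by_hand    # 47
median(x)  # the built-in function gives the same value
```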
Basic Code
For a single quantitative variable, x, here is the general structure for calculating a median in R using the median() function from the mosaic package.
median(~x, data = mydata)
Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g. bill_depth_mm).
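Mirroring the mean example earlier, a call of the following shape is assumed here. It requires mosaic and palmerpenguins to be installed, and the name med_bill is introduced only for illustration.

```r
library(mosaic)
library(palmerpenguins)

# bill_length_mm contains missing values, so without na.rm = TRUE
# the median comes back as NA
med_bill <- median(~bill_length_mm, data = penguins)
med_bill
```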
Notice the returned value of NA. To calculate the median, the function needs another argument, na.rm = TRUE, that tells R to ignore missing values (NA). Add the argument to the code above:
median(~x, data = mydata, na.rm = TRUE)
Multiple Medians Across Groups
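When we want to calculate the median of a quantitative variable (y) within each value/group of a categorical variable (x), we put y on the left side of the formula and x on the right, as in median(y ~ x, data = mydata, na.rm = TRUE).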
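As a sketch with the penguins data, again assuming species as the grouping variable (an illustrative choice, as is the name medians_by_species), the grouped median follows the same formula pattern as the grouped mean:

```r
library(mosaic)
library(palmerpenguins)

# One median bill length per species; na.rm = TRUE skips missing values
medians_by_species <- median(bill_length_mm ~ species, data = penguins, na.rm = TRUE)
medians_by_species
```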