#| label: setup
library(ggformula) #for graphs
Boxplots
Getting Started
First, be sure you have installed ggformula. Remember, you only need to install the package once on your machine.
Then, be sure to load the package ggformula. Remember, you need to do this with each new Quarto/RMarkdown document or R session.
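The install-once, load-every-session pattern described above can be sketched as follows. The requireNamespace() check is an addition for illustration, not part of the original lesson; it skips installation when the package is already present.

```r
# Install ggformula only if it is not already on this machine (one-time step).
if (!requireNamespace("ggformula", quietly = TRUE)) {
  install.packages("ggformula")
}

# Load the package; repeat this in every new R session or Quarto/RMarkdown document.
library(ggformula)
```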
Data for Examples
As a reminder (see Overview of Data Visualization), we will be using the penguins data from the palmerpenguins package:
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Gentoo | Biscoe | 45.1 | 14.5 | 215 | 5000 | female | 2007 |
Adelie | Torgersen | 34.4 | 18.4 | 184 | 3325 | female | 2007 |
Gentoo | Biscoe | 47.3 | 15.3 | 222 | 5250 | male | 2007 |
Chinstrap | Dream | 45.5 | 17.0 | 196 | 3500 | female | 2008 |
Chinstrap | Dream | 50.3 | 20.0 | 197 | 3300 | male | 2007 |
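To look at rows of the data yourself, a minimal sketch along these lines should work. The object name snippet and the seed value are arbitrary choices made for this illustration, and the rows drawn will generally differ from those in the table above.

```r
library(palmerpenguins)

dim(penguins)   # number of rows (penguins) and columns (variables)
names(penguins) # the variable names shown in the table header

# Draw five rows at random; set.seed() makes the draw repeatable.
set.seed(1)
snippet <- penguins[sample(nrow(penguins), 5), ]
snippet
```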
Boxplots with One Quantitative Variable
Basic Code
For a single quantitative variable, x, here is the general structure for a boxplot.
gf_boxplot(~x,
data = mydata)
Notice that a y-axis is added, but it does not have any meaning. We can remove its ticks and labels using the function gf_theme():
gf_boxplot(~x,
data = mydata) |> #this is a pipe, another option is %>%
gf_theme(axis.ticks.y = element_blank(), #removes y-axis ticks
axis.text.y = element_blank()) #removes y-axis labels
Notice the use of the native pipe |> to connect the two functions from ggformula. Another version of a pipe in R is %>%. Both will work; the difference is that %>% comes from another package (magrittr), while |> is built into base R (version 4.1.0 and later).
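To see what a pipe does outside of plotting, here is a small base R sketch (the vector values is made up for illustration). The pipe passes the result on its left as the first argument of the function on its right, so both versions compute the same thing.

```r
values <- c(1, 4, 9)

# Nested calls: read from the inside out.
nested <- sum(sqrt(values))

# Piped calls: read from left to right (requires R 4.1.0 or later).
piped <- values |> sqrt() |> sum()

identical(nested, piped) # TRUE: both are sqrt(1) + sqrt(4) + sqrt(9) = 6
```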
Run the code below to see an example using the quantitative variable bill_length_mm from the penguins data. Then replace bill_length_mm with another quantitative variable from the penguins data (e.g. bill_depth_mm).
#| label: boxplot-one
#| warning: true
gf_boxplot(~bill_length_mm,
data = penguins) |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
Notice the warning produced from running the code. It tells you that some rows (penguins) were left out of the plot because they have missing data for the variable being visualized. A warning is simply R telling you about a decision it made on your behalf; the code still works.
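If you want to check how many rows the warning refers to, one option is to count the missing values directly. This is a sketch rather than part of the original lesson; the names n_missing and penguins_complete are made up for illustration.

```r
library(palmerpenguins)

# Count penguins with no recorded bill length.
n_missing <- sum(is.na(penguins$bill_length_mm))
n_missing

# Keep only the rows where bill_length_mm is present.
penguins_complete <- subset(penguins, !is.na(bill_length_mm))
nrow(penguins) - nrow(penguins_complete) # same count as n_missing
```

Passing penguins_complete to data = should produce the same boxplot without the warning, since no rows need to be dropped.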
Adding Labels
Descriptive labels are important for any visualization. We can add them by including the xlab = and title = arguments in the function. Notice that for a single variable we do not add a y-axis label.
gf_boxplot(~x,
data = mydata,
xlab = "X Axis Label",
title = "Descriptive Title") |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
Add labels and a title to the boxplot for bill_length_mm.
#| label: boxplot-add-labels
gf_boxplot(~bill_length_mm,
data = penguins,
xlab = "______________",
title = "_____________") |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
Other Modifications
We can add a few other modifications that are purely aesthetic, just to make our graphs look nicer or easier to read.
Boxplots often hide the amount of data they represent (the number of cases). To combat this, we can add the data points to the boxplot using the gf_jitter() function. You can modify the height = argument to change how far the points are spread vertically. Notice the 0 in the formula, which centers the points on the y-axis at 0.
gf_boxplot(~x,
data = mydata) |>
gf_jitter(0 ~ x,
data = mydata,
height = 1) |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
Modify the value of the height = argument and see how it affects the distribution of the points on the y-axis. Remember, the y-axis has no meaning here, so the heights of the points are random and do not represent any element of the data. Only the position on the x-axis is meaningful.
#| label: boxplot-add-jitter
gf_boxplot(~bill_length_mm,
data = penguins) |>
gf_jitter(0 ~ bill_length_mm,
data = penguins,
height = 0.25) |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
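One detail worth knowing: by default, jittering can also nudge points slightly left and right. Since the x-axis position is the meaningful one here, a possible refinement (an assumption about what you may want, not something the lesson requires) is to set width = 0 so that only the vertical position is randomized. The object name p_jitter is arbitrary.

```r
library(ggformula)
library(palmerpenguins)

# width = 0 keeps each point at its exact bill length;
# height = 0.25 still spreads the points vertically.
p_jitter <- gf_boxplot(~bill_length_mm,
                       data = penguins) |>
  gf_jitter(0 ~ bill_length_mm,
            data = penguins,
            width = 0,
            height = 0.25) |>
  gf_theme(axis.ticks.y = element_blank(),
           axis.text.y = element_blank())

p_jitter
```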
We can fill the boxes with a specified color, using either one of R's built-in color names or a hex code.
gf_boxplot(~x,
data = mydata,
xlab = "X Axis Label",
title = "Descriptive Title",
fill = "darkcyan") |>
gf_theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
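Both ways of naming a color can be explored in base R. The sketch below checks that darkcyan is one of R's built-in names and converts it to its hex code; the object names are arbitrary.

```r
# colors() returns all built-in color names.
"darkcyan" %in% colors()

# col2rgb() gives the red, green, and blue components (0 to 255).
rgb_values <- col2rgb("darkcyan")

# rgb() turns those components into a hex code.
hex_code <- rgb(rgb_values["red", 1],
                rgb_values["green", 1],
                rgb_values["blue", 1],
                maxColorValue = 255)
hex_code # "#008B8B"
```

Either fill = "darkcyan" or fill = "#008B8B" should then give the same box color.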
The package ggformula is built on top of another package called ggplot2, so any ggplot2 function can be added to a ggformula-generated graphic. For example, we can change the theme to a built-in theme. To connect ggplot2 functions, we will use a + at the end of the line preceding the function.
Try changing the theme of the following graph:
#| label: boxplot-one-theme
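gf_boxplot(~bill_length_mm,
data = penguins) + #notice the plus +
theme_light() + #ggplot2 built-in theme; goes first because a complete theme resets earlier theme settings
theme(axis.ticks.y = element_blank(), #ggplot2 equivalent of gf_theme()
axis.text.y = element_blank())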
Boxplots for Comparisons Across Groups
When we have a quantitative variable that has been measured across multiple groups, we may be interested in comparing boxplots across the values/groups of a categorical variable. We can do this by adding a y-axis variable that represents the categorical variable.
gf_boxplot(y ~ x,
data = mydata)
#we don't need to do anything to the y-axis now because it represents a variable here
The variable placed in the y position will be on the y-axis and the variable in the x position will be on the x-axis. So we can change the orientation of our boxplot by switching the positions of the quantitative and categorical variables.
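As an illustration of the y ~ x pattern, the sketch below draws one comparison in both orientations. It uses body_mass_g and island from the penguins data, chosen here only as an example; the object names are arbitrary.

```r
library(ggformula)
library(palmerpenguins)

# Categorical variable in the y position: horizontal boxes.
p_horizontal <- gf_boxplot(island ~ body_mass_g,
                           data = penguins)

# Categorical variable in the x position: vertical boxes.
p_vertical <- gf_boxplot(body_mass_g ~ island,
                         data = penguins)

p_horizontal
p_vertical
```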
Here is an example with species on the y-axis. Try modifying the code so that species is on the x-axis and bill_length_mm is on the y-axis. Then modify the axis labels.
#| label: two-var-boxplot
gf_boxplot(species ~ bill_length_mm,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information")
Other Modifications for Comparisons
We can add a few other modifications that are purely aesthetic, just to make our graphs look nicer or easier to read.
It is much easier to add jittered points to the boxplot across groups: we just pipe into the gf_jitter() function. Modify the height = argument to adjust the random vertical spread of the points.
#| label: two-var-boxplot-jitter
gf_boxplot(species ~ bill_length_mm,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information") |>
gf_jitter(height = 0.5)
Similar to changing the color of the boxes to a single color, we can use the fill = argument with a categorical variable, written as a one-sided formula (e.g. fill = ~species), to give each group its own color.
Here is the boxplot of bill_length_mm with fill color varied by species, a categorical variable with values of Adelie, Chinstrap, and Gentoo. Modify the code below to change the fill color to another categorical variable such as island or sex and see what happens.
#| label: two-var-boxplot-color
gf_boxplot(species ~ bill_length_mm,
fill = ~species,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information",
show.legend = FALSE) #hides unnecessary legend for fill
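The difference between setting a single fill color and mapping fill to a variable comes down to a quoted color versus a formula. A sketch of the two side by side (the object names are arbitrary):

```r
library(ggformula)
library(palmerpenguins)

# Quoted color name: every box gets the same fill.
p_one_color <- gf_boxplot(species ~ bill_length_mm,
                          data = penguins,
                          fill = "darkcyan")

# One-sided formula: the fill color depends on the value of species.
p_by_species <- gf_boxplot(species ~ bill_length_mm,
                           data = penguins,
                           fill = ~species,
                           show.legend = FALSE)

p_one_color
p_by_species
```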
If we want to specify color and add jitter, we can do that too!
#| label: two-var-boxplot-color-jitter
gf_boxplot(species ~ bill_length_mm,
fill = ~species,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information",
show.legend = FALSE,
alpha = 0.5) |> #makes the fill more transparent
gf_jitter(color = ~species,
height = 0.25,
show.legend = FALSE) #hides new legend for color