#| label: setup
library(ggformula) #for graphs
Bar Plots
Getting Started
First, be sure you have installed ggformula. Remember, you only need to install the package once on your machine.
Then, be sure to load the package ggformula. Remember, you need to do this with each new Quarto/RMarkdown document or R session.
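If you are not sure whether ggformula is already installed, a common pattern is to install it only when it is missing and then load it. This is a minimal sketch using base R's requireNamespace() and install.packages(); it assumes you have an internet connection and a CRAN mirror configured.

```r
# Install ggformula only if it is not already available on this machine
if (!requireNamespace("ggformula", quietly = TRUE)) {
  install.packages("ggformula")
}

# Load ggformula (do this in every new document or R session)
library(ggformula)
```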
Data for Examples
As a reminder (see Overview of Data Visualization), we will be using the penguins data from the palmerpenguins package:
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Gentoo | Biscoe | 46.4 | 15.0 | 216 | 4700 | female | 2008 |
Gentoo | Biscoe | 50.8 | 15.7 | 226 | 5200 | male | 2009 |
Adelie | Dream | 44.1 | 19.7 | 196 | 4400 | male | 2007 |
Chinstrap | Dream | 52.7 | 19.8 | 197 | 3725 | male | 2007 |
Gentoo | Biscoe | 45.3 | 13.7 | 210 | 4300 | female | 2008 |
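Bar plots are built from categorical variables, which R typically stores as factors. A quick way to check which penguins columns are factors (and so are natural candidates for a bar plot) is sketched below with base R's sapply() and is.factor(). For the penguins data this should list species, island, and sex; year is stored as a number, so it does not appear.

```r
library(palmerpenguins)

# TRUE for each column stored as a factor (categorical) variable
is_categorical <- sapply(penguins, is.factor)
names(penguins)[is_categorical]
```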
Bar Plots with One Categorical Variable
Basic Code for Counts
For a single categorical variable, x, here is the general structure for a bar plot for counts.
gf_bar(~x,
data = mydata)
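The height of each bar from gf_bar() is simply the number of rows in that category. As a sanity check, you can compare the plot with a frequency table from base R's table(); this sketch also uses ggplot2's layer_data() to pull out the counts ggplot2 computed for the bars. Both should show the same three counts.

```r
library(ggformula)
library(palmerpenguins)

# Frequency table of species: one count per category
species_counts <- table(penguins$species)
species_counts

# The same counts, as computed by ggplot2 when drawing the bars
p <- gf_bar(~species, data = penguins)
ggplot2::layer_data(p)$count
```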
Run the code below to see an example using the categorical variable species from the penguins data. Then replace species with another categorical variable from the penguins data (e.g., island or sex).
#| label: barplot-one
#| warning: true
gf_bar(~species, data = penguins)
Notice that sometimes an extra bar is added for missing values, or NA, such as for the sex variable. NA simply indicates missing values in a data set. See Other Modifications to learn how to remove bars for missing values on a bar plot.
Basic Code for Proportions
To plot proportions instead of counts, we change the goal of our basic bar plot code, that is, the function name: gf_props() instead of gf_bar(). For a single categorical variable, x, here is the general structure for a bar plot for proportions.
gf_props(~x,
data = mydata)
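A proportion is each category's count divided by the total number of observations, so the bar heights from gf_props() should add up to 1. The sketch below computes these proportions by hand with base R's prop.table(), which can be handy for checking what the plot shows.

```r
library(palmerpenguins)

# Proportion of penguins in each species: count / total count
species_props <- prop.table(table(penguins$species))
round(species_props, 3)

# The proportions across all categories sum to 1
sum(species_props)
```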
Change the goal below (replace gf_bar with gf_props) so that the bar plot shows the proportion instead of the count on the y-axis.
#| label: barplot-one-prop
#| warning: true
gf_bar(~species, data = penguins)
Basic Code for Percentages
To plot percentages instead of counts, we change the goal of our basic bar plot code, that is, the function name: gf_percents() instead of gf_bar(). For a single categorical variable, x, here is the general structure for a bar plot for percentages.
gf_percents(~x,
data = mydata)
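Percentages are proportions multiplied by 100, so the bars from gf_percents() should add up to 100. Here is a hand calculation in base R as a quick check.

```r
library(palmerpenguins)

# Percentage of penguins in each species: 100 * count / total count
species_pcts <- 100 * prop.table(table(penguins$species))
round(species_pcts, 1)
```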
Change the goal below (replace gf_bar with gf_percents) so that the bar plot shows the percentage instead of the count on the y-axis.
#| label: barplot-one-perc
#| warning: true
gf_bar(~species, data = penguins)
Adding Labels
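Descriptive labels are important for any visualization. We can add them to any visualization by adding the xlab = and title = arguments to the function. Notice that for a single variable we do not add a y-axis label: the y-axis shows a summary (such as count) rather than a variable, and it is labeled automatically.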
gf_bar(~x,
data = mydata,
xlab = "X Axis Label",
title = "Descriptive Title")
Add labels and a title to the bar plot for species.
#| label: barplot-add-labels
gf_bar(~species,
data = penguins,
xlab = "______________",
title = "_____________")
Other Modifications
We can add a few other modifications, mostly aesthetic, to make our graphs look nicer or easier to read.
If NA, or missing, values are being plotted and are not desired, you can drop the missing values from the variable you are plotting using the drop_na() function from the tidyr package, one of the tidyverse packages.
library(tidyverse) #needed to use drop_na() function
gf_bar(~x,
data = drop_na(mydata, x))
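drop_na() removes only the rows where the listed variable is missing, so all other rows are kept for the plot. The sketch below checks this by comparing row counts before and after. It loads tidyr directly (library(tidyverse) also works, since tidyr is one of its packages). Note that calling drop_na(penguins) with no variable named would instead drop rows with a missing value in any column.

```r
library(tidyr)           # provides drop_na(); also loaded by library(tidyverse)
library(palmerpenguins)

# Keep only the rows where sex is not missing
penguins_sex <- drop_na(penguins, sex)

nrow(penguins)           # all penguins
nrow(penguins_sex)       # penguins with a recorded sex
anyNA(penguins_sex$sex)  # FALSE: no missing values remain in sex
```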
Try this example with the sex variable from the penguins data.
#| label: barplot-without-na
library(tidyverse)
gf_bar(~sex,
data = drop_na(penguins, sex))
We can fill the bars with a single color using the fill = argument, specifying either a built-in R color name (such as "darkcyan") or a hex color code (such as "#008B8B").
gf_bar(~x,
data = mydata,
xlab = "X Axis Label",
title = "Descriptive Title",
fill = "darkcyan")
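A built-in color name and its hex code describe the same color. The sketch below uses base R's col2rgb() and rgb() to convert the name "darkcyan" into its hex code and then passes that code to fill =. Running colors() lists all of R's built-in color names.

```r
library(ggformula)
library(palmerpenguins)

# Convert a built-in R color name to its hex code
hex <- rgb(t(col2rgb("darkcyan")), maxColorValue = 255)
hex  # returns "#008B8B"

# Either form can be passed to fill =
p_hex <- gf_bar(~species, data = penguins, fill = hex)
p_hex
```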
The package ggformula is built on top of another package called ggplot2, so ggplot2 functions can be added to a ggformula-generated graphic. For example, we can change the theme to a built-in theme. To connect a ggplot2 function, we put a + at the end of the line preceding the function.
Try changing the theme of the following graph, for example by replacing theme_light() with theme_minimal() or theme_classic():
#| label: barplot-one-theme
gf_bar(~species,
data = penguins) + #notice the plus +
theme_light() #ggplot2 function
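Because a ggformula plot is a regular ggplot2 object, you can save it to a name and add different ggplot2 functions to it with +. This sketch tries two other built-in ggplot2 themes; base_size is an argument of the ggplot2 theme functions that sets the base font size. As a rule of thumb, gf_ layers are chained with |>, while ggplot2 functions such as themes are added with +.

```r
library(ggformula)
library(palmerpenguins)

# Save the bar plot so different themes can be added to it
p <- gf_bar(~species, data = penguins)

p_min <- p + theme_minimal()                 # grid lines, no background or border
p_classic <- p + theme_classic(base_size = 14)  # axis lines, no grid; larger text
p_min
p_classic
```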
Bar Plots for Comparisons Across Groups
When we have a categorical variable that has been measured across multiple groups, we may be interested in comparing bar plots across the values/groups of another categorical variable. We can do this by adding another feature to our plot that represents the other categorical variable, such as:
- color
- facets
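For bar plots, the two options listed above can be sketched as follows, comparing species across sex (color) and across island (facets). This assumes ggformula's formula mapping fill = ~sex, the position = "dodge" option that gf_bar() passes on to ggplot2, and the gf_facet_wrap() function. Missing values of sex appear as their own NA-colored bars; position = "fill" is another option if you want each bar scaled to show proportions within a species.

```r
library(ggformula)
library(palmerpenguins)

# Color: fill each bar by sex, with the bars for each sex side by side
p_fill <- gf_bar(~species, fill = ~sex, data = penguins,
                 position = "dodge")
p_fill

# Facets: a separate bar plot of species for each island
p_facet <- gf_bar(~species, data = penguins) |>
  gf_facet_wrap(~island)
p_facet
```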
gf_boxplot(y ~ x,
data = mydata)
#we don't need to do anything to the y-axis now because it represents a variable here
The variable placed in the y position of the formula will be on the y-axis, and the variable in the x position will be on the x-axis. So we can change the orientation of our boxplot by switching the positions of the quantitative and categorical variables.
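To see the orientation switch described above, compare the two boxplots below: they use the same two penguins variables with the sides of the formula swapped. This sketch assumes a ggplot2 version recent enough to infer horizontal boxplots from the formula, as the examples in this section already do.

```r
library(ggformula)
library(palmerpenguins)

# Quantitative variable on the y-axis: vertical boxes, one per species
p_vertical <- gf_boxplot(bill_length_mm ~ species, data = penguins)
p_vertical

# Quantitative variable on the x-axis: horizontal boxes, one per species
p_horizontal <- gf_boxplot(species ~ bill_length_mm, data = penguins)
p_horizontal
```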
Other Modifications for Comparisons
We can add a few other modifications, mostly aesthetic, to make our graphs look nicer or easier to read.
Adding jitter points to the boxplot across groups is easy: we just pipe the boxplot into the gf_jitter() function. Modify the height = argument to adjust how far the points are randomly spread vertically (on the y-axis) within each group.
#| label: two-var-boxplot-jitter
gf_boxplot(species ~ bill_length_mm,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information") |>
gf_jitter(height = 0.5)
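Because jittering moves points randomly, two settings can help. Calling set.seed() before creating the plot makes the random positions reproducible, and width = 0 turns off horizontal jitter so each point keeps its exact bill_length_mm value on the x-axis, with only the vertical position within each species randomized. This is a sketch that assumes gf_jitter() accepts width = alongside height =, as ggplot2's jitter does.

```r
library(ggformula)
library(palmerpenguins)

set.seed(2024)  # make the random jitter reproducible

p_jitter <- gf_boxplot(species ~ bill_length_mm,
                       data = penguins,
                       xlab = "Bill Length (mm)",
                       ylab = "Species") |>
  gf_jitter(height = 0.2,   # small vertical spread within each species
            width = 0)      # no horizontal jitter: x values stay exact
p_jitter
```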
Similar to changing the fill of boxes to a single color, we can set the fill = argument to a categorical variable, written as a formula (for example, fill = ~species), so that each group gets its own fill color.
Here is the boxplot of bill_length_mm with the fill color varied by species, a categorical variable with values Adelie, Chinstrap, and Gentoo. Modify the code below to change the fill color to another categorical variable, such as island or sex, and see what happens.
#| label: two-var-boxplot-color
gf_boxplot(species ~ bill_length_mm,
fill = ~species,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information",
show.legend = FALSE) #hides unnecessary legend for fill
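When fill = is mapped to a variable, ggplot2 chooses the colors automatically. To pick them yourself, you can add ggplot2's scale_fill_manual() with +. The three colors below are an arbitrary choice for illustration, matched to the species values by name.

```r
library(ggformula)
library(palmerpenguins)

p_manual <- gf_boxplot(species ~ bill_length_mm,
                       fill = ~species,
                       data = penguins,
                       show.legend = FALSE) +
  scale_fill_manual(values = c(Adelie = "darkorange",   # one color per species
                               Chinstrap = "purple",
                               Gentoo = "cyan4"))
p_manual
```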
If we want to specify color and add jitter, we can do that too!
#| label: two-var-boxplot-color-jitter
gf_boxplot(species ~ bill_length_mm,
fill = ~species,
data = penguins,
xlab = "Add Variable X Information",
ylab = "Add Variable Y Information",
show.legend = FALSE,
alpha = 0.5) |> #makes the fill more transparent
gf_jitter(color = ~species,
height = 0.25,
show.legend = FALSE) #hides new legend for color