Bar Plots

Getting Started

First, be sure you have installed ggformula. Remember, you only need to install the package once on your machine.

Then, be sure to load the package ggformula. Remember, you need to do this with each new Quarto/RMarkdown document or R Session.

#| label: setup
library(ggformula) #for graphs

Data for Examples

As a reminder (see Overview of Data Visualization), we will be using the penguins data from the palmerpenguins package:

library(palmerpenguins)

Here is a snippet of the data:

Palmer Penguins
species    island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year
Gentoo     Biscoe  46.4            15.0           216                4700         female  2008
Gentoo     Biscoe  50.8            15.7           226                5200         male    2009
Adelie     Dream   44.1            19.7           196                4400         male    2007
Chinstrap  Dream   52.7            19.8           197                3725         male    2007
Gentoo     Biscoe  45.3            13.7           210                4300         female  2008
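
Bar plots are meant for categorical variables, so it can help to check which columns of penguins qualify. The sketch below is one way to do that. It assumes categorical columns are stored as factors, which holds for penguins but may not hold for other data sets; the name cat_vars is only illustrative.

```r
library(palmerpenguins)

# Keep the names of the columns stored as factors (R's type for categorical data)
cat_vars <- names(penguins)[sapply(penguins, is.factor)]
cat_vars
```

Any of the names returned here can take the place of x in the bar plot templates that follow.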

Bar Plots with One Categorical Variable

Basic Code for Counts

For a single categorical variable, x, here is the general structure for a bar plot for counts.

gf_bar(~x, 
       data = mydata)

Run the code below to see an example using the categorical variable species from the penguins data. Then replace species with another categorical variable from the penguins data (e.g. island).

#| label: barplot-one
#| warning: true
gf_bar(~species, data = penguins)

Notice that an extra bar is sometimes added for missing values, labeled NA, such as for the sex variable. NA simply indicates a missing value in a data set. See Other Modifications to learn how to remove the bar for missing values from a bar plot.
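
To see where an NA bar comes from, you can tabulate the variable before plotting it. This is a small base R sketch; the object names sex_counts and n_missing are illustrative.

```r
library(palmerpenguins)

# useNA = "ifany" adds an entry for missing values when there are any
sex_counts <- table(penguins$sex, useNA = "ifany")
sex_counts

# Number of penguins with no recorded sex
n_missing <- sum(is.na(penguins$sex))
n_missing
```

If n_missing is greater than zero, a bar plot of that variable will include an NA bar.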

Basic Code for Proportions

To plot proportions instead of counts, we change the goal (the function name) of our basic bar plot code. For a single categorical variable, x, here is the general structure for a bar plot for proportions.

gf_props(~x,
         data = mydata)

Change the goal below to modify the bar plot to indicate the proportion instead of the count on the y-axis.

#| label: barplot-one-prop
#| warning: true
gf_bar(~species, data = penguins)

Basic Code for Percentages

To plot percentages instead of counts, we again change the goal (the function name) of our basic bar plot code. For a single categorical variable, x, here is the general structure for a bar plot for percentages.

gf_percents(~x,
            data = mydata)

Change the goal below to modify the bar plot to indicate the percentage instead of the count on the y-axis.

#| label: barplot-one-perc
#| warning: true
gf_bar(~species, data = penguins)
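
To check the bar heights you see after changing the goal, the three summaries can be computed directly. This is a base R sketch, not part of ggformula; it assumes the single-variable case, where proportions and percentages are taken out of all rows. The object names are illustrative.

```r
library(palmerpenguins)

counts   <- table(penguins$species)  # bar heights for counts
props    <- prop.table(counts)       # bar heights for proportions (sum to 1)
percents <- 100 * props              # bar heights for percentages (sum to 100)

# Collect the three summaries side by side
species_summary <- data.frame(
  species    = names(counts),
  count      = as.vector(counts),
  proportion = round(as.vector(props), 3),
  percent    = round(as.vector(percents), 1)
)
species_summary
```

Only the scale of the y-axis differs among the three plots; the relative heights of the bars stay the same.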

Adding Labels

Descriptive labels are important for any visualization. We can add them by including the xlab = and title = arguments in the function. Notice that for a single variable we do not add a y-axis label, because the y-axis shows a count (or proportion or percentage) rather than a variable.

gf_bar(~x, 
       data = mydata,
       xlab = "X Axis Label",
       title = "Descriptive Title")

Add labels and a title to the bar plot for species.

#| label: barplot-add-labels
gf_bar(~species, 
       data = penguins,
       xlab = "______________",
       title = "_____________") 
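
As a filled-in illustration of the template, here is a sketch that uses island rather than species, so the exercise above is left for you. The wording of the label and title is only a suggestion, and the name island_plot is illustrative.

```r
library(ggformula)
library(palmerpenguins)

# Same template as above, with the blanks filled in for a different variable
island_plot <- gf_bar(~island,
                      data = penguins,
                      xlab = "Island",
                      title = "Number of Penguins Observed on Each Island")
island_plot
```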

Other Modifications

We can add a few other modifications that are purely aesthetic, just to make our graphs look nicer or easier to read.

If NA, or missing values, are being plotted and are not desired, you can drop the missing values from the variable you are plotting using the drop_na() function from the set of tidyverse packages.

library(tidyverse) #needed to use drop_na() function
gf_bar(~x, 
       data = drop_na(mydata, x)) 

Try this example with sex for the penguins.

#| label: barplot-without-na
library(tidyverse)
gf_bar(~sex, 
       data = drop_na(penguins, sex))
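
drop_na() removes only the rows that are missing a value in the columns you name, so the number of rows dropped depends on the variable. The sketch below compares row counts; it loads tidyr directly (the tidyverse package that provides drop_na()), and the object names are illustrative.

```r
library(tidyr)           # drop_na() comes from tidyr, part of the tidyverse
library(palmerpenguins)

n_all  <- nrow(penguins)                              # every row
n_sex  <- nrow(drop_na(penguins, sex))                # rows with a recorded sex
n_bill <- nrow(drop_na(penguins, bill_length_mm))     # rows with a recorded bill length

c(all = n_all, sex = n_sex, bill_length_mm = n_bill)
```

Naming the plotted variable inside drop_na() keeps as much data as possible for that plot.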

We can fill the bars with a specified color, using either one of R's built-in color names or a hex color code.

gf_bar(~x, 
       data = mydata,
       xlab = "X Axis Label",
       title = "Descriptive Title",
       fill = "darkcyan") 
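
If you are unsure whether a color name is built in, or want its hex code, base R can tell you. This is a small sketch using base R functions only; dark_cyan_hex is an illustrative name.

```r
# Check that the name is one of R's built-in colors
"darkcyan" %in% colors()

# Convert the built-in name to its hex code
dark_cyan_hex <- rgb(t(col2rgb("darkcyan")), maxColorValue = 255)
dark_cyan_hex
```

Either form, the name or the hex code, can be supplied to fill =.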

The package ggformula is built on top of another package called ggplot2 and so any ggplot2 function can be added to a ggformula generated graphic. For example, we can change the theme to a built-in theme. To connect ggplot2 functions, we will use a + at the end of the line preceding the function.

Try changing the theme of the following graph:

#| label: barplot-one-theme
gf_bar(~species, 
       data = penguins) + #notice the plus +
  theme_light() #ggplot2 function

Bar Plots for Comparisons Across Groups

When we have a categorical variable that has been measured across multiple groups, we may be interested in comparing bar plots across the values/groups of another categorical variable. We can do this by adding another feature to our plot that represents the other categorical variable, such as a

  • color
  • facet
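
The two options above can be sketched for bar plots as follows. This is an illustration under the usual ggformula conventions rather than code from the original: fill = ~variable maps the fill color to a second categorical variable, and | in the formula draws one panel per group. The choice of species and island, and the object names, are assumptions made for the example.

```r
library(ggformula)
library(palmerpenguins)

# Color: one bar per species, with each bar split (stacked) by island
p_fill <- gf_bar(~species, fill = ~island, data = penguins)

# Color, side by side: position = "dodge" places the islands next to each other
p_dodge <- gf_bar(~species, fill = ~island, data = penguins,
                  position = "dodge")

# Facet: a separate panel of species counts for each island
p_facet <- gf_bar(~species | island, data = penguins)

p_fill
p_dodge
p_facet
```

Stacked bars make the species totals easy to read, while dodged bars and facets make it easier to compare the groups with one another.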

gf_boxplot(y ~ x,
           data = mydata)
#we don't need to do anything to the y-axis now because it represents a variable here

Switching Box Plot Orientation

The variable placed in the y position will be on the y-axis and the variable in the x position will be on the x-axis. So we can change the orientation of our boxplot by switching the positions of the quantitative and categorical variables.
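
Here is a sketch of both orientations using bill_length_mm and species. It assumes a reasonably recent version of ggplot2, which detects the orientation from the variable types; the object names are illustrative.

```r
library(ggformula)
library(palmerpenguins)

# Quantitative variable in the y position: vertical boxes
vertical <- gf_boxplot(bill_length_mm ~ species, data = penguins)

# Categorical variable in the y position: horizontal boxes
horizontal <- gf_boxplot(species ~ bill_length_mm, data = penguins)

vertical
horizontal
```

Printing either plot gives a warning about rows removed, because a few penguins have no recorded bill length.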

Other Modifications for Comparisons

We can add a few other modifications that are purely aesthetic, just to make our graphs look nicer or easier to read.

It is much easier to add jittered points to a boxplot across groups: we just pipe the boxplot into the gf_jitter() function. Modify the height = argument to adjust how far the points are randomly spread along the y-axis.

#| label: two-var-boxplot-jitter
gf_boxplot(species ~ bill_length_mm, 
           data = penguins,
           xlab = "Add Variable X Information",
           ylab = "Add Variable Y Information") |> 
  gf_jitter(height = 0.5)

Similar to changing the boxes to a single color, we can vary the fill color by group by setting the fill = argument to a categorical variable, written as a one-sided formula (fill = ~variable).

Here is the boxplot of bill_length_mm with the fill color varied by species, a categorical variable with values of Adelie, Chinstrap, and Gentoo. Modify the code below to change the fill color to another categorical variable, such as island or sex, and see what happens.

#| label: two-var-boxplot-color
gf_boxplot(species ~ bill_length_mm, 
           fill = ~species,
           data = penguins,
           xlab = "Add Variable X Information",
           ylab = "Add Variable Y Information",
           show.legend = FALSE) #hides unnecessary legend for fill

If we want to specify color and add jitter, we can do that too!

#| label: two-var-boxplot-color-jitter
gf_boxplot(species ~ bill_length_mm, 
           fill = ~species,
           data = penguins,
           xlab = "Add Variable X Information",
           ylab = "Add Variable Y Information",
           show.legend = FALSE,
           alpha = 0.5) |> #makes the fill more transparent 
  gf_jitter(color = ~species,
            height = 0.25,
            show.legend = FALSE) #hides new legend for color