Lab 3: Exploring Rodents with ggplot2

Author

Instructions

1 Part 1: Setup

1.1 GitHub Workflow

Set up your GitHub workflow, either by creating a repository first and then using Version Control to set up your project, or the other way around, by creating the project first and connecting it to GitHub with the usethis package commands we have learned.

Use appropriate naming conventions for your project (see Code Style Guide), e.g. lab-3-ggplot2.

Your project folder should contain the following:

You will submit a link to your GitHub repository with all content.

1.2 Seeking Help

Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.

  • If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.

  • If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the .txt file containing your entire conversation with ChatGPT. ChatGPT can be used as a “search engine”, but you should not copy and paste prompts from the lab into ChatGPT, or code from ChatGPT into your lab.

Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.

1.3 Lab Instructions

The questions in this lab are noted with numbers and boldface. Each question will require you to produce code, whether it is one line or multiple lines.

This document is quite plain, meaning it does not have any special formatting. As part of your demonstration of creating professional looking Quarto documents, I would encourage you to spice your documents up (e.g., declaring execution options, specifying how your figures should be output, formatting your code output, etc.).

1.4 Setup

In the code chunk below, load in the packages necessary for your analysis. You should only need the tidyverse package for this analysis.

2 Part 2: Data Context

The Portal Project is a long-term ecological study being conducted near Portal, AZ. Since 1977, the site has been used to study the interactions among rodents, ants, and plants, as well as their respective responses to climate. To study the interactions among organisms, researchers experimentally manipulated access to 24 study plots. This study has produced over 100 scientific papers and is one of the longest running ecological studies in the U.S.

We will be investigating the animal species diversity and weights found within plots at the Portal study site. The data are stored as a comma separated value (CSV) file. Each row holds information for a single animal, and the columns represent:

Column            Description
record_id         Unique ID for the observation
month             month of observation
day               day of observation
year              year of observation
plot_id           ID of a particular plot
species_id        2-letter code
sex               sex of animal (“M”, “F”)
hindfoot_length   length of the hindfoot in mm
weight            weight of the animal in grams
genus             genus of animal
species           species of animal
taxon             e.g. Rodent, Reptile, Bird, Rabbit
plot_type         type of plot

2.1 Reading the Data into R

We are going to use the read_csv() function to load in the surveys.csv dataset (stored in the data folder). For simplicity, name the data surveys. We will learn more about this function next week.

surveys <- read_csv("data/surveys.csv")
#glimpse(surveys)

1. What are the dimensions (# of rows and columns) of these data?

2. What are the data types of the variables in this dataset?
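
As a hedged illustration of the kind of inspection these two questions call for, here is a minimal sketch run on a tiny made-up CSV rather than the lab data. The file contents and the names demo_path and demo are assumptions for illustration only; the same functions can be pointed at surveys.

```r
library(readr)
library(dplyr)

# Write a tiny, made-up CSV to a temporary file (illustration only).
demo_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    record_id  = 1:3,
    species_id = c("DM", "PF", "DM"),
    weight     = c(40, 8, 44)
  ),
  demo_path
)

# Read it back in; show_col_types = FALSE silences the column-spec message.
demo <- read_csv(demo_path, show_col_types = FALSE)

dim(demo)      # number of rows, then number of columns
glimpse(demo)  # one line per column, showing its data type
```

Both dim() and glimpse() report on the whole data frame at once, which is usually quicker than inspecting columns one at a time.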

3 Part 3: Exploratory Data Analysis with ggplot2

ggplot() graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.

To build a ggplot(), we will use the following basic template that can be used for different types of plots:

ggplot(data = <DATA>,
       mapping = aes(<VARIABLE MAPPINGS>)) +
  <GEOM_FUNCTION>()
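
To make the template concrete, here is a small sketch that fills in each placeholder using mpg, a dataset that ships with ggplot2. The dataset and the variables displ and hwy are stand-ins chosen for illustration, not part of the lab.

```r
library(ggplot2)

# <DATA> is mpg, <VARIABLE MAPPINGS> are x and y, <GEOM_FUNCTION> is geom_point
template_demo <- ggplot(data = mpg,
                        mapping = aes(x = displ, y = hwy)) +
  geom_point()

template_demo
```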

Let’s get started!

3.1 Scatterplot

3. First, let’s create a scatterplot of the relationship between weight (on the \(x\)-axis) and hindfoot_length (on the \(y\)-axis).

We can see there are a lot of points plotted on top of each other. Let’s try and modify this plot to extract more information from it.

4. Let’s add transparency (alpha) to the points, to make the points more transparent and (possibly) easier to see.
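
As a sketch of how transparency behaves, the example below uses the diamonds dataset from ggplot2 (an illustrative stand-in, not the lab data). The value 0.1 is an arbitrary choice; alpha ranges from 0 (invisible) to 1 (opaque).

```r
library(ggplot2)

# alpha is set outside aes(), so it is one fixed value for every point
# rather than something mapped to a variable.
alpha_demo <- ggplot(data = diamonds,
                     mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)

alpha_demo
```

With heavy overplotting, regions where many points overlap appear darker, which gives a rough sense of density.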

Despite our best efforts, there is still a substantial amount of overplotting in our scatterplot. Let’s try splitting the dataset into smaller subsets and see if that allows us to see the trends a bit better.

6. Facet your scatterplot by species.
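
One way to split a plot into subsets is facet_wrap(), sketched here on the mpg dataset from ggplot2. The faceting variable class is an illustrative stand-in for whichever variable you want to split on.

```r
library(ggplot2)

# facet_wrap() draws one panel per level of the variable after the ~
facet_demo <- ggplot(data = mpg,
                     mapping = aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ class)

facet_demo
```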

7. No plot is complete without axis labels and a title. Include reader friendly labels and a title to your plot.

It takes a larger cognitive load to read text that is rotated. It is common practice in many journals and media outlets to move the \(y\)-axis label to the top of the graph under the title.

8. Specify your \(y\)-axis label to be empty and move the \(y\)-axis label into the subtitle.
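
A sketch of this labelling pattern with labs(), again on the mpg dataset from ggplot2. The label text is made up for illustration; the point is that the y-axis label is blanked and its content moves into the subtitle.

```r
library(ggplot2)

labels_demo <- ggplot(data = mpg,
                      mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (L)",
       y = "",                                    # empty y-axis label
       title = "Larger engines use more fuel",
       subtitle = "Highway fuel economy (mpg)")   # what the y-axis shows

labels_demo
```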

3.2 Boxplots

9. Create side-by-side boxplots to visualize the distribution of weight within each species.

A fundamental complaint about boxplots is that they do not plot the raw data. However, with ggplot we can add the raw points on top of the boxplots!

10. Add another layer to your previous plot that plots each observation using geom_point().

Alright, this should look less than optimal. Your points should appear rather stacked on top of each other. To make them less stacked, we need to jitter them a bit, using geom_jitter().

11. Remove the previous layer and include a geom_jitter() layer instead.

That should look a bit better! But it’s really hard to see the points when everything is black.

12. Set the color aesthetic in geom_jitter() to change the color of the points, and set the alpha aesthetic to add transparency. You are welcome to use whatever color you wish! Some of my favorites are “springgreen4” and “steelblue4”. Check them out on R Charts.
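
Here is a hedged sketch of layering jittered points over boxplots, using the mpg dataset from ggplot2 rather than the lab data. The color, alpha, and width values are arbitrary choices for illustration.

```r
library(ggplot2)

jitter_demo <- ggplot(data = mpg,
                      mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  # width controls horizontal spread; height = 0 leaves the y values untouched
  geom_jitter(color = "steelblue4", alpha = 0.4, width = 0.2, height = 0)

jitter_demo
```

Because the jitter is random, the points land in slightly different positions each time the plot is drawn.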

Great! Now that you can see the points, you should notice something odd: there are two colors of points still being plotted. Some of the observations are being plotted twice, once from geom_boxplot() as outliers and again from geom_jitter()!

13. Inspect the help file for geom_boxplot() and see how you can remove the outliers from being plotted by geom_boxplot(). Make this change in your code!

Some small changes can make big differences to plots. One of these changes is better labels for a plot’s axes and legend.

14. Modify the \(x\)-axis and \(y\)-axis labels to describe what is being plotted. Be sure to include any necessary units! You might also be getting overlap in the species names – use theme(axis.text.x = ____) or theme(axis.text.y = ____) to turn the species axis labels 45 degrees.

Some people (and journals) prefer for boxplots to be stacked with a specific orientation! Let’s practice changing the orientation of our boxplots.

15. Now copy-paste your boxplot code you’ve been adding to above. Flip the orientation of your boxplots. If you created horizontally stacked boxplots, your boxplots should now be stacked vertically. If you had vertically stacked boxplots, you should now stack your boxplots horizontally!

Notice how vertically stacked boxplots make the species labels more readable than horizontally stacked boxplots (even when the axis labels are rotated). This is good practice!
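
One way to flip the orientation is to swap which variable is mapped to x and which to y. This sketch uses the mpg dataset from ggplot2; the object names are made up for illustration.

```r
library(ggplot2)

# Categories on the x-axis: boxes sit side by side
categories_on_x <- ggplot(data = mpg,
                          mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# Categories on the y-axis: boxes are stacked top to bottom
categories_on_y <- ggplot(data = mpg,
                          mapping = aes(x = hwy, y = class)) +
  geom_boxplot()

categories_on_x
categories_on_y
```

With the categories on the y-axis, the category names are written horizontally and do not need to be rotated.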

4 Lab 3 Submission

For Lab 3 you will submit the link to your GitHub repository. Your rendered file is required to have the following specifications in the YAML options (at the top of your document):

  • have the plots embedded (embed-resources: true)
  • include your source code (code-tools: true)
  • include all your code and output (echo: true)

If any of the options are not included, your Lab 3 or Challenge 3 assignment will receive an “Incomplete” and you will be required to submit a revision.

In addition, your document should not have any warnings or messages output in your HTML document. If your HTML contains warnings or messages, you will receive an “Incomplete” for document formatting and you will be required to submit a revision.
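
As a sketch, a YAML header covering the options above might look like the following. The title is only an example, and setting warning and message to false is one way to keep them out of the rendered HTML; you may prefer to fix whatever is causing the warning instead.

```yaml
---
title: "Lab 3: Exploring Rodents with ggplot2"
format:
  html:
    embed-resources: true   # plots embedded in the HTML file
    code-tools: true        # source code available from the rendered page
execute:
  echo: true                # show all code alongside its output
  warning: false            # assumption: hide warnings in the output
  message: false            # assumption: hide messages in the output
---
```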

5 Challenge Problems

For this week’s Challenge, you will have three different options to explore. I’ve arranged these options in terms of their “spiciness,” or the difficulty of completing the task. You only need to complete one task to complete the challenge, but if you are interested in exploring more than one, feel free!

This is a great place to let your creativity show! Make sure to indicate what additional touches you added, and provide any online references you used.

5.1 🌶 Medium: Ridge Plots

In Lab 3, you used side-by-side boxplots to visualize the distribution of weight within each species of rodent. Boxplots have substantial flaws, namely that they disguise distributions with multiple modes.

A “superior” alternative is the density plot. However, ggplot2 does not allow for side-by-side density plots using geom_density(). Instead, we will need to make use of the ggridges package to create side-by-side density (ridge) plots.

For this challenge you are to change your boxplots to ridge plots. You will need to install the ggridges package and explore the geom_density_ridges() function.
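
To give a feel for the function, here is a minimal sketch of a ridge plot on the built-in iris dataset (a stand-in, not the lab data). It assumes ggridges is already installed, and the alpha value is an arbitrary choice.

```r
library(ggplot2)
library(ggridges)

# One density curve per level of the y variable, offset vertically
ridge_demo <- ggplot(data = iris,
                     mapping = aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(alpha = 0.6)

ridge_demo
```

When drawn, geom_density_ridges() typically prints a message about the bandwidth it picked, so keep the submission rule about messages in mind.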

5.2 🌶 🌶 Spicy: Exploring Color Themes

The built-in ggplot() color scheme may not include the colors you were looking for. Don’t worry – there are many other color palettes available to use!

You can change the colors used by ggplot() in a few different ways.

Manual Specification

Add the scale_color_manual() or scale_fill_manual() functions to your plot and directly specify the colors you want to use. You can either:

  1. define a vector of colors within the scale functions (e.g. values = c("blue", "black", "red", "green")) OR

  2. create a vector of colors using hex numbers and store that vector as a variable. Then, call that vector in the scale_color_manual() function.

Here are some example hex color schemes:

# A vector of a RG color deficient friendly palette with gray:
cdPalette_grey <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# A vector of a RG color deficient friendly palette with black:
cdPalette_blk <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

Note

If you are interested in using specific hex colors, I like the image color picker app to find the colors I want.
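
A sketch of the second approach, storing the hex colors in a vector and passing it to scale_color_manual(). It uses the mpg dataset from ggplot2; the name cd_palette and the choice of drv as the color variable are assumptions for illustration.

```r
library(ggplot2)

# Same hex codes as the gray palette above, stored under a hypothetical name
cd_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

manual_demo <- ggplot(data = mpg,
                      mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # drv has three levels, so only the first three colors are used
  scale_color_manual(values = cd_palette)

manual_demo
```

The vector needs at least as many colors as there are levels in the variable; extra colors are simply ignored.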

Package Specification

While manual specification may be necessary for some contexts, it can be a real pain to handpick 5+ colors. This is where color scales built into R packages come in handy! Popular packages for colors include:

  • RColorBrewer – change colors by using scale_fill_brewer() or scale_colour_brewer().

  • viridis – change colors by using scale_colour_viridis_d() for discrete data, scale_colour_viridis_c() for continuous data.

  • ggsci – change colors by using scale_color_<PALNAME>() or scale_fill_<PALNAME>(), where you specify the name of the palette you wish to use (e.g. scale_color_aaas()).

Note

This website provides an exhaustive list of color themes available through various packages.
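
A hedged sketch of two of the scale functions listed above, applied to the mpg dataset from ggplot2. The palette name "Dark2" and the variables used are arbitrary choices for illustration.

```r
library(ggplot2)

# ColorBrewer palette applied to the fill of boxplots
brewer_demo <- ggplot(data = mpg,
                      mapping = aes(x = class, y = hwy, fill = drv)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2")

# Discrete viridis palette applied to point colors
viridis_demo <- ggplot(data = mpg,
                       mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_colour_viridis_d()

brewer_demo
viridis_demo
```

Use a fill scale when the variable is mapped to fill and a color scale when it is mapped to color; a mismatched scale has no visible effect.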

In this challenge you are expected to use this information to modify the boxplots you created in Lab 3. First, you are to color the boxplots based on the variable genus. Next, you are to change the colors used for genus using either manual color specification or any of the packages listed here (or others!).

5.3 🌶 🌶 🌶 Hot: Exploring ggplot2 Annotation

Some data scientists advocate that we should try to eliminate legends from our plots to make them more clear. Instead of using legends, which cause the reader’s eye to stray from the plot, we should use annotation.

We can add annotation(s) to a ggplot() using the annotate() function:

ggplot(data = surveys, 
       mapping = aes(x = weight, y = species, color = genus)
       ) +
  geom_boxplot() +
  scale_color_manual(values = cdPalette_grey) + 
  annotate("text", y = 6, x = 200, label = "Sigmodon") +
  annotate("text", y = 4, x = 200, label = "Perognathus") +
  theme(legend.position = "none") +
  labs(x = "Weight (g)",
       y = "",
       subtitle = "by Species and Genera",
       title = "Rodent Weight")

Note that I’ve labeled the “Sigmodon” and “Perognathus” genera, so the reader can tell that these boxplots are associated with their respective genus.

In this challenge you are expected to use this information to modify the boxplots you created in Lab 3. First, you are to color the boxplots based on the variable genus. Next, you are to add annotations for each genus next to the boxplot(s) associated with that genus. Finally, you are expected to use the theme() function to remove the color legend from the plot, since it is no longer needed!