gf_goal(~ x, data = mydata)
Overview for Data Visualization
Making Visualizations with ggformula
The package ggformula
is built to overlay the Grammar of Graphics used in package ggplot2
with a formula based syntax. The formula based syntax means that all ggformula
generated graphics will generally take one of the following forms:
One Variable
Two Variable - Relationships
gf_goal(y ~ x, data = mydata)
Two or More Variables Split by Groups
gf_goal(~ x | z, data = mydata)
gf_goal(y ~ x | z, data = mydata)
where the
goal
will be the specific function for the graph type (e.g.gf_histogram()
),- the
y
andx
will be the specific variables/columns in the data mapped to the y-axis and x-axis respectively, with| z
the feature that facets into different graphs for group values ofz
, and mydata
is the object name of your data.
Getting Started
First, be sure you have installed ggformula
. Remember, you only need to install the package once on your machine.
Then, be sure to load the package ggformula
. Remember, you need to do this with each new Quarto/RMarkdown document or R Session.
library(ggformula) #for graphs
Common Modifications/Arguments
Every visualization type has some modifications that are specific to that type, but there are some universal modifications that should be added to every graph.
Axis Labels
Every visualization should have its axes labelled according to the context of the data. Axis labels should always include
- The variables or cases they represent
- Always include units (e.g. inches, %) for variables
- Always include units (e.g. inches, %) for variables
- The individual/sample/population represented by data
- A descriptive title - this should not be “X vs Y”, but something like “As X increases we see that Y decreases”
Here is how that is added to our gf_goal()
function:
gf_goal(formula, #takes many different forms
data = mydata,
xlab = "X Variable, Units, and/or Sample",
ylab = "Y Variable, Units, and/or Sample",
title = "Title describing relationship")
Adding ggplot2
Layers
The package ggformula
is built on top of another package called ggplot2
and so any ggplot2
function can be added to a ggformula
generated graphic.
For example, we can change the theme to a built-in theme or modify other features using +
after the ggformula
function to add the ggplot2
layers to the graph.
gf_goal(formula, #takes many different forms
data = mydata,
xlab = "X Variable, Units, and/or Sample",
ylab = "Y Variable, Units, and/or Sample",
title = "Title describing relationship") +
theme_light()
Data Structures
In order to make graphs, your data needs to be “tidy”. That means it should have the structure:
- Every Column is a Variable
- Every Row is an Individual/Case
- Every Cell is a Single Value
Here is an example of “tidy” data using data from a package called palmerpenguins
(remember to install it!). First, load the package.
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins | |||||||
---|---|---|---|---|---|---|---|
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
Gentoo | Biscoe | 46.5 | 14.8 | 217 | 5200 | female | 2008 |
Chinstrap | Dream | 49.0 | 19.5 | 210 | 3950 | male | 2008 |
Gentoo | Biscoe | 48.8 | 16.2 | 222 | 6000 | male | 2009 |
Gentoo | Biscoe | 42.7 | 13.7 | 208 | 3950 | female | 2008 |
Adelie | Dream | 41.5 | 18.5 | 201 | 4000 | male | 2009 |
Chinstrap | Dream | 49.6 | 18.2 | 193 | 3775 | male | 2009 |
Adelie | Torgersen | 36.2 | 16.1 | 187 | 3550 | female | 2008 |
Adelie | Biscoe | 37.8 | 20.0 | 190 | 4250 | male | 2009 |
Adelie | Dream | 41.3 | 20.3 | 194 | 3550 | male | 2008 |
Adelie | Torgersen | 40.6 | 19.0 | 199 | 4000 | male | 2009 |
We notice that each column is a variable, such as
species
is the species of penguin
island
is the location/island on which the penguin is found
bill_length_mm
is the length of the penguin’s bill in millimeters (mm)
We also notice that each row represents a single penguin and its characteristics. Each cell contains a single value associated with a specific variable measured on a specific penguin.