Overview for Summary Statistics
Calculating Summary Statistics with mosaic
The package mosaic is built to overlay existing base R functions so that they are easier to use with a formula-based syntax. The formula-based syntax means that the code for all mosaic-generated summary statistics will generally take one of the following forms:
One Variable
goal(~ x, data = mydata)
Two Variable - Relationships
goal(y ~ x, data = mydata)
where
- the goal will be the specific function for the statistic type (e.g. mean()),
- the y and x will be the specific variables/columns in the data and may be categorical or quantitative depending on the statistic type, and
- mydata is the object name of your data.
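To make the two forms concrete, here is a minimal sketch that uses mean() as the goal. The data frame mydata and its columns group and score are made up for illustration; they are not part of any package.

```r
library(mosaic)  # adds the formula interface to mean(), median(), etc.

# Hypothetical data, for illustration only
mydata <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# One variable: goal(~ x, data = mydata)
overall_mean <- mean(~ score, data = mydata)

# Two variables: goal(y ~ x, data = mydata)
# Here the mean of score is computed separately for each group
group_means <- mean(score ~ group, data = mydata)

overall_mean
group_means
```

In the two-variable form, the variable on the left of the ~ is summarized and the variable on the right defines the groups.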
Getting Started
First, be sure you have installed mosaic. Remember, you only need to install the package once on your machine.
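One possible way to handle the one-time installation is sketched below. The check with requireNamespace() is an optional convenience, not something this guide requires; running install.packages("mosaic") by itself works just as well.

```r
# Install mosaic only if it is not already available on this machine.
# This needs an internet connection the first time it runs.
if (!requireNamespace("mosaic", quietly = TRUE)) {
  install.packages("mosaic")
}
```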
Then, be sure to load the package mosaic. Remember, you need to do this with each new Quarto/RMarkdown document or R session.
library(mosaic) #for summary stats
Data Structures
In order to calculate summary statistics, your data needs to be “tidy”. That means it should have the structure:
- Every Column is a Variable
- Every Row is an Individual/Case
- Every Cell is a Single Value
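The three rules above can be checked on a small data frame. The following is a sketch with made-up values; the object plants and its columns are hypothetical and only serve to show the structure.

```r
# Hypothetical measurements on three plants (illustration only)
plants <- data.frame(
  plant_id  = c(1, 2, 3),
  variety   = c("dwarf", "tall", "tall"),
  height_cm = c(12.5, 30.1, 28.4)
)

names(plants)            # every column is a variable
nrow(plants)             # every row is one individual/case
plants[2, "height_cm"]   # every cell holds a single value
```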
Here is an example of “tidy” data using data from a package called palmerpenguins (remember to install it!). First, load the package.
library(palmerpenguins)
Here is a snippet of the data:
Palmer Penguins
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Biscoe | 42.0 | 19.5 | 200 | 4050 | male | 2008 |
Adelie | Torgersen | 42.5 | 20.7 | 197 | 4500 | male | 2007 |
Adelie | Torgersen | 36.6 | 17.8 | 185 | 3700 | female | 2007 |
Chinstrap | Dream | 46.4 | 18.6 | 190 | 3450 | female | 2007 |
Chinstrap | Dream | 45.2 | 17.8 | 198 | 3950 | female | 2007 |
Gentoo | Biscoe | 47.4 | 14.6 | 212 | 4725 | female | 2009 |
Gentoo | Biscoe | 49.1 | 14.8 | 220 | 5150 | female | 2008 |
Gentoo | Biscoe | 45.8 | 14.6 | 210 | 4200 | female | 2007 |
Chinstrap | Dream | 52.0 | 20.7 | 210 | 4800 | male | 2008 |
Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | NA | 2007 |
We notice that each column is a variable, such as:
- species is the species of penguin
- island is the location/island on which the penguin is found
- bill_length_mm is the length of the penguin’s bill in millimeters (mm)
We also notice that each row represents a single penguin and its characteristics. Each cell contains a single value associated with a specific variable measured on a specific penguin.
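Because the penguins data is tidy, the formula syntax can be applied to it directly. The sketch below is one possible illustration rather than a fixed recipe: the choice of tally(), mean(), and favstats() and of the variables is mine. Note that the data contains missing values (see the NA in the snippet above), so mean() is given na.rm = TRUE.

```r
library(mosaic)
library(palmerpenguins)

# Counts for a categorical variable
species_counts <- tally(~ species, data = penguins)

# Mean of a quantitative variable; na.rm = TRUE drops missing values first
mean_bill <- mean(~ bill_length_mm, data = penguins, na.rm = TRUE)

# Several summary statistics at once, computed separately for each species
bill_by_species <- favstats(bill_length_mm ~ species, data = penguins)

species_counts
mean_bill
bill_by_species
```

Without na.rm = TRUE, mean() returns NA whenever the variable has a missing value. favstats() reports the number of missing values in its own column.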