<- read_csv("data/unvotes.csv") unvotes
PA 6: United Nations Voting Records
This task is complex. It requires many different types of abilities. Everyone will be good at some of these abilities but nobody will be good at all of them. In order to produce the best product possible, you will need to use the skills of each member of your group.
Note: Since the project folder is already shared as a zip file, using the commands:
usethis::use_git()
usethis::use_github()
might be easier to initiate your github repository.
Goals for the Activity
- Join multiple data tables together by a common variable(s)
- Create new data sets through the joining of data from various sources
- Combine
join
functions with othertidyverse
functions
THROUGHOUT THE Activity be sure to follow the Style Guide by doing the following:
- load the appropriate packages at the beginning of the Quarto document
- use proper spacing
- add labels to all code chunks
- comment at least once in each code chunk to describe why you made your coding decisions
- add appropriate labels to all graphic axes
Data Description
The data this week comes from Harvard’s Dataverse by way of Mine Çetinkaya-Rundel, David Robinson, and Nicholas Goguen-Compagnoni.
Original Data citation: Erik Voeten “Data and Analyses of Voting in the UN General Assembly” Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013). Available at SSRN: http://ssrn.com/abstract=2111149
It was featured on TidyTuesday
Here is each data set and its description (you might want to look at the Tidy Tuesday link for the tables already rendered)
unvotes.csv
variable | class | description |
---|---|---|
rcid | double | The roll call id; used to join with un_votes and un_roll_call_issues |
country | character | Country name, by official English short name |
country_code | character | 2-character ISO country code |
vote | integer | Vote result as a factor of yes/abstain/no |
roll_calls.csv
variable | class | description |
---|---|---|
rcid | integer | . |
session | double | Session number. The UN holds one session per year; these started in 1946 |
importantvote | integer | Whether the vote was classified as important by the U.S. State Department report “Voting Practices in the United Nations”. These classifications began with session 39 |
date | double | Date of the vote, as a Date vector |
unres | character | Resolution code |
amend | integer | Whether the vote was on an amendment; coded only until 1985 |
para | integer | Whether the vote was only on a paragraph and not a resolution; coded only until 1985 |
short | character | Short description |
descr | character | Longer description |
#edit to read in roll_calls
<- read_csv("__________") roll_calls
issues.csv
variable | class | description |
---|---|---|
rcid | integer | The roll call id; used to join with unvotes and un_roll_calls |
short_name | character | Two-letter issue codes |
issue | integer | Descriptive issue name |
#edit to read in issues
Data Exploration
Our goal today is to explore how various members (countries) in the United Nations vote. We have three data sets, what can we determine from each data set separately?
UN Votes
The first data set, unvotes
contains data on the rcid
which is the roll call id for the vote, the country/country code, and how the country voted. What can we learn from the data?
Comment on the following code, what is happening in each line? One way to approach seeing what each line does is to highlight the code from before the pipe
of that line up to the data unvotes
and use CTRL + ENTER
to run just the highlighted lines.
|>
unvotes count(country, vote) |> #comment
group_by(country) |> #comment
mutate(total = sum(n)) |> #comment
mutate(prop_vote = n/total) |> #comment
filter(country %in% c("United States", "Canada",
"Germany", "France",
"Italy", "Japan",
"United Kingdom")) |> #comment
ggplot(aes(x = country, y = prop_vote,
fill = vote)) + #comment
geom_col(position = position_stack()) + #comment
labs(x = "Group of Seven Countries",
y = "Proportion of Votes",
title = "Voting Record of the G7",
fill = "Vote") + #comment
theme_minimal() + #comment
scale_fill_viridis_d(end = 0.8) + #comment
coord_flip() #comment
Describe what the graph above demonstrates above UN voting records for the G7
Roll Calls
The second data set, roll_calls
has more information on the type of vote, the importance, whether it was a resolution, and date of the vote. What does eh following code do?
|>
roll_calls distinct(short)
Description of code results
We can use the individual data for roll_calls
to look at the number of votes per year over time.
|>
roll_calls mutate(year = lubridate::year(date)) |> #extracts the year from the date value and creates a new `year` column
count(year) |> #counts how many votes there were per year assuming each line is an single voting instance
ggplot(aes(x = year, y = n)) +
geom_line() +
labs(x = "Year", y = "Number of Votes",
title = "UN Votes per Year") +
theme_minimal()
What information is missing from the above graphic that might be useful in understanding the issues the UN commonly votes on?
Insert Answer Here
We could try and look at the short descriptions, short
, for each vote with the following code.
|>
roll_calls count(short)
Does the above information help us understand the voting issues of the UN over time? Explain.
Insert your answer here
Issues
Finally we have the issues
data which provides a more general description for each vote on specific issues. Note that not all issues are included in the data set, just the ones related to the 6 issues below:
|>
issues count(issue)
It would be helpful to use the issues
data with the roll_calls
data to be able to better understand the voting trends within the UN on at least these 6 issues. To do this, we need to join the data.
Votes Over Time
Now let’s join our data together to get a better idea of how the UN has voted over time.
First, look at the number of rows in issues
and roll_calls
- do they match? What does this indicate?
dim(issues)
dim(roll_calls)
Insert Answer Here
Now let’s try joining the roll_calls
with the issues
data. Compare the following codes to join the data. Describe what each one does and how it differs from the others as a comment in the code chunk. You might need to reference the slides or reading for this week.
|>
roll_calls left_join(issues, by = "rcid") #description of join
|>
roll_calls right_join(issues, by = "rcid") #description of join
|>
roll_calls full_join(issues, by = "rcid") #description of join
|>
roll_calls inner_join(issues, by = "rcid") #description of join
If we are only interested in retaining the records associated with the issues
labeled in our data, which join(s) should we use?
Insert answer here
Now that we know how to join the data, we will use the following code to examine the the voting trends for three of the issues related to conflict/weapons.
Be sure to run the code via the green arrow on the code chunk, as the case_when()
code can get finicky sometimes and claim an error about a comma in the code when it doesn’t exist. Comment on the code where indicated and add your chosen join function
|>
roll_calls _________(issues, by = "rcid") |> #join roll_calls and issues so that just the votes related to the issues data are retained.
mutate(issue = case_when(
== "Arms control and disarmament" ~ "Arms Control",
issue == "Nuclear weapons and nuclear material" ~ "Nuclear Weapons",
issue == "Palestinian conflict" ~ "Palestinian Conflict")) |> #what is happening in this mutate function?
issue drop_na(issue) |> #drop NAs for values not defined in the previous line of code
mutate(year = lubridate::year(date)) |> #create a column `year` that contains the year value
count(year, issue) |> #what does this line do?
ggplot(aes(x = year, y = n, group = issue)) + #what does this line do?
geom_line(aes(color = issue)) + #what does this line do?
labs(x = "Year", y = "Number of Votes",
title = "United Nations Votes per Year",
subtitle = "Conflict and Arms Related Votes",
color = "Voting Issue") +
theme_minimal()
What do you notice? What do you wonder based on the graph created? > Insert Answer Here
Challenge 1: Joining all Data
We want to try and create the following visualization
To do this, though, we need information from all three data set, unvotes
, issues
, and roll_calls
We want to join all three data sets together, maintaining only the votes for which we have identified the general issue (e.g., Nuclear War, Arms, Economics, etc.). We will save (assign) the data as un_full
. Fill in the best join functions to do maintain just the votes for which we have identified the general issue.
<- unvotes |>
un_full _________(issues, by = "rcid") |>
_________(roll_calls, by = "rcid")
#View(un_full)
Now, we are going to do some data cleaning. Our goal is create a data set that includes the percentage of “yes” votes per country each year. Fill in the missing _________ with the correct verbs. We will call the data table yes_votes
<- un_full |>
yes_votes _________(country, issue, date, vote) |> #keep just the columns for country, issue, date, vote
_________(year = lubridate::year(date)) |> #create a new variable called year
_________(country, year, issue) |> #group country, year, issue together to prepare for subsequent analysis
_________(percent_yes = mean(vote == "yes")) |> #calculate the proportion of yes votes
_________(issue = case_when(
== "Arms control and disarmament" ~ "Arms Control",
issue == "Nuclear weapons and nuclear material" ~ "Nuclear Weapons",
issue == "Palestinian conflict" ~ "Palestinian Conflict",
issue TRUE ~ issue)) #create/overwrite issue column with simpler values
#View(yes_votes) #you can use this code to look at the table created
Now write the clean data yes_votes
to a new data set in the data folder so we don’t have to rerun the above code again if we want to use the data in the future.
_____csv(yes_votes, "_____/yes_votes.csv")
Now we can feed the yes_votes
transformed data table into your graphing code, but first we will want to focus on the United States and Canada. Provide a comment to describe what each line of code is doing in the process.
|>
yes_votes filter(country %in% c("United States","Canada")) |> # your comment here
ggplot(mapping = aes(x = year, y = percent_yes, color = country)) + # your comment here
geom_point(alpha = 0.4) + # your comment here
#geom_line(aes(group = country)) +
geom_smooth(method = "loess", se = FALSE) + #this fits a special model called a loess regression, a smooth line that fits the data
facet_wrap(~issue) + # your comment here
scale_y_continuous(labels = scales::percent) + #your comment here
labs(
title = "Percentage of 'Yes' votes in the UN General Assembly",
subtitle = "1946 to 2015",
y = "% Yes",
x = "Year",
color = "Country"
+ #your comment here
) theme_bw() +
theme(legend.position = "bottom") + #your comment here
scale_color_viridis_d(option = "turbo") #your comment here
What do you notice about the voting records over time?
Insert Answer Here
Challenge 2: Adding more data
After your instructor created the above plot, she became curious about how politics might impact the UN Voting record for the United States since UN Ambassador is a presidential appointment. So your instructor started searching for a data set of US presidents, their years in office, and their political affiliation. She found a data set on Kaggle.com and removed the information prior to 1940 (because the dates were coded funny and it was causing problems). She saved the data as us_presidents.csv
and imported it into the project.
<- read_csv("data/us_presidents.csv") president
She realized that her data only had the start/end dates for each president and she wanted a data set that filled in the missing years and political party for the president in that time period. After much googling and reading Stack Overflow, found two functions she did not know about called complete()
and fill()
to fill in the missing years and party affiliations
<- president |>
politics_year mutate(start = lubridate::mdy(start), #formats date correctly
start_year = lubridate::year(start)) |> #pulls out year
filter(start_year > 1940) |> #removes data before 1940 since there was no UN
select(start_year, party) |> #pulls out just the variables of interest
complete(start_year = seq(min(start_year), 2020, by = 1)) |> #fill in missing years
fill(party) #fill in missing party affiliations for years
Next, your instructor took the yes_votes
data and filtered out just the United States data and then joined by year to add the party affiliation of the president for each year of UN votes. To create the visualization with the smoothed model, but color by party affiliation, she had to add a new column called predict
that fit the model first instead of using geom_smooth()
.
Comment the code below:
|>
yes_votes filter(country == "United States") |> #comment
left_join(politics_year, by = c("year" = "start_year")) |> #comment
group_by(issue) |> #comment
mutate(predict = predict(loess(percent_yes~year))) |> #comment
ggplot(mapping = aes(x = year, color = party, group = 1)) + #comment
geom_point(aes(y = percent_yes),alpha = 0.4) + #comment
geom_line(aes(y = predict)) + #comment
facet_wrap(~issue) + #comment
scale_y_continuous(labels = scales::percent) + #comment
labs(
title = "Percentage of 'Yes' votes in the UN General Assembly",
subtitle = "1946 to 2015",
y = "% Yes",
x = "Year",
color = "Country"
+ #comment
) theme_bw() + #comment
scale_color_manual(values = c("blue", "red")) + #comment
theme(legend.position = "bottom") #comment
What do you notice? What do you wonder?
Insert your answer