library(dplyr)
library(ggplot2)
library(gghighlight)
GGPlot Assignment
Description
I downloaded corn yield data from USDA’s Quick Stats Lite site to compare irrigated corn yields over time for two states of interest: Montana (where I will be moving) and Colorado (because of CSU).
To reproduce the data, visit the site linked above and apply the following filters:
- Sector = CROPS
- Group = FIELD CROPS
- Commodity = CORN
- View = Acreage, Yield, and Production - Irrigated / Non-Irrigated
- Year = 1950-2022
- Geographic Level = State
Note: I exported data for all states, but had to split it into two exports because the tool returned an empty CSV when I tried to select all states at once. The data is also available in this GitHub repo.
Load Data
# Read data from the two CSV exports and combine them into one data frame
# data <- read.csv('corn_production.csv')
data1 <- read.csv('corn_production1.csv')
data2 <- read.csv('corn_production2.csv')
data <- bind_rows(data1, data2)
# Change the names to lower case so that they're easier for me to work with
names(data) <- tolower(names(data))
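Because the data came from two separate exports, it may be worth confirming that combining them did not duplicate any records. The following is a minimal sketch on made-up data: `export_a` and `export_b` are hypothetical stand-ins for the two CSV files, and it assumes each state, year, and production practice appears at most once across the exports.

```r
library(dplyr)

# Hypothetical stand-ins for the two exports; values are made up for illustration
export_a <- data.frame(
  year = c(2020, 2021),
  location = c("COLORADO", "COLORADO"),
  prodn.practice = c("IRRIGATED", "IRRIGATED")
)
export_b <- data.frame(
  year = c(2020, 2021),
  location = c("MONTANA", "MONTANA"),
  prodn.practice = c("IRRIGATED", "IRRIGATED")
)

combined <- bind_rows(export_a, export_b)

# Count rows per state/year/practice; any count above 1 signals an overlap
duplicates <- combined %>%
  count(year, location, prodn.practice) %>%
  filter(n > 1)

nrow(duplicates)
```

If the count is greater than zero on the real data, the two exports overlap and the repeated rows would need to be dropped before plotting.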
Data preparation
Here I rename columns, keep a subset of them, and filter the data to records that report production greater than zero and where the corn was grown on irrigated land.
# Rename, select a subset of columns, and filter
data <- data %>%
  rename(
    production=production.in.bu,
    harvested_area=area.harvested.in.acres,
    yield=yield.in.bu...acre
  ) %>%
  filter(
    production > 0,
    prodn.practice == 'IRRIGATED'
  ) %>%
  select(year, location, prodn.practice, harvested_area, production, yield)
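To see what the rename, filter, and select steps keep and drop, the same pipeline can be tried on a small made-up data frame. This is only a sketch: `toy` and its values (including the `NON-IRRIGATED` label) are invented for illustration and are not taken from the USDA export; only the column names follow the lower-cased names used above.

```r
library(dplyr)

# Made-up rows using the lower-cased column names; values are illustrative only
toy <- data.frame(
  year = c(2020, 2020, 2020),
  location = c("MONTANA", "MONTANA", "COLORADO"),
  prodn.practice = c("IRRIGATED", "NON-IRRIGATED", "IRRIGATED"),
  area.harvested.in.acres = c(100, 200, 300),
  production.in.bu = c(15000, 10000, 0),
  yield.in.bu...acre = c(150, 50, 0)
)

toy_clean <- toy %>%
  rename(
    production = production.in.bu,
    harvested_area = area.harvested.in.acres,
    yield = yield.in.bu...acre
  ) %>%
  filter(
    production > 0,                 # drop rows with no reported production
    prodn.practice == "IRRIGATED"   # keep irrigated land only
  ) %>%
  select(year, location, prodn.practice, harvested_area, production, yield)

toy_clean
```

Only the irrigated Montana row survives: the non-irrigated row fails the practice filter, and the Colorado row fails the production filter.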
Plotting
ggplot(data, aes(x=year, y=yield)) +
  theme_minimal() +
  geom_line(aes(group=location), color='red') +
  gghighlight(location %in% c('MONTANA', 'COLORADO')) +
  labs(
    title='USDA Corn Yield Data',
    subtitle='Comparison of Montana and Colorado corn yields',
    x='Year',
    y='Yield (bu/acre)'
  )