I wish to explore secondary student responses to the question: "Overall, how do you feel about your life?" Responses were on a 5-point Likert scale consisting of five emojis ranging from very sad to very happy.
The data are provided by, and are the property of, Youth Truth Student Survey, a national nonprofit, and may only be shared in aggregate to protect the confidentiality of students and clients.
Our sample consisted of 161,340 secondary students (Grades 6-12) from 442 schools across 19 states in the 2021-22 school year.
The question was administered to students at schools that chose to work with Youth Truth and opted in to the additional Emotional and Mental Health topic.
Loading and prepping data
HS<-read.csv("/Users/valerier/Dropbox (CEP)/YouthTruth/Data and Research/EMH Back to School 2022/R Script and Results/HS/HS_dataclean_2022.csv")
MS<-read.csv("/Users/valerier/Dropbox (CEP)/YouthTruth/Data and Research/EMH Back to School 2022/R Script and Results/MS/MS_dataclean_2022.csv")
HS_subset<- HS[ ,c("em_life","gender", "racen")]
MS_subset<-MS[ ,c("m_em_life","m_gender", "m_racen")]
colnames(MS_subset)<-c("em_life","gender", "racen")
Secondary<-rbind(HS_subset,MS_subset)
Secondary<-na.omit(Secondary)
summary(Secondary)
##     em_life          gender          racen
##  Min.   :1.000   Min.   : 1.00   Min.   : 1.000
##  1st Qu.:3.000   1st Qu.: 1.00   1st Qu.: 1.000
##  Median :4.000   Median : 2.00   Median : 2.000
##  Mean   :3.608   Mean   :12.33   Mean   : 7.103
##  3rd Qu.:4.000   3rd Qu.: 2.00   3rd Qu.: 5.000
##  Max.   :5.000   Max.   :99.00   Max.   :99.000
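The gender mean of 12.33 and the maximum of 99 in the summary suggest that special codes are stored in the same column as the category codes (the plotting code filters out 77 and 99 as no-answer and 'skip this question' responses). Here is a minimal sketch of how one might inspect those codes before plotting. It uses a small made-up data frame, `toy`, which is hypothetical and not the survey data.

```r
# Hypothetical toy data standing in for Secondary; all values are made up
toy <- data.frame(
  em_life = c(4, 3, 5, 2, 4, 1),
  gender  = c(1, 2, 2, 3, 77, 99)
)

# Frequency of each gender code, including the special codes 77 and 99
gender_counts <- table(toy$gender)
print(gender_counts)

# The mean of a coded categorical column is not meaningful:
# the special codes pull it far above the category codes 1-3
print(mean(toy$gender))

# Rows that would remain after dropping the special codes
toy_valid <- toy[!(toy$gender %in% c(77, 99)), ]
print(nrow(toy_valid))
```

Running the same `table()` call on `Secondary$gender` and `Secondary$racen` would show how many responses carry each special code, which the means and maxima above only hint at.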
Loading libraries (while testing I lost track of which ones I ended up using, so I kept them all in!)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggthemes)
library(viridis)
## Loading required package: viridisLite
library(viridisLite)
library(distributional)
library(ggdist)
library(patchwork)
Creating Graph
This is the first iteration, a stacked bar graph comparing responses by gender. It gives a good impression of how the response distribution differs by gender.
#Manually add our YT colors for our palette
YTPalette<-c("#0fb2cb","#288fbd", "#b0c5cc", "#efe15f","#f99c25")
#Filtering out no answer and 'skip this question' responses to the gender question as they do not provide useful data
genderplot<-Secondary %>% filter(gender != 77 & gender !=99) %>%
ggplot() +
#setting up bar graphs with likert data as the fill, separated by gender
#lty="blank" removes the outlines from the bars, which I found distracting
#Likert data must be entered as a factor to be used as the fill; gender stays numeric and is labeled by scale_x_discrete below
geom_bar(aes(x = gender, fill = forcats::fct_rev(factor(em_life))), position = 'fill', lty="blank")+
#applying custom colors and labels
scale_fill_manual(values= YTPalette,labels = rev(c("very negative", "negative","neutral","positive", "very positive")),name="")+
ylab('Proportion')+
scale_y_continuous(labels = scales::percent)+
xlab('Gender')+
#labeling the numerically coded gender categories
scale_x_discrete(limits = c("Male", "Female", "Non-binary"))+
#providing a descriptive title and subtitle
labs(title="Overall, how do you feel about your life?", subtitle = "Responses from secondary students (2021-22, n=160,672)")+
#flipping coordinates to make the stacked bars horizontal
coord_flip()+
#removing background
theme(panel.background = element_blank())
genderplot
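The stacked bars show proportions, so it can help to have the underlying numbers as a table too. Below is a rough sketch of how the within-gender proportions might be computed with dplyr. The data frame `toy` and the result name `prop_table` are hypothetical, and the mapping 1 = Male, 2 = Female, 3 = Non-binary is an assumption based on the order of the axis labels in the plot.

```r
library(dplyr)

# Hypothetical toy data standing in for Secondary; all values are made up
toy <- data.frame(
  em_life = c(5, 4, 4, 3, 2, 5, 1, 4, 3, 3),
  gender  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 99)
)

prop_table <- toy %>%
  # drop the same special codes the plot filters out
  filter(gender != 77 & gender != 99) %>%
  # label the numeric codes (assumed: 1 = Male, 2 = Female, 3 = Non-binary)
  mutate(gender = factor(gender, levels = 1:3,
                         labels = c("Male", "Female", "Non-binary"))) %>%
  # count responses for each gender and Likert value
  count(gender, em_life) %>%
  # convert counts to proportions within each gender
  group_by(gender) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(prop_table)
```

Within each gender the `prop` column sums to 1, which is what `position = 'fill'` draws in the bar chart.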