ggplot2 Assignment

This data set contains ratings for every episode of the series in the IMDb top 250. I have heard the “Parks and Rec vs. The Office” debate for years, so I would like to see how the two shows' IMDb ratings compare over the course of their runs.

ratings <- read.csv("C:/Users/ericp/Downloads/archive/imdb_top_250_series_episode_ratings.csv")
favShows <- c("Parks and Recreation", "The Office")
library(dplyr)
recVsOffice <- ratings %>% 
  filter(Title %in% favShows)
head(recVsOffice)
##   X Season Episode Rating      Code      Title
## 1 0      1       1    7.3 tt0386676 The Office
## 2 1      1       2    8.1 tt0386676 The Office
## 3 2      1       3    7.6 tt0386676 The Office
## 4 3      1       4    7.9 tt0386676 The Office
## 5 4      1       5    8.3 tt0386676 The Office
## 6 5      1       6    7.6 tt0386676 The Office

First, the data had to be filtered to include only these two shows, which is what the filter() call above does.

library(ggplot2)
library(directlabels)
# One line per show, labelled directly on the plot.
# X is the row index column from the CSV (see the head() output), used here as the episode counter.
plt <- ggplot(data = recVsOffice, aes(x = X, y = Rating, color = Title)) +
  geom_line() +
  geom_dl(aes(label = Title), method = "top.points") +
  xlab("Episode Number")
plt
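
One caveat about the plot above: judging by the head() output, X looks like the row index of the full CSV rather than a per-show episode count, so the two lines may start at very different x positions. Below is a minimal sketch of one way to put both shows on a shared axis, assuming the rows are already in broadcast order within each show. The toy data frame and the EpisodeIndex column are made up for illustration and are not part of the original data set.

```r
library(dplyr)

# Toy stand-in for recVsOffice (all values are made up for illustration).
toy <- data.frame(
  X      = c(0, 1, 2, 500, 501),
  Season = c(1, 1, 2, 1, 1),
  Rating = c(7.3, 8.1, 8.4, 6.9, 7.2),
  Title  = c("The Office", "The Office", "The Office",
             "Parks and Recreation", "Parks and Recreation")
)

# Number each show's episodes 1, 2, 3, ... in the order they appear.
# EpisodeIndex is a hypothetical column name, not one from the CSV.
indexed <- toy %>%
  group_by(Title) %>%
  mutate(EpisodeIndex = row_number()) %>%
  ungroup()

print(indexed)
```

With the real data, aes(x = EpisodeIndex) could then stand in for aes(x = X) so that both lines start at episode 1.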

But what about by season?

# Average rating per season for each show
(recVsOfficeSeasons <- recVsOffice %>%
  group_by(Season, Title) %>%
  summarise(Rating = mean(Rating), .groups = "keep"))
## # A tibble: 16 x 3
## # Groups:   Season, Title [16]
##    Season Title                Rating
##     <int> <chr>                 <dbl>
##  1      1 Parks and Recreation   7.17
##  2      1 The Office             7.97
##  3      2 Parks and Recreation   8.02
##  4      2 The Office             8.33
##  5      3 Parks and Recreation   8.41
##  6      3 The Office             8.50
##  7      4 Parks and Recreation   8.21
##  8      4 The Office             8.41
##  9      5 Parks and Recreation   8.10
## 10      5 The Office             8.37
## 11      6 Parks and Recreation   8.01
## 12      6 The Office             8.07
## 13      7 Parks and Recreation   8.3 
## 14      7 The Office             8.18
## 15      8 The Office             7.43
## 16      9 The Office             7.73
plt <- ggplot(data = recVsOfficeSeasons, aes(x = Season, y = Rating, color = Title)) +
  geom_line() +
  geom_dl(aes(label = Title), method = "top.points") +
  xlab("Season")
plt
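
Season means can hide how many episodes each one is based on. Below is a small sketch, on made-up toy data rather than the real CSV, that reports the episode count next to each mean. The Episodes column name is an assumption for illustration.

```r
library(dplyr)

# Toy stand-in for recVsOffice (all values are made up for illustration).
toy <- data.frame(
  Season = c(1, 1, 2, 1, 1, 1),
  Rating = c(7.0, 8.0, 9.0, 6.0, 7.0, 8.0),
  Title  = c(rep("The Office", 3), rep("Parks and Recreation", 3))
)

# Mean rating plus the number of episodes behind it, per season and show.
# Episodes is a hypothetical column name.
season_summary <- toy %>%
  group_by(Season, Title) %>%
  summarise(Rating = mean(Rating), Episodes = n(), .groups = "drop")

print(season_summary)
```

A season with only a handful of episodes, such as a short first season, will have a noisier mean than a full-length one.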

Finally, let’s look at each series’ distribution of ratings.

plt <- ggplot(data = recVsOffice, aes(x = Rating, fill = Title)) +
  geom_bar(stat = "count", position = "dodge") +
  xlab("Rating")
plt
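
Because the two shows have different episode counts (the season table lists nine seasons for The Office and seven for Parks and Recreation), raw counts are not directly comparable. One possible alternative is to convert the counts to within-show shares, sketched below on made-up toy data. Share is a hypothetical column name.

```r
library(dplyr)

# Toy stand-in for recVsOffice (all values are made up for illustration).
toy <- data.frame(
  Rating = c(8.0, 8.0, 8.5, 9.0, 8.0, 8.5),
  Title  = c(rep("The Office", 4), rep("Parks and Recreation", 2))
)

# Count episodes at each rating, then divide by the show's total
# so that each show's shares sum to 1.
rating_shares <- toy %>%
  count(Title, Rating) %>%
  group_by(Title) %>%
  mutate(Share = n / sum(n)) %>%
  ungroup()

print(rating_shares)
```

These shares could then be drawn with geom_col(aes(x = Rating, y = Share, fill = Title), position = "dodge") in place of the count-based geom_bar().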