STAA566_Schaefer_hw4

Author

Paul Schaefer

# Inport libraries
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(knitr)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(reactable)

Data import and cleaning

gas <- read_csv("C:\\Users\\pscha\\Documents\\_School\\2022_Fall_Classes\\STAA_566_DataViz\\natgas.csv")
gas <- gas[complete.cases(gas[ , 8]),] # get rid on NAs in Capacity
cap <- sum(gas$Capacity)
rank <- c(1:nrow(gas))
gas <- gas %>%
  mutate(percent=Capacity/cap) #calculate percentage of total
gas <- gas[order(-gas$percent),]
gas <- gas %>%
  mutate(cum_per=cumsum(percent),rank = rank)%>%# calculate cumulative percentage
  mutate(percentile = case_when(cum_per<=.05 ~ '5th',
                                cum_per<=.1 & cum_per > .05 ~ '10th',
                                cum_per<=.2 & cum_per > .1 ~ '20th',
                                cum_per > .2 ~ 'higher'))#%>%
  #mutate(display = case_when(cum_per<=.05 ~ cell_spec,
  #                              cum_per<=.1 & cum_per > .05 ~ '10',
  #                              cum_per<=.2 & cum_per > .1 ~ '20',
  #                              cum_per > .2 ~ 'higher')))

head(gas)
# A tibble: 6 × 16
   Year State Name    Owner Opera…¹ County Zipcode Capac…²  Flow BTU_C…³ Dry_S…⁴
  <dbl> <chr> <chr>   <chr> <chr>   <chr>  <chr>     <dbl> <dbl>   <dbl>   <dbl>
1  2017 IL    Aux Sa… Enbr… <NA>    Grundy 60450      2100  1633    1131      NA
2  2017 CO    Meeker… Ente… Enterp… Rio B… 81650      1800   955    1080      NA
3  2017 WV    Sherwo… Mark… MarkWe… Doddr… 26456      1800  1481    1225      NA
4  2017 LA    North … Ente… Enterp… Terre… 70356      1100   650    1100      NA
5  2017 LA    Toca G… Ente… Enterp… St. B… 70085      1100   150    1096      NA
6  2017 WY    Opal G… Will… Willia… Linco… 83124      1100   660      NA      NA
# … with 5 more variables: Liquid_Storage <dbl>, percent <dbl>, cum_per <dbl>,
#   rank <int>, percentile <chr>, and abbreviated variable names ¹​Operator,
#   ²​Capacity, ³​BTU_Content, ⁴​Dry_Storage

Make the table

gas_table2 <- gas %>%
  select(rank,Name, Owner, State, Capacity,percentile) %>%
  reactable(
    defaultColDef = colDef(align = 'center'),
    columns = list(
      rank = colDef(name = "Rank", align = 'right'),
      Name = colDef(name = "Name", align = 'left'),
      State = colDef(name = "State", align = 'left'),
      Owner = colDef(name = "Owner", align = 'left'),
      Capacity = colDef(name = "Capacity (M cu. ft. per day)", align = 'right'),
      percentile = colDef(name = 'Percentile'), align = 'right'),
    style = list(fontSize = "1rem"))
gas_table2

This table is based on a work project a few years back when we attempted to determine whether there were parts of the US energy infrastructure that contributed a ‘dispoportionate’ amount to the overall capacity. We wanted to know if there were a ‘relatively small’ number of production facilities that produced a ‘relatively large’ percentage of different outputs. That project involved Excel, which meant the tables were dreadful, so we ended up putting the results on a map instead. This is an attempt to produce a table that gives the same information.

I limited the coulmns to the ones that we would have used for the project: the name, owner, capacity, and the percentile that the plant fell in. Due to the size of the data, I wanted tabs, so I used the reactable() package for the table. I had some code problems with the percentiles - there are some plants with the same capacity, so they assigned percentiles by the order they were originally in, but sorting on ‘Capacity’ will slightly mis-order them.

The data source is a .csv downloaded from the US Energy Information Agency website: https://www.eia.gov/naturalgas/ngqs/#?report=RP9&year1=2017&year2=2017&company=Name The data is based on 2017 reporting and was released in 2019.