STAA 566 Dynamic Plot

Author

Tiana Stastny

Obtain data on the 2004 tsunami that impacted Thailand

On December 26, 2004, an undersea earthquake off the coast of northern Indonesia caused a series of tsunami waves that impacted several countries in this region, including Thailand. The National Oceanic and Atmospheric Administration (NOAA) provides data on this tsunami, among others, under the National Centers for Environmental Information (NCEI), specifically the Natural Hazards subdivision. This data is publicly available here:

NOAA tsunami database

The data plotted below come from querying the database for the December 26, 2004 event. To replicate this, enter ‘2004’ as both the minimum and maximum year and click ‘Search’:

Search the NOAA tsunami database

Then, locate the December 26th event, which has 1715 “runup” events. NOAA defines a “runup” as the maximum height of the water measured above a reference sea level.

2004 search results

Click on the ‘1715’ runups, and the following data will be available to download via the button in the top left corner of the table:

Download data specific to Dec. 26, 2004 tsunami

Load libraries


Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Warning: package 'plotly' was built under R version 4.1.3

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
Warning: package 'viridis' was built under R version 4.1.3
Loading required package: viridisLite
Warning: package 'htmlwidgets' was built under R version 4.1.3
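
The code for this step is not displayed, but the startup messages above suggest that dplyr, ggplot2, plotly, viridis, and htmlwidgets are attached. A minimal sketch of an equivalent loading step follows; the package list and load order are inferred from those messages and are an assumption, not the author's original calls.

```r
# Packages inferred from the startup messages (an assumption, not the original code).
pkgs <- c("dplyr", "ggplot2", "plotly", "viridis", "htmlwidgets")

# Identify any packages that are not installed, without attaching anything yet.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package while silencing the masking notices.
  for (p in pkgs) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
}
```

Wrapping library() in suppressPackageStartupMessages() hides the masking notices but not the "built under R version" warnings, which are ordinary warnings.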

Load data

Load the data from the TSV (tab-separated values) file.

# read in data
tsunami <- read.table(file= 'tsunami_2004_noaa_ncei.tsv', sep = '\t', header = TRUE)
# look at data
head(tsunami)
  Search.Parameters More.Info Doubtful.Runup     Country Area
1 Tsunami ID = 2439        NA                                
2                          NA              n   INDONESIA ACEH
3                          NA              n   INDONESIA ACEH
4                          NA              n   INDONESIA ACEH
5                          NA              n   INDONESIA ACEH
6                          NA              n NEW ZEALAND     
                Location.Name Latitude Longitude Distance.From.Source..km.
1                                   NA        NA                        NA
2 WEST COAST OF ACEH, SUMATRA   5.4400   95.2410                   244.530
3 WEST COAST OF ACEH, SUMATRA   5.4780   95.2460                   248.417
4 WEST COAST OF ACEH, SUMATRA   5.4520   95.2420                   245.774
5 WEST COAST OF ACEH, SUMATRA   5.4520   95.2430                   245.743
6                 JACKSON BAY -43.9733  168.6161                  8898.402
  Initial.Wave.Arrival.Dy Initial.Wave.Arrival.Hr Initial.Wave.Arrival.Min
1                      NA                      NA                       NA
2                      NA                      NA                       NA
3                      NA                      NA                       NA
4                      NA                      NA                       NA
5                      NA                      NA                       NA
6                      26                      19                       17
  Travel.Hours Travel.Minutes Max.Wave.Arrival.Day Max.Wave.Arrival.Hr
1           NA             NA                   NA                  NA
2           NA             NA                   NA                  NA
3           NA             NA                   NA                  NA
4           NA             NA                   NA                  NA
5           NA             NA                   NA                  NA
6           18             18                   NA                  NA
  Max.Wave.Arrival.Min Max.Water.Height..m. Max.Inundation.Distance..m.
1                   NA                   NA                          NA
2                   NA                29.98                          NA
3                   NA                12.39                          NA
4                   NA                30.40                          NA
5                   NA                15.77                          NA
6                   NA                 0.91                          NA
  Measurement.Type Period First.Motion Deaths Death.Description Missing
1               NA     NA                  NA                NA      NA
2                5     NA                  NA                NA      NA
3                4     NA                  NA                NA      NA
4                4     NA                  NA                NA      NA
5                4     NA                  NA                NA      NA
6                2     26                  NA                NA      NA
  Missing.Description Injuries Injuries.Description Damage..Mil
1                  NA       NA                   NA          NA
2                  NA       NA                   NA          NA
3                  NA       NA                   NA          NA
4                  NA       NA                   NA          NA
5                  NA       NA                   NA          NA
6                  NA       NA                   NA          NA
  Damage.Description Houses.Destroyed Houses.Destroyed.Description
1                 NA               NA                           NA
2                 NA               NA                           NA
3                 NA               NA                           NA
4                 NA               NA                           NA
5                 NA               NA                           NA
6                 NA               NA                           NA
  Houses.Damaged Houses.Damaged.Description
1             NA                         NA
2             NA                         NA
3             NA                         NA
4             NA                         NA
5             NA                         NA
6             NA                         NA
# look at datatypes, dimension of data, column names
class(tsunami)
[1] "data.frame"
dim(tsunami)
[1] 1716   34
colnames(tsunami)
 [1] "Search.Parameters"            "More.Info"                   
 [3] "Doubtful.Runup"               "Country"                     
 [5] "Area"                         "Location.Name"               
 [7] "Latitude"                     "Longitude"                   
 [9] "Distance.From.Source..km."    "Initial.Wave.Arrival.Dy"     
[11] "Initial.Wave.Arrival.Hr"      "Initial.Wave.Arrival.Min"    
[13] "Travel.Hours"                 "Travel.Minutes"              
[15] "Max.Wave.Arrival.Day"         "Max.Wave.Arrival.Hr"         
[17] "Max.Wave.Arrival.Min"         "Max.Water.Height..m."        
[19] "Max.Inundation.Distance..m."  "Measurement.Type"            
[21] "Period"                       "First.Motion"                
[23] "Deaths"                       "Death.Description"           
[25] "Missing"                      "Missing.Description"         
[27] "Injuries"                     "Injuries.Description"        
[29] "Damage..Mil"                  "Damage.Description"          
[31] "Houses.Destroyed"             "Houses.Destroyed.Description"
[33] "Houses.Damaged"               "Houses.Damaged.Description"  
# take subset of data to use in plot
# select columns by name rather than by position, so the code still works if the column order changes
tsunami_data <- tsunami[, c("Country", "Location.Name", "Distance.From.Source..km.", "Max.Water.Height..m.")]
# get data for Thailand only
thailand <- tsunami_data %>% filter(Country=="THAILAND") 

# review data
head(thailand)
   Country      Location.Name Distance.From.Source..km. Max.Water.Height..m.
1 THAILAND           NAM KHEM                   667.836                 7.50
2 THAILAND           NAM KHEM                   667.914                 8.40
3 THAILAND         THAI MUANG                   622.262                 6.10
4 THAILAND         BAN NOK NA                   682.389                12.60
5 THAILAND        BAN PAK NAM                   793.878                 1.20
6 THAILAND MAKHAM BAY, PHUKET                   578.109                 1.39

Note that the original tsv file used in this analysis is included in the GitHub repo.
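
The dim() output above reports 1716 rows, one more than the 1715 runups listed by NOAA. Judging from the head() output, the extra row appears to be the first one, which holds only the search parameter ("Tsunami ID = 2439") and no measurements. The sketch below uses a small toy data frame (an illustrative stand-in, not the real file) to show how such a row could be identified and dropped. In this analysis the filter on Country == "THAILAND" already excludes it, so no extra step is required.

```r
# Toy stand-in for the first rows of the export (not the real file).
toy <- data.frame(
  Search.Parameters = c("Tsunami ID = 2439", "", ""),
  Country = c("", "INDONESIA", "THAILAND"),
  Max.Water.Height..m. = c(NA, 29.98, 7.50),
  stringsAsFactors = FALSE
)

# The metadata row has a non-empty Search.Parameters value and no country.
is_metadata <- toy$Search.Parameters != ""
runups <- toy[!is_metadata, ]

nrow(toy)     # rows including the metadata row
nrow(runups)  # runup records only
```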

Goal of this analysis

The goal of this analysis is to create a plot that shows the measurements of maximum water height (m) at various locations around Thailand. This will illustrate which regions of Thailand were impacted by the 2004 Indian Ocean tsunami and characterize the range of wave heights observed there. For each Thailand location, the plot will show the largest maximum water height recorded and how far that location was from the source of the tsunami (the undersea earthquake).

Tidy data

One thing to consider is that some locations have multiple readings. For our purposes, we are interested in the maximum value at each location.

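# look at Location.Name: bar chart of the number of readings per location
# (named location_counts to avoid shadowing base R's names())
location_counts <- ggplot(thailand, aes(x = Location.Name)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

location_counts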

# look at unique Location.Name values
unique(thailand[["Location.Name"]])
 [1] "NAM KHEM"                           "THAI MUANG"                        
 [3] "BAN NOK NA"                         "BAN PAK NAM"                       
 [5] "MAKHAM BAY, PHUKET"                 "KHAO LAK"                          
 [7] "BAN NIANG BEACH"                    "BAN TAM NANG"                      
 [9] "TA POU NOI"                         "BAN BANG PHNG"                     
[11] "BAN THUNG DAP"                      "KARON BEACH (CENTRAL PART), PHUKET"
[13] "LEAM HIM, PHUKET"                   "BANG RONG PIER, PHUKET"            
[15] "PATONG BEACH, PHUKET"               "NAI YANG BEACH, PHUKET"            
[17] "KATA NOI BEACH, PHUKET"             "RAWAI BEACH, PHUKET"               
[19] "TRANG"                              "KO KOH KAO PORT"                   
[21] "BAN PAK NAM FISHERING PORT"         "KA YU HARBOR (BAN LA ONG)"         
[23] "RAMSON"                             "KURABURI"                          
[25] "TARUTAO"                            "MOODONG CANAL, PHUKET"             
[27] "PHALAI VILLAGE, PHUKET"             "CHALONG BAY PIER, PHUKET"          
[29] "PHI PHI DON (SOUTH COAST)"          "TAP LAM NAVY BASE"                 
[31] "BAN AO LUK TUM"                     "KRABI"                             
[33] "BAN KO DAM"                         "PHUKET"                            
[35] "KARON BEACH (SOUTH PART), PHUKET"   "LAEM PAKARANG"                     
[37] "BAN THUNG WA"                       "THAI MUANG, VISITOR CENTER"        
[39] "KO YAO, FISHING VILLAGE"            "BAN AO KHOEI"                      
[41] "BAN NAM KIM"                        "FRIENDSHIP BEACH HOTEL, PHUKET"    
[43] "BAN KAO LAK"                        "CHALONG"                           
[45] "BAN PAK CHOK"                       "BAN THALE NOK"                     
[47] "NAI RAI"                            "SATUN"                             
[49] "BAN NA TAI"                         "BAN NAM KEM"                       
[51] "RAI DAN"                            "THAI MUANG, NAT. CONSERVATION PARK"
[53] "HAD SAI DAM (BAN LA ONG)"           "HAT PRAPHAT"                       
[55] "KAMALA BEACH, PHUKET"               "BANG THAO BEACH, PHUKET"           
[57] "BAN CHANG HAK"                      "ALL OF THAILAND"                   
[59] "BAN PAK NAM PORT"                   "SIRE VILLAGE, PHUKET"              
[61] "PHI PHI DON (NORTH COAST)"          "BAN MA KAP"                        
[63] "BAN PAK KO"                         "RANONG"                            

One Location.Name value is “ALL OF THAILAND”. Because it does not refer to a specific site, drop it from the data frame.

thailand <- thailand %>% filter(Location.Name != 'ALL OF THAILAND')

Rename the columns to tidier names.

thailand <- thailand %>% rename(
  Location = Location.Name,
  Distance_From_Source = Distance.From.Source..km.,
  Max_Water_Height = Max.Water.Height..m.
)
head(thailand)
   Country           Location Distance_From_Source Max_Water_Height
1 THAILAND           NAM KHEM              667.836             7.50
2 THAILAND           NAM KHEM              667.914             8.40
3 THAILAND         THAI MUANG              622.262             6.10
4 THAILAND         BAN NOK NA              682.389            12.60
5 THAILAND        BAN PAK NAM              793.878             1.20
6 THAILAND MAKHAM BAY, PHUKET              578.109             1.39

We want just one measurement of maximum water height (m) per location, so keep the row with the largest value at each location and then drop the grouping.

# keep the highest reading per location; ungroup so later steps act on the whole table
thailand <- thailand %>%
  group_by(Location) %>%
  slice(which.max(Max_Water_Height)) %>%
  ungroup()
dim(thailand)
[1] 63  4
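
To make the per-location maximum concrete, here is a minimal sketch on a toy data frame. The three rows reuse values from the head() output above, but the data frame itself is illustrative. Note that which.max() returns the first maximum when there are ties and skips NA values.

```r
library(dplyr)

# Toy readings: two at NAM KHEM and one at THAI MUANG.
toy <- data.frame(
  Location = c("NAM KHEM", "NAM KHEM", "THAI MUANG"),
  Max_Water_Height = c(7.50, 8.40, 6.10),
  stringsAsFactors = FALSE
)

# Keep the single row with the largest height within each location.
toy_max <- toy %>%
  group_by(Location) %>%
  slice(which.max(Max_Water_Height)) %>%
  ungroup()  # drop the grouping so later steps act on the whole table

toy_max
```

The result has one row per location, with NAM KHEM keeping its 8.40 m reading.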

We should also check for NA values.

colSums(is.na(thailand))
             Country             Location Distance_From_Source 
                   0                    0                    0 
    Max_Water_Height 
                   0 

There are no NA values in this subset of data.

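Order the Location factor levels by maximum water height so that the bars appear in ascending order, which makes the plot easier to read.

# ggplot draws discrete x values in factor-level order, so set the levels by height
thailand$Location <- factor(thailand$Location,
                            levels = thailand$Location[order(thailand$Max_Water_Height)])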
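
As a small illustration of why the factor levels matter, the sketch below compares the default (alphabetical) level order with the height-based order on a toy data frame. The three rows reuse values from the earlier output; the data frame itself is illustrative.

```r
# Toy data: three locations with their maximum water heights (m).
toy <- data.frame(
  Location = c("BAN NOK NA", "BAN PAK NAM", "THAI MUANG"),
  Max_Water_Height = c(12.60, 1.20, 6.10),
  stringsAsFactors = FALSE
)

# By default, factor levels are sorted alphabetically.
default_levels <- levels(factor(toy$Location))
default_levels
# "BAN NOK NA"  "BAN PAK NAM" "THAI MUANG"

# Reorder the levels so they follow increasing water height.
toy$Location <- factor(toy$Location,
                       levels = toy$Location[order(toy$Max_Water_Height)])
levels(toy$Location)
# "BAN PAK NAM" "THAI MUANG"  "BAN NOK NA"
```

This approach relies on each location appearing exactly once, which holds here because only one row per location was kept.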

Build dynamic plot using plotly

The plotly library makes the plot of maximum water height per Thailand location interactive. Hovering over a bar shows the location, water height, and distance from source; with 63 bars, these hover details make individual values much easier to read. Bar color encodes distance from the source, so the plot conveys more than maximum water height alone. Environmental scientists might use a similar plot to explore whether distance from the undersea earthquake is associated with the maximum height of the tsunami waves observed. Finally, the highlight feature emphasizes the bar under the cursor so that its measurements can be read in isolation.

# create highlight to specifically look at a measurement for a certain distance value
thailand_highlight <- highlight_key(thailand, ~Distance_From_Source)

# build ggplot object
thailand_plot <- ggplot(data = thailand_highlight,
                        mapping = aes(x = Location,
                                      y = Max_Water_Height,
                                      fill = Distance_From_Source)) +
  # geom_col() is equivalent to geom_bar(stat = "identity")
  geom_col(width = 0.5) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  # set the title, axis labels, and legend title in one call
  labs(title = "Maximum water height per Thailand location during 2004 Indian Ocean tsunami",
       x = "Location",
       y = "Maximum height of water (m)",
       fill = "Distance from source (km)") +
  scale_fill_viridis()

# convert to plotly object with specific highlight attributes
thailand_plotly <- ggplotly(thailand_plot) %>% 
  highlight(on = "plotly_hover", off = "plotly_relayout", color = "black") 

# show plot
thailand_plotly
# save plot
saveWidget(as_widget(thailand_plotly), "thailand_water_height_2004.html")
save(thailand_plotly, file="thailand_water_height_2004.rda")
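
An object saved with save() is later restored under its original name by load(). The sketch below demonstrates this round trip with a placeholder list standing in for the plotly object and a temporary file path; both are assumptions made for illustration.

```r
# Placeholder standing in for the plotly widget (illustrative only).
thailand_plotly <- list(note = "placeholder for the plotly object")

# Write to a temporary location rather than the working directory.
rda_path <- file.path(tempdir(), "thailand_water_height_2004.rda")
save(thailand_plotly, file = rda_path)

# Remove the object, then restore it from disk.
rm(thailand_plotly)
restored <- load(rda_path)

restored                   # names of the objects that load() recreated
exists("thailand_plotly")  # the object is available again under its old name
```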