Obtain data on the 2004 tsunami that impacted Thailand
On December 26, 2004, an undersea earthquake off the coast of northern Indonesia caused a series of tsunami waves that impacted several countries in this region, including Thailand. The National Oceanic and Atmospheric Administration (NOAA) provides data on this tsunami, among others, under the National Centers for Environmental Information (NCEI), specifically the Natural Hazards subdivision. This data is publicly available here:
The data plotted below is found by querying the database for the December 26th event in 2004. To replicate this, enter the min and max year as ‘2004’ and click ‘Search’:
Search the NOAA tsunami database
Then, locate the December 26th event, which has 1715 “runup” events. NOAA defines a “runup” as the maximum height of the water measured above a reference sea level.
2004 search results
Click on the ‘1715’ runups, and the following data will be available to download via the button in the top left corner of the table:
Download data specific to Dec. 26, 2004 tsunami
Load libraries
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Warning: package 'plotly' was built under R version 4.1.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Warning: package 'viridis' was built under R version 4.1.3
Loading required package: viridisLite
Warning: package 'htmlwidgets' was built under R version 4.1.3
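The calls that produced the messages above are not shown. Judging from the packages those messages name, the session was probably set up along these lines. This is an inferred sketch, and both the package list and the load order are assumptions:

```r
# Assumed set of packages, inferred from the startup messages above.
library(dplyr)        # filter() and the %>% pipe
library(ggplot2)      # ggplot(), geom_bar(), theme()
library(plotly)       # ggplotly(), highlight_key(), highlight()
library(viridis)      # scale_fill_viridis()
library(htmlwidgets)  # saveWidget()
```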
Load data
Load the data from the tsv (tab separated values) file.
# read in data
tsunami <- read.table(file = 'tsunami_2004_noaa_ncei.tsv', sep = '\t', header = TRUE)
# look at data
head(tsunami)
Search.Parameters More.Info Doubtful.Runup Country Area
1 Tsunami ID = 2439 NA
2 NA n INDONESIA ACEH
3 NA n INDONESIA ACEH
4 NA n INDONESIA ACEH
5 NA n INDONESIA ACEH
6 NA n NEW ZEALAND
Location.Name Latitude Longitude Distance.From.Source..km.
1 NA NA NA
2 WEST COAST OF ACEH, SUMATRA 5.4400 95.2410 244.530
3 WEST COAST OF ACEH, SUMATRA 5.4780 95.2460 248.417
4 WEST COAST OF ACEH, SUMATRA 5.4520 95.2420 245.774
5 WEST COAST OF ACEH, SUMATRA 5.4520 95.2430 245.743
6 JACKSON BAY -43.9733 168.6161 8898.402
Initial.Wave.Arrival.Dy Initial.Wave.Arrival.Hr Initial.Wave.Arrival.Min
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 26 19 17
Travel.Hours Travel.Minutes Max.Wave.Arrival.Day Max.Wave.Arrival.Hr
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 18 18 NA NA
Max.Wave.Arrival.Min Max.Water.Height..m. Max.Inundation.Distance..m.
1 NA NA NA
2 NA 29.98 NA
3 NA 12.39 NA
4 NA 30.40 NA
5 NA 15.77 NA
6 NA 0.91 NA
Measurement.Type Period First.Motion Deaths Death.Description Missing
1 NA NA NA NA NA
2 5 NA NA NA NA
3 4 NA NA NA NA
4 4 NA NA NA NA
5 4 NA NA NA NA
6 2 26 NA NA NA
Missing.Description Injuries Injuries.Description Damage..Mil
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
Damage.Description Houses.Destroyed Houses.Destroyed.Description
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
Houses.Damaged Houses.Damaged.Description
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
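The first row of the output above holds the search parameter (“Tsunami ID = 2439”) rather than a runup, so every measurement column in it is empty. The sketch below shows one way to read tab-separated text and drop such a row. The inline text is a hypothetical four-column excerpt standing in for the downloaded file, and the raw header spellings are an assumption; `read.delim()` is used here as a tab-oriented wrapper around `read.table()`.

```r
# Hypothetical excerpt standing in for the downloaded file.
# The raw header spellings are an assumption; R converts spaces and
# parentheses to dots when it builds the column names.
tsv_text <- paste(
  "Search Parameters\tCountry\tLocation Name\tMax Water Height (m)",
  "Tsunami ID = 2439\t\t\t",
  "\tINDONESIA\tWEST COAST OF ACEH, SUMATRA\t29.98",
  "\tNEW ZEALAND\tJACKSON BAY\t0.91",
  sep = "\n"
)

# read.delim() defaults to sep = "\t" and header = TRUE
raw <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# keep only rows that describe a runup (the parameter row has no country)
runups <- raw[raw$Country != "", ]
runups
```

In the analysis itself this row disappears later anyway, because it does not match the Thailand filter.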
# look at datatypes, dimension of data, column names
class(tsunami)
# take subset of data to use in plot
# want: "Country", "Location.Name", "Distance.From.Source..km.", "Max.Water.Height..m."
tsunami_data <- tsunami[, c(4, 6, 9, 18)]
# get data for Thailand only
thailand <- tsunami_data %>%
  filter(Country == "THAILAND")
# review data
head(thailand)
Country Location.Name Distance.From.Source..km. Max.Water.Height..m.
1 THAILAND NAM KHEM 667.836 7.50
2 THAILAND NAM KHEM 667.914 8.40
3 THAILAND THAI MUANG 622.262 6.10
4 THAILAND BAN NOK NA 682.389 12.60
5 THAILAND BAN PAK NAM 793.878 1.20
6 THAILAND MAKHAM BAY, PHUKET 578.109 1.39
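The subset above is taken by column position (4, 6, 9, 18), which breaks silently if NOAA ever reorders the export. Selecting by name is a more robust alternative. A minimal sketch, using two rows copied from the `head(tsunami)` output and only some of its columns:

```r
# Two rows copied from the head(tsunami) output; only a few columns are kept here.
tsunami <- data.frame(
  Country = c("INDONESIA", "NEW ZEALAND"),
  Area = c("ACEH", ""),
  Location.Name = c("WEST COAST OF ACEH, SUMATRA", "JACKSON BAY"),
  Latitude = c(5.4400, -43.9733),
  Distance.From.Source..km. = c(244.530, 8898.402),
  Max.Water.Height..m. = c(29.98, 0.91)
)

# select the four columns of interest by name instead of by position
keep <- c("Country", "Location.Name", "Distance.From.Source..km.", "Max.Water.Height..m.")
tsunami_data <- tsunami[, keep]
tsunami_data
```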
Note that the original tsv file used in this analysis is included in the GitHub repo.
Goal of this analysis
The goal of this analysis is to create a plot that shows the maximum water height (m) measured at various locations around Thailand. This will illustrate which regions of Thailand were impacted by the 2004 Indian Ocean tsunami and characterize the range of wave heights observed in this region. The plot will show the largest maximum water height recorded at each Thailand location, along with how far that location was from the source of the tsunami (the undersea earthquake).
Tidy data
One thing to consider is that some locations have multiple readings. For our purposes, we are interested in the maximum value at each location.
# look at Location.Name
names <- ggplot(data.frame(thailand), aes(x = Location.Name)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
names
# look at unique Location.Name values
unique(thailand[["Location.Name"]])
[1] "NAM KHEM" "THAI MUANG"
[3] "BAN NOK NA" "BAN PAK NAM"
[5] "MAKHAM BAY, PHUKET" "KHAO LAK"
[7] "BAN NIANG BEACH" "BAN TAM NANG"
[9] "TA POU NOI" "BAN BANG PHNG"
[11] "BAN THUNG DAP" "KARON BEACH (CENTRAL PART), PHUKET"
[13] "LEAM HIM, PHUKET" "BANG RONG PIER, PHUKET"
[15] "PATONG BEACH, PHUKET" "NAI YANG BEACH, PHUKET"
[17] "KATA NOI BEACH, PHUKET" "RAWAI BEACH, PHUKET"
[19] "TRANG" "KO KOH KAO PORT"
[21] "BAN PAK NAM FISHERING PORT" "KA YU HARBOR (BAN LA ONG)"
[23] "RAMSON" "KURABURI"
[25] "TARUTAO" "MOODONG CANAL, PHUKET"
[27] "PHALAI VILLAGE, PHUKET" "CHALONG BAY PIER, PHUKET"
[29] "PHI PHI DON (SOUTH COAST)" "TAP LAM NAVY BASE"
[31] "BAN AO LUK TUM" "KRABI"
[33] "BAN KO DAM" "PHUKET"
[35] "KARON BEACH (SOUTH PART), PHUKET" "LAEM PAKARANG"
[37] "BAN THUNG WA" "THAI MUANG, VISITOR CENTER"
[39] "KO YAO, FISHING VILLAGE" "BAN AO KHOEI"
[41] "BAN NAM KIM" "FRIENDSHIP BEACH HOTEL, PHUKET"
[43] "BAN KAO LAK" "CHALONG"
[45] "BAN PAK CHOK" "BAN THALE NOK"
[47] "NAI RAI" "SATUN"
[49] "BAN NA TAI" "BAN NAM KEM"
[51] "RAI DAN" "THAI MUANG, NAT. CONSERVATION PARK"
[53] "HAD SAI DAM (BAN LA ONG)" "HAT PRAPHAT"
[55] "KAMALA BEACH, PHUKET" "BANG THAO BEACH, PHUKET"
[57] "BAN CHANG HAK" "ALL OF THAILAND"
[59] "BAN PAK NAM PORT" "SIRE VILLAGE, PHUKET"
[61] "PHI PHI DON (NORTH COAST)" "BAN MA KAP"
[63] "BAN PAK KO" "RANONG"
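The bar chart above shows readings per location visually. The same information can be tabulated by counting rows per `Location.Name`. A minimal sketch using four rows copied from the `head(thailand)` output rather than the full data; the column name `n_readings` is an illustrative choice:

```r
library(dplyr)

# Four rows copied from the head(thailand) output shown earlier.
thailand <- data.frame(
  Country = "THAILAND",
  Location.Name = c("NAM KHEM", "NAM KHEM", "THAI MUANG", "BAN NOK NA"),
  Distance.From.Source..km. = c(667.836, 667.914, 622.262, 682.389),
  Max.Water.Height..m. = c(7.50, 8.40, 6.10, 12.60)
)

# number of readings per location, most-measured locations first
readings <- thailand %>%
  count(Location.Name, name = "n_readings", sort = TRUE)
readings
```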
One Location.Name value is “ALL OF THAILAND”. Since it describes the whole country rather than a specific site, drop it from the dataframe.
thailand <- thailand %>%
  filter(Location.Name != 'ALL OF THAILAND')
Country Location Distance_From_Source Max_Water_Height
1 THAILAND NAM KHEM 667.836 7.50
2 THAILAND NAM KHEM 667.914 8.40
3 THAILAND THAI MUANG 622.262 6.10
4 THAILAND BAN NOK NA 682.389 12.60
5 THAILAND BAN PAK NAM 793.878 1.20
6 THAILAND MAKHAM BAY, PHUKET 578.109 1.39
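The output above uses shorter column names (`Location`, `Distance_From_Source`, `Max_Water_Height`), but the renaming step is not shown. One way to get these names is `dplyr::rename()`. This is a sketch of a plausible step rather than the original code, using two rows copied from the earlier output:

```r
library(dplyr)

# Two rows copied from the earlier head(thailand) output, with the original names.
thailand <- data.frame(
  Country = c("THAILAND", "THAILAND"),
  Location.Name = c("NAM KHEM", "THAI MUANG"),
  Distance.From.Source..km. = c(667.836, 622.262),
  Max.Water.Height..m. = c(7.50, 6.10)
)

# rename(new_name = old_name); the Country column is left unchanged
thailand <- thailand %>%
  rename(
    Location = Location.Name,
    Distance_From_Source = Distance.From.Source..km.,
    Max_Water_Height = Max.Water.Height..m.
  )
names(thailand)
```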
We want a single value per location: the largest maximum water height (m) recorded there.
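The step that reduces the data to one row per location is not shown. A minimal sketch of one way to do it, assuming the renamed columns from the output above and using four rows copied from it, is to keep the row with the largest `Max_Water_Height` in each `Location` group. The name `thailand_max` is hypothetical:

```r
library(dplyr)

# Four rows copied from the output above (renamed columns).
thailand <- data.frame(
  Country = "THAILAND",
  Location = c("NAM KHEM", "NAM KHEM", "THAI MUANG", "BAN NOK NA"),
  Distance_From_Source = c(667.836, 667.914, 622.262, 682.389),
  Max_Water_Height = c(7.50, 8.40, 6.10, 12.60)
)

# keep the single highest reading per location;
# with_ties = FALSE guarantees exactly one row per group
thailand_max <- thailand %>%
  group_by(Location) %>%
  slice_max(Max_Water_Height, n = 1, with_ties = FALSE) %>%
  ungroup()
thailand_max
```

Keeping the whole row, rather than summarising only the height, preserves the distance that belongs to the highest reading. Without a step like this, `geom_bar(stat = "identity")` stacks all readings for a location, so the bar height becomes their sum rather than their maximum.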
Using the plotly library makes the plot of maximum water height per Thailand location interactive. The viewer can hover over each bar to read the specific location, water height, and distance from the source; with 63 bars, this hover detail makes the plot much easier to read. The color of each bar denotes distance from the source, so the plot conveys more than maximum water height alone. Environmental scientists might use a similar plot to explore whether distance from the undersea earthquake is associated with the maximum height of the tsunami waves observed. Finally, the highlight feature lets a user hover over a bar to emphasize it and view its measurements directly.
# create highlight to specifically look at a measurement for a certain distance value
thailand_highlight <- highlight_key(thailand, ~Distance_From_Source)
# build ggplot object
thailand_plot <- ggplot(
  data = thailand_highlight,
  mapping = aes(
    x = Location,
    y = Max_Water_Height,
    fill = Distance_From_Source
  )
) +
  geom_bar(stat = "identity", width = 0.5) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  xlab("Location") +
  ylab("Maximum height of water (m)") +
  labs(fill = "Distance from source (km)") +
  scale_fill_viridis() +
  ggtitle("Maximum water height per Thailand location during 2004 Indian Ocean tsunami")
# convert to plotly object with specific highlight attributes
thailand_plotly <- ggplotly(thailand_plot) %>%
  highlight(on = "plotly_hover", off = "plotly_relayout", color = "black")
# show plot
thailand_plotly
# save plot
saveWidget(as_widget(thailand_plotly), "thailand_water_height_2004.html")
save(thailand_plotly, file = "thailand_water_height_2004.rda")