Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.


Daintree Rainforest Observatory Climate Data from AWAP-GRIDS

AWAP GRIDS

1 Introduction and Methods

This is a work in progress. It is a stub of an article I want to put together showing how to use several online data repositories as a showcase of the [Scientific Workflow and Integration Software for Health (SWISH) Climate Impact Assessments](https://github.com/swish-climate-impact-assessment) project.

1.1 Authors

1.2 Background

  • Markus Nolf offers this use case of the [SWISH EWEDB](http://swish-climate-impact-assessment.github.io/)
  • Markus is pulling together his Daintree Rainforest Observatory (DRO) data into a manuscript for publication, and was looking for climate data from 2012 as well as long-term.
  • More specifically, he needs the annual precipitation and mean annual temperature for 2012, as well as the 30-year means.
  • The Australian Bureau of Meteorology has a nice rainfall dataset available at http://www.bom.gov.au/climate/data/ ("Cape Trib Store" weather station), but it seems like the temperature records are patchy.
  • So it is advisable to use the data the DRO collects itself.
  • You need to apply through the [ASN SuperSite data portal](http://www.tern-supersites.net.au/knb/) for access to the daily data for the DRO.
  • Note that use of the DRO met data will need to be properly cited, as it is harder to keep an AWS station running in the tropics for years than it is to collect most other data. The citation information is provided when you make a request to access the data.

  • The long-term mean used by most DRO researchers is from the BOM station, as we only have a short record from the DRO station itself. The offset is around 1000 mm.
  • However, what we want are mean annual temperatures, but the BOM website seems to focus more on mean minimum and maximum temperatures.

1.3 Material and Methods

  • Extract mean annual temperatures from the BOM website
    • SWISH uses BoM data a fair bit and aims to streamline access to BoM data for extreme weather event analysis (which require long term average climatology to provide the baseline that extremes are measured against).
    • WRT temperature, most daily averages from BoM are calculated as the average of "maximum temperature in 24 hours after 9am local time in degrees" and "minimum temperature in 24 hours before 9am local time in degrees" (only a couple of hundred AWS provide the hourly data needed to compute a proper mean of 24 observations).
    • The Bureau of Meteorology has generated a range of gridded meteorological datasets for Australia as a contribution to the Australian Water Availability Project (AWAP). These include daily max and min temperature, which you could use to generate daily averages and then calculate your long-term averages from those.
    • http://www.bom.gov.au/jsp/awap/
    • Documentation is at http://www.bom.gov.au/amm/docs/2009/jones.pdf
  • A workflow to download and process the public BoM weather grids.

    1 R-depends

    # depends
    install.packages(c('raster', 'rgdal', 'plyr', 'RODBC', 'RCurl', 'XML', 'ggmap', 'maptools', 'spdep'))
    
    
    • This workflow uses the open source R software with some of our custom-written packages, loaded via require() in the next section; an installation sketch follows below.
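
    The custom packages are not on CRAN. A minimal installation sketch, assuming the GitHub repositories are swish-climate-impact-assessment/awaptools, swish-climate-impact-assessment/swishdbtools and ivanhanigan/gisviz (the download pages linked in the next section are the authoritative source):

    # installation sketch only: the repository paths below are assumptions,
    # check the package download pages before running
    install.packages('devtools')
    require(devtools)
    install_github("swish-climate-impact-assessment/awaptools")
    install_github("swish-climate-impact-assessment/swishdbtools")
    install_github("ivanhanigan/gisviz")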

2 R-code-for-extraction

################################################################
# name: r-code
# aim: daily weather for any point location from online BoM weather grids

# depends on some github packages
require(awaptools)
#http://swish-climate-impact-assessment.github.io/tools/awaptools/awaptools-downloads.html
require(swishdbtools)
#http://swish-climate-impact-assessment.github.io/tools/swishdbtools/swishdbtools-downloads.html
require(gisviz)
# http://ivanhanigan.github.io/gisviz/

# and this from CRAN
if(!require(raster)) install.packages('raster'); require(raster)

# get weather data, beware that each grid is a couple of megabytes
vars <- c("maxave","minave","totals","vprph09","vprph15") #,"solarave") 
# solar only available after 1990
for(measure in vars)
{
  #measure <- vars[1]
  get_awap_data(start = '1960-01-01',end = '1960-01-02', measure)
}

# get location
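# NB geocode() below is from the ggmap package; make_EPSG() is from rgdal and
# SpatialPointsDataFrame()/CRS() are from sp -- if these are not already
# attached as dependencies of the packages above, load them explicitly:
# require(ggmap); require(rgdal); require(sp)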
locn <- geocode("daintree rainforest")
# this uses google maps API, better check this
# lon       lat
# 1 145.4185 -16.17003
## Treat data frame as spatial points
epsg <- make_EPSG()
shp <- SpatialPointsDataFrame(cbind(locn$lon,locn$lat),locn,
                              proj4string=CRS(epsg$prj4[epsg$code %in% '4283']))
# now loop over grids and extract met data
cfiles <-  dir(pattern="grid$")

for (i in seq_len(length(cfiles))) {
  #i <- 1 ## for stepping thru
  gridname <- cfiles[[i]]
  r <- raster(gridname)
  #image(r) # plot to look at
  e <- extract(r, shp, df=T)
  #str(e) ## print for debugging
  e1 <- shp
  e1@data$values <- e[,2]
  e1@data$gridname <- gridname
  # write to target file
  write.table(e1@data,"output.csv",
    col.names = i == 1, append = i>1 , sep = ",", row.names = FALSE)
}

# further work is required to format the gridname column to extract the date and weather parameters (see the sketch below).
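
As a rough sketch of that further work (not part of the original workflow), the measure and date can be pulled out of the gridname column with string operations. The filename pattern used here ("measure_YYYYMMDD....grid") is an assumption; inspect the dir() output and adjust the pattern to match the files awaptools actually writes.

# sketch only: parse measure and date out of the gridname column
# assumed filename pattern like "maxave_2012010120120101.grid" -- verify before use
out <- read.csv("output.csv", stringsAsFactors = FALSE)
out$measure <- sub("_.*$", "", out$gridname)
out$date <- as.Date(sub("^[^_]+_(\\d{8}).*$", "\\1", out$gridname), format = "%Y%m%d")
head(out)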

3 Results

  • Results
    • Markus reports:
    • "The R-script worked great once i had set a working directory that did not include spaces. (It may have been a different problem that got solved by changing the wd, but the important thing is it's running now.)"
    • Markus downloaded 70+ GB of gridded weather data from the BoM website to his local computer
    • Also note there is another set of gridded data available from the BOM, which contains pre-computed long-term mean temperatures, [ready to be extracted with the script](http://reg.bom.gov.au/jsp/ncc/climate_averages/temperature/index.jsp?maptype=6&period=#maps)
    • "Using this file, I only needed to get the 2012 temp grids for a comparison of 2012 vs. 30-year data. I'm going to run the extraction of 1961-1990 data, just to be sure."
    • "When we finished analysis of the long-term temperature from daily means found:
    • While the official, pre-computed long-term mean (i.e. 30-year grid file, analysed with your script) was 22.29 °C for the DRO coordinates (145.4494, -16.1041), the new value from daily means (i.e. daily minave and maxave averaged) is 24.91 °C.
    • We're not sure what causes this discrepancy, but thought we'd note that there is one.
    • For the manuscript, we ended up using the means obtained via BOM's method* to compare 1961-1990 values to 2012, both computed with the above script.
    • (* average of daily min/max temperature for each year, then averaged across the entire 30 year period)
  • Dataset discrepancy
    • Following up on the interesting difference between the two BoM datasets.
    • One thing that might cause this is whether you calculate the average of the annual averages, i.e. sum(annavs)/30, or the average of all the daily averages, i.e. sum(dailyavs)/(30 * 365 or 366); the variance will differ between these methods (see the toy sketch after this list).
    • It looks like the 30-year dataset is the former:
    • "Average annual temperatures (maximum, minimum or mean) are calculated by adding daily temperature values each year, dividing by the number of days in that year to get an average for that particular year. The average values for each year in a specified period (1961 to 1990) are added together and the final value is calculated by dividing by the number of years in the period (30 years in this case)."

    [metadata](http://reg.bom.gov.au/jsp/ncc/climate_averages/temperature/index.jsp?maptype=6&period=#maps)

    • Markus followed the BOM calculation method, and also compared it with two other approaches:
    • average of all 21,914 values
    • average of yearly sum(min and max values per year)/(ndays*2)
    • average of yearly sum(daily averages)/ndays
    • where ndays = number of days per year.
    • Differences between these methods show up only in the 6th decimal place, nowhere near 2.62 degrees.
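
A toy sketch (made-up numbers, not the DRO data) showing that the order of averaging can only matter when years have unequal lengths, and even then only marginally:

# toy illustration: grand mean of daily values vs mean of annual means
set.seed(1)
yrs   <- rep(1961:1990, times = c(rep(365, 29), 366))  # unequal year lengths
daily <- rnorm(length(yrs), mean = 25, sd = 3)         # fake daily mean temps
grand          <- mean(daily)                          # sum(dailyavs)/ndays
mean_of_annual <- mean(tapply(daily, yrs, mean))       # sum(annavs)/30
grand - mean_of_annual                                 # tiny difference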

4 R-code-for-comparison

################################################################
# This is Markus' comparison script 
# also see the formatted table csv, as well as the SWISH script's raw output csv

setwd("E:\\markus\\ph.d\\aus-daintree\\data-analysis\\climate")
climate <- read.csv("minmaxave-30year-daily.csv", sep=",", dec=".")

climate$year <- substr(climate$file,1,4)
climate$dailymean <- (climate$maxave+climate$minave)/2
head(climate)


#total average across all days and values
annmean <- mean(c(climate$maxave,climate$minave))
annmean


#daily means averaged by year, then total average
annmean1 <- numeric(30)  # one annual mean per year, 1961-1990
for(i in 1:30) {
        annmean1[i] <- mean(climate[climate$year==(i+1960),]$dailymean)
        #print(annmean1[i])
}
annmean1
mean(annmean1)


#mean of all values per year, then total average
annmean2 <- numeric(30)  # one annual mean per year, 1961-1990
for(i in 1:30) {
        tmpdata <- climate[climate$year==(i+1960),]
        annmean2[i] <- (sum(tmpdata$maxave) + sum(tmpdata$minave))/(length(tmpdata$maxave)+length(tmpdata$minave))
        #print(annmean2[i])
}
annmean2; mean(annmean2)


#differences
annmean - mean(annmean1)
annmean - mean(annmean2)
mean(annmean1) - mean(annmean2)


5 Discussion

  • Principal findings: Very convenient automated extraction of location-based time series data for the precise period that is requested.
  • Weaknesses (whole method, not your script): very long download time for daily grids (~11,000 grids is a huge dataset; it took several days in my case). Yearly grids would be beneficial (and I believe most others are also looking mainly for data on a yearly (or larger) scale).
  • Conclusion
    • Take home message: Seems like a perfect case of "double-check the data using one and the same method".


Posted in  awap grids extreme weather events


document-first-ask-questions-later

This post is just a short note about something I’m thinking of calling “documentation-driven development”. It is based on the concept of “test-driven development”, and more recently:

Anyway, it is a small thing but hopefully big things will grow.

Posted in  research methods Data Documentation


A Great Intro 2 Logistic Regression

This is a great example of logistic regression, because it is pretty simple but covers good ground. I got it from Peter Caley's R tutorial workbook from the Charles Darwin School of Environmental Research.

It is also a tragic example of the impact weather can have on health.
The colder it is, the more likely the shuttle is to explode.

The problem is to relate the number of O-rings that failed (n.fail), out of the six per launch, to the launch temperature (temp).

R Code:

#Load the data
#The following R code will construct the dataset
n.fail <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
temp <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 81, 63, 67, 70, 73, 76)
# there were 6 o rings for each of 23 attempts
total <- rep(6,23)
# probability of fail
p.fail <- n.fail/total
# Response: column-bind the number of failures and non-failures
resp <- cbind(n.fail, total-n.fail)
 
###########################################################################
# we can write text files easily once the data frame or matrix is in shape
data <- as.data.frame(cbind(resp,temp))
names(data) <- c('nfail','totalMinusNfail', 'temp')
# write.csv(data, 'learnR-logistic-data.csv', row.names=F)
 
###########################################################################
# and read it in again 
# data2 <- read.csv('learnR-logistic-data.csv')
 
################################################################
# name:learnR-logistic
png('images/pfail.png')
plot(temp, p.fail, pch=16, xlim=c(40,100), ylim=c(0,0.4))
title('A plot of the proportion failed by temperature')
dev.off()

pfail.png

R Code:

###########################################################################
# newnode: linear
linear <- glm(resp ~ 1 + temp, family=binomial(link=logit))
summary(linear)
linearoutput <- summary(linear)
linearoutput$coeff
 
###########################################################################
# newnode: learnR-logistic
cf <- linearoutput$coeff
signif(cf[which(row.names(cf) == 'temp'),'Estimate'],2)
 
###########################################################################
# newnode: learnR-logistic
# write.csv(linearoutput$coeff,"challengerOfails.csv")
 
###########################################################################
# newnode: learnR-logistic
 png('images/challengerLogistic.png')
 par(mfrow=c(2,2))
 plot(linear)
 dev.off()

challengerLogistic.png

R Code:

####################################################################
# newnode: learnR-logistic
dummy <- data.frame(temp=seq(20,100,1))
pred.prob <- predict.glm(linear, newdata=dummy, type="resp")
png('images/pfailfit.png')
plot(temp, p.fail, xlab="Launch Temperature (F)",
 ylab="Proportion Failing", pch=16, xlim=c(20,100), ylim=c(0,1.0))
lines(dummy$temp, pred.prob)
dev.off()

pfailfit.png

R Code:

####################################################################
resp <- as.data.frame(resp)
resp$fail <- ifelse(resp$n.fail > 0, 1, 0)
resp$temp <- temp
 
png('images/fail.png')
with(resp, plot(temp, fail, xlab="Launch Temperature (F)",ylab="Joint damage", pch=16, xlim=c(50,80), ylim=c(0,1.0))
     )
dev.off()

fail.png

R Code:

chal.logit <- glm(fail~temp,family=binomial, data = resp)
summary(chal.logit)$coeff
 
png('images/pfailfit2.png')
cx <- 50:80  # temperature grid for the fitted line
cyhat <- coefficients(chal.logit)[c(1)] +
coefficients(chal.logit)[c(2)]*cx
cpihat <- exp(cyhat)/(1+exp(cyhat))
with(resp,plot(temp,fail,xlab="Temperature",ylab="Damage",
main="Incidence of Booster Field Joint Damage vs. Temperature", xlim = c(50,80))
     )
lines(cx,cpihat)
dev.off()

pfailfit2.png
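
The manually computed inverse-logit curve above can be cross-checked with predict(); a minimal sketch, reusing chal.logit, cx and cpihat from the previous block:

# check: predict(type = "response") reproduces the manually computed probabilities
phat <- predict(chal.logit, newdata = data.frame(temp = cx), type = "response")
all.equal(as.numeric(phat), as.numeric(cpihat))  # should be TRUE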

Posted in  research methods


spatially-structured-time-series-with-nmmaps

I will use the NMMAPSlite datasets for a simple example of what I describe as “Spatially Structured Timeseries” as opposed to “Spatio-Temporal” which I think more explicitly includes spatial structure in the model. See This Report for all the gory details.

R Codes

Spatiotemporal Regression Modelling


1 Core Model

################################################################
# name:core
# func
setwd("~/projects/spatiotemporal-regression-models/NMMAPS-example")
require(mgcv)
require(splines)

# load
analyte <- read.csv("analyte.csv")

# clean
analyte$yy <- substr(analyte$date,1,4)
numYears<-length(names(table(analyte$yy)))
analyte$date <- as.Date(analyte$date)
analyte$time <- as.numeric(analyte$date)
analyte$agecat <- factor(analyte$agecat,
                          levels = c("under65",
                              "65to74", "75p"),
                          ordered = TRUE
                          )

# do
fit <- gam(cvd ~ s(tmax) + s(dptp) +
           city + agecat +
           s(time, k= 7*numYears, fx=T) +
           offset(log(pop)),
           data = analyte, family = poisson
           )

# plot of response functions
png("images/nmmaps-eg-core.png", width = 1000, height = 750, res = 150)
par(mfrow=c(2,3))
plot(fit, all.terms = TRUE)
dev.off()
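
As a variation on the core model (a sketch only, assuming the same analyte data and that city is a factor), the smooth of time can be allowed to differ by city via the by argument in mgcv, which pushes the "spatially structured" idea a step further than a single shared trend:

# sketch: city-specific smooths of time instead of one shared long-term trend
fit_by_city <- gam(cvd ~ s(tmax) + s(dptp) +
                   city + agecat +
                   s(time, k = 7*numYears, fx = TRUE, by = city) +
                   offset(log(pop)),
                   data = analyte, family = poisson)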


2 Core Model Plots

/images/nmmaps-eg-core.png


Posted in  spatial dependence


morpho-and-rfigshare

In this Case Study I will use Morpho to compare directly with reml.

Step one: Set up morpho

  • Follow the instructions at the ASN SuperSite website and install Morpho 1.8 rather than the latest version, because the latest version has technical issues that stop it from setting permissions.
  • Configure morpho. (I will follow the ASN SuperSite instructions as a future Case Study will be to use their KNB Metacat service).
  • Do not configure it to connect to the Metacat repository yet; a password will need to be assigned by the ASN data manager.

Step 2: Look at the REML created metadata using Morpho

  • Morpho offers to open existing sets for modification.

Code: get location of my example dataset

require(disentangle)
fpath <- system.file(file.path("extdata", "civst_gend_sector.csv"), package="disentangle")
fpath
dirname(fpath)
# [1] "/home/ivan_hanigan/Rlibs/disentangle/extdata"
  • Morpho > File > import = civst_gend_sector_eml.xml
  • (not the figshare_civst_gend_sector_eml.xml that was created when sending to figshare)
  • Error encountered: could not open the metadata, so an empty data package was opened. Offered to upgrade (unable to edit otherwise, so accepted).
  • unable to display data, empty data package will be shown
  • top menu > Documentation > Add/Edit ion

Step 3: Create new datasets with Morpho

Posted in  Data Documentation