AWAP GRIDS
1 Introduction and Methods
This is a work in progress: a stub of an article showing how to use several online data repositories together, as a showcase of the [Scientific Workflow and Integration Software for Health (SWISH) Climate Impact Assessments](https://github.com/swish-climate-impact-assessment) project.
1.1 Authors
- Ivan Hanigan and [Markus Nolf](http://www.thinkoholic.com)
1.2 Background
- Markus Nolf offers this use case of the [SWISH EWEDB](http://swish-climate-impact-assessment.github.io/)
- Markus is pulling together his Daintree Rainforest Observatory (DRO) data into a manuscript for publication, and was looking for climate data from 2012 as well as long-term.
- More specifically, the annual precipitation and mean annual temperature for both 2012 and the 30-year mean.
- The Australian Bureau of Meteorology has a nice rainfall dataset available at http://www.bom.gov.au/climate/data/ ("Cape Trib Store" weather station), but it seems like the temperature records are patchy.
- So it is advised to use the data the DRO collects itself.
- You need to apply through the [ASN SuperSite data portal](http://www.tern-supersites.net.au/knb/) for access to the daily data for the DRO.
- Note that use of the DRO met data must be properly cited, as it is harder to keep an AWS station running in the tropics for years than it is to collect most other data. The citation information is provided when you request access to the data.
- The long-term mean used by most DRO researchers is from the BOM station, as the DRO station itself has only a short record. The offset is around 1000 mm.
- However, what we want is mean annual temperature, while the BOM website focuses more on mean minimum and maximum temperatures.
1.3 Material and Methods
- Baseline Climate Data 2012, Far North Queensland Rainforest Supersite, Cape Tribulation Node
- We can use the data portal to see [the data file in question](http://www.tern-supersites.net.au/knb/metacat/lloyd.238.13/html)
- Application for access is via email
- Extract mean annual temperatures at the BOM website
- SWISH uses BoM data a fair bit and aims to streamline access to it for extreme weather event analyses (which require long-term average climatology to provide the baseline that extremes are measured against).
- With regard to temperature, most daily averages from BoM are calculated as the average of "maximum temperature in 24 hours after 9am local time in degrees" and "minimum temperature in 24 hours before 9am local time in degrees" (only a couple of hundred AWS stations provide the hourly data needed to compute a proper mean of 24 observations); see the sketch after this list.
- The Bureau of Meteorology has generated a range of gridded meteorological datasets for Australia as a contribution to the Australian Water Availability Project (AWAP). These include daily maximum and minimum temperature, which can be used to generate daily averages and, from those, long-term averages.
- http://www.bom.gov.au/jsp/awap/
- Documentation is at http://www.bom.gov.au/amm/docs/2009/jones.pdf
- A workflow to download and process the public BoM weather grids.
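To make the (max + min) / 2 daily mean concrete, here is a minimal sketch with made-up example values (the vectors `tmax` and `tmin` are hypothetical, not from any of the datasets above):

```r
# hypothetical daily maximum and minimum temperatures (degrees C) for three days
tmax <- c(29.1, 30.4, 28.7)
tmin <- c(21.0, 22.3, 20.8)

# the BoM-style daily mean is the midpoint of the 24 h maximum and minimum
daily_mean <- (tmax + tmin) / 2
daily_mean        # 25.05 26.35 24.75

# a long-term mean is then just the average of these daily means
mean(daily_mean)  # 25.38333
```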
1 R-depends
- This workflow uses the open source R software with some of our custom written packages:

```r
# depends
install.packages(c('raster', 'rgdal', 'plyr', 'RODBC', 'RCurl', 'XML',
                   'ggmap', 'maptools', 'spdep'))
```
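The custom packages are not on CRAN. A minimal sketch of installing them from GitHub follows; the repository paths are assumptions, so confirm them against the download pages linked in the extraction script below:

```r
# sketch: install the custom packages from GitHub
# NOTE: these repository paths are assumptions; confirm them against the
# download pages linked in the extraction script below
if (!require(devtools)) install.packages('devtools'); require(devtools)
install_github("swish-climate-impact-assessment/awaptools")
install_github("swish-climate-impact-assessment/swishdbtools")
install_github("ivanhanigan/gisviz")
```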
2 R-code-for-extraction
```r
################################################################
# name: r-code
# aim: daily weather for any point location from online BoM weather grids

# depends on some github packages
require(awaptools)    # http://swish-climate-impact-assessment.github.io/tools/awaptools/awaptools-downloads.html
require(swishdbtools) # http://swish-climate-impact-assessment.github.io/tools/swishdbtools/swishdbtools-downloads.html
require(gisviz)       # http://ivanhanigan.github.io/gisviz/
# and this from CRAN
if (!require(raster)) install.packages('raster'); require(raster)

# get weather data; beware that each grid is a couple of megabytes
vars <- c("maxave", "minave", "totals", "vprph09", "vprph15") # ,"solarave") # solar only available after 1990
for (measure in vars) {
  # measure <- vars[1]
  get_awap_data(start = '1960-01-01', end = '1960-01-02', measure)
}

# get location
locn <- geocode("daintree rainforest") # this uses the Google Maps API, better check the result
#        lon       lat
# 1 145.4185 -16.17003

## treat the data frame as spatial points (GDA94, EPSG:4283)
epsg <- make_EPSG()
shp <- SpatialPointsDataFrame(cbind(locn$lon, locn$lat), locn,
                              proj4string = CRS(epsg$prj4[epsg$code %in% '4283']))

# now loop over the grids and extract the met data at that point
cfiles <- dir(pattern = "grid$")
for (i in seq_len(length(cfiles))) {
  # i <- 1 ## for stepping through
  gridname <- cfiles[[i]]
  r <- raster(gridname)
  # image(r) # plot to look at
  e <- extract(r, shp, df = TRUE)
  # str(e) ## print for debugging
  e1 <- shp
  e1@data$values <- e[, 2]
  e1@data$gridname <- gridname
  # write to the target file (header on the first pass, append afterwards)
  write.table(e1@data, "output.csv", col.names = i == 1, append = i > 1,
              sep = ",", row.names = FALSE)
}
# further work is required to format the gridname column to get out the
# date and weather parameters (see the sketch below)
```
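As a starting point for that further work, here is a minimal sketch of splitting the gridname column into the weather parameter and the date. The file-name pattern used here is an assumption (measure, then an 8-digit start date); adjust the regular expressions to match the actual names of your downloaded grids:

```r
# sketch: split the gridname column into measure and date
# ASSUMPTION: files are named like "maxave_2012010120120101.grid"
# (measure, start date, end date); adjust the regexes if yours differ
out <- read.csv("output.csv", stringsAsFactors = FALSE)
out$measure <- sub("_.*$", "", out$gridname)
out$date <- as.Date(sub("^[a-z0-9]+_([0-9]{8}).*$", "\\1", out$gridname),
                    format = "%Y%m%d")
head(out)
```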
3 Results
- Markus reports:
- "The R-script worked great once i had set a working directory that did not include spaces. (It may have been a different problem that got solved by changing the wd, but the important thing is it's running now.)"
- Markus downloaded 70+ GB of gridded weather data from the BoM website to his local computer
- Also note there is another set of gridded data available from the BOM, which contains pre-computed long-term mean temperatures, [ready to be extracted with the script](http://reg.bom.gov.au/jsp/ncc/climate_averages/temperature/index.jsp?maptype=6&period=#maps)
- "Using this file, I only needed to get the 2012 temp grids for a comparison of 2012 vs. 30-year data. I'm going to run the extraction of 1961-1990 data, just to be sure."
- "When we finished analysis of the long-term temperature from daily means found:
- While the official, pre-computed long-term mean (i.e. 30-year grid file, analysed with your script) was 22.29 °C for the DRO coordinates (145.4494, -16.1041), the new value from daily means (i.e. daily minave and maxave averaged) is 24.91 °C.
- We're not sure what causes this discrepancy, but thought we'd note that there is one.
- For the manuscript, we ended up using the means obtained via BOM's method* to compare 1961-1990 values to 2012, both computed with the above script.
- (* average of daily min/max temperature for each year, then averaged across the entire 30 year period)
- Dataset discrepancy
- Following up on the interesting difference between the two BoM datasets.
- One possible cause: are you calculating the average of the annual averages, i.e. sum(annavs)/30, or the average of all the daily averages, i.e. sum(dailyavs)/(30 * 365 or 366)? The results (and especially the variance) will differ between these methods; see the sketch after this list.
- It looks like the 30-year dataset is the former:
- "Average annual temperatures (maximum, minimum or mean) are calculated by adding daily temperature values each year, dividing by the number of days in that year to get an average for that particular year. The average values for each year in a specified period (1961 to 1990) are added together and the final value is calculated by dividing by the number of years in the period (30 years in this case)."
[metadata](http://reg.bom.gov.au/jsp/ncc/climate_averages/temperature/index.jsp?maptype=6&period=#maps)
- Markus followed the BOM calculation method and compared it with two other approaches:
- average of all 21914 values
- average of yearly sum(min and max values per year)/(ndays*2)
- average of yearly sum(daily average)/ndays)
- where ndays = number of days per year.
- Differences between these methods show up only in the 6th decimal place, far from the 2.62 degree discrepancy.
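A minimal numeric illustration (with made-up values) of why the order of averaging can matter: with unequal year lengths the grand average weights each day equally, while the average of annual averages weights each year equally.

```r
# two hypothetical "years": an ordinary year at a constant 20 degrees and
# a leap year at a constant 25 degrees
year_lengths <- c(365, 366)
daily <- c(rep(20, 365), rep(25, 366))

mean(daily)                                        # grand average of all days: 22.50342
mean(tapply(daily, rep(1:2, year_lengths), mean))  # average of annual means: 22.5
```

With real temperature data the year-to-year differences are tiny, which is consistent with Markus' finding that the methods agree to the 6th decimal place.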
4 R-code-for-comparison
```r
################################################################
# This is Markus' comparison script
# also see the formatted table csv, as well as the SWISH script's raw output csv
setwd("E:\\markus\\ph.d\\aus-daintree\\data-analysis\\climate")
climate <- read.csv("minmaxave-30year-daily.csv", sep = ",", dec = ".")
climate$year <- substr(climate$file, 1, 4)
climate$dailymean <- (climate$maxave + climate$minave) / 2
head(climate)

# total average across all days and values
annmean <- mean(c(climate$maxave, climate$minave))
annmean

# daily means averaged by year, then total average
annmean1 <- numeric(30)
for (i in 1:30) {
  annmean1[i] <- mean(climate[climate$year == (i + 1960), ]$dailymean)
  # print(annmean1[i])
}
annmean1
mean(annmean1)

# mean of all values per year, then total average
annmean2 <- numeric(30)
for (i in 1:30) {
  tmpdata <- climate[climate$year == (i + 1960), ]
  annmean2[i] <- (sum(tmpdata$maxave) + sum(tmpdata$minave)) /
                 (length(tmpdata$maxave) + length(tmpdata$minave))
  # print(annmean2[i])
}
annmean2; mean(annmean2)

# differences between the three methods
annmean - mean(annmean1)
annmean - mean(annmean2)
mean(annmean1) - mean(annmean2)
```
5 Discussion
- Principal findings: very convenient automated extraction of location-based time-series data for the precise period requested.
- Weaknesses (of the whole method, not the script): very long download time for the daily grids (~11,000 grids is a huge dataset; the download took several days in my case). Yearly grids would be beneficial, and I believe most others are also looking mainly for data on a yearly (or larger) scale.
- Conclusion
- Take home message: Seems like a perfect case of "double-check the data using one and the same method".