Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.

ONS-SCD.png

bitbucket-has-unlimited-private-git-repositories-for-universities

  • I just got introduced to Bitbucket, an alternative to GitHub
  • I’ve used GitHub for a couple of years and have been paying for extra private repositories
  • I did not realise that Bitbucket worked with git too (I thought it was just for Mercurial)
  • it offers unlimited public and private repositories and unlimited users (to collaborate) if you have the Academic plan
  • when you sign up with your academic email address, you automatically get the unlimited Academic plan

Posted in  research methods


auto-download-bureau-meteorology-diurnal-data

Excess Heat Indices


1 auto-download-bureau-meteorology-diurnal-data

  • we’re looking at the health impacts of high temperatures at work
  • we need to see the highest temperatures during working hours
  • the Bureau of Meteorology (BOM) provides hourly data for download, but only 3 days at a time
  • so we build a script, set it on a schedule to run every day, download the data and collate the results

1.1 First the FTP server URL structure

  • the URLs are predictable: we just need the station ID, the state, and a code for whether the station is city (9) or regional (8)

1.2 Station table

Station_ID  State  City_9_or_regional_8_
94774       N      9
95719       N      8
94768       N      9
94763       N      9
94767       N      9
94910       N      8
94929       N      8
95896       N      8
94693       N      8
94691       N      8
95677       S      9
94675       S      9
94672       S      9
94866       V      9
95867       V      9
94868       V      9
94875       V      8
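To see how one row of this table maps onto a URL, here is a quick sketch using the first station above (the URL pattern is the one the download script builds):

```r
# Sketch: build the BOM URL for one station from the table above.
# Pattern: http://www.bom.gov.au/fwo/ID<State>60<code>01/ID<State>60<code>01.<Station_ID>.axf
station_id <- 94774
state      <- "N"
code       <- 9  # 9 = city, 8 = regional

product <- paste(sep = "", "ID", state, "60", code, "01")
url <- paste(sep = "", "http://www.bom.gov.au/fwo/", product, "/",
             product, ".", station_id, ".axf")
url
# "http://www.bom.gov.au/fwo/IDN60901/IDN60901.94774.axf"
```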
  • now create a script called "bom_download.r"
  • it takes the station details and pastes them into the URLs
  • downloads the files
  • stores them in a directory for each day’s downloads

1.3 R Code: bom_download.r

# read the station list and build one BOM URL per station
filename <- "~/data/ExcessHeatIndices/inst/doc/weather_stations.csv"
output_directory <- "~/bom-downloads"
setwd(output_directory)

urls <- read.csv(filename)
urls_list <- paste(sep = "", "http://www.bom.gov.au/fwo/ID",
                   urls$State,
                   "60",
                   urls$City_9_or_regional_8_,
                   "01/ID",
                   urls$State,
                   "60",
                   urls$City_9_or_regional_8_,
                   "01.",
                   urls$Station_ID,
                   ".axf")

# store each day's downloads in a directory named for the date
output_directory <- file.path(output_directory, Sys.Date())
dir.create(output_directory)

# download each station file (mode = "wb" avoids mangled line endings on Windows)
for (url in urls_list)
{
  output_file <- file.path(output_directory, basename(url))
  download.file(url, output_file, mode = "wb")
}
print("SUCCESS")
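One caveat: if the BOM server is briefly unavailable, download.file() throws an error and kills the whole loop. A hedged variant (my tweak, not part of the original script) wraps each download in try() so one bad station doesn’t stop the rest:

```r
# Sketch: fault-tolerant version of the download loop
download_safely <- function(urls_list, output_directory)
{
  for (url in urls_list)
  {
    output_file <- file.path(output_directory, basename(url))
    result <- try(download.file(url, output_file, mode = "wb"), silent = TRUE)
    if (inherits(result, "try-error")) {
      # log the failure and carry on with the remaining stations
      message("failed to download: ", url)
    }
  }
}
```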

  • now the data can be combined
  • clean up the header and drop the extraneous extra line at the bottom of each file

1.4 R Code: bom_collation.r

# this takes data in the dated directories created by bom_download.r

# first get the list of downloaded station files
filelist <- dir(pattern = "axf", recursive = TRUE)
filelist

# next keep only files from days we haven't collated yet
if (file.exists("complete_dataset.csv"))
{
  complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = FALSE)
  last_collated <- max(as.Date(complete_data$date_downloaded))
  days_downloaded <- dirname(filelist)
  filelist <- filelist[which(as.Date(days_downloaded) > as.Date(last_collated))]
}

# collate these into the complete file
for (f in filelist)
{
  print(f)
  # skip the 19-line header; read the timestamp as character to keep leading zeros
  fin <- read.csv(f, colClasses = c("local_date_time_full.80." = "character"),
    stringsAsFactors = FALSE, skip = 19)
  # drop the extraneous last line
  fin <- fin[1:(nrow(fin) - 1), ]
  fin$date_downloaded <- dirname(f)
  # split the fixed-width YYYYMMDDhhmm timestamp into its parts
  fin$local_year <- substr(fin$local_date_time_full.80., 1, 4)
  fin$local_month <- substr(fin$local_date_time_full.80., 5, 6)
  fin$local_day <- substr(fin$local_date_time_full.80., 7, 8)
  fin$local_hrmin <- substr(fin$local_date_time_full.80., 9, 12)
  fin$local_date <- paste(fin$local_year, fin$local_month, fin$local_day, sep = "-")
  # append if the complete file already exists, otherwise create it with a header
  if (file.exists("complete_dataset.csv"))
  {
    write.table(fin, "complete_dataset.csv", row.names = FALSE, sep = ",", append = TRUE, col.names = FALSE)
  } else {
    write.table(fin, "complete_dataset.csv", row.names = FALSE, sep = ",")
  }
}
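The timestamp splitting above relies on local_date_time_full.80. being a fixed-width YYYYMMDDhhmm string (which is why it is read with colClasses = "character"). A minimal check of that substr() logic on one made-up value:

```r
# Sketch: how the fixed-width timestamp gets split in bom_collation.r
# (the example value is made up for illustration)
stamp <- "201302141430"  # i.e. 2013-02-14 14:30
local_year  <- substr(stamp, 1, 4)
local_month <- substr(stamp, 5, 6)
local_day   <- substr(stamp, 7, 8)
local_hrmin <- substr(stamp, 9, 12)
local_date  <- paste(local_year, local_month, local_day, sep = "-")
local_date   # "2013-02-14"
local_hrmin  # "1430"
```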
  • so now let’s automate the process
  • make a BAT file

1.5 BAT file (windoze)

"C:\Program Files\R\R-2.15.2\bin\Rscript.exe" "~\bom-downloads\bom_download.r"
  • add this BAT file to the Scheduled Tasks in your Control Panel
  • use cron for a Linux version
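On Linux the equivalent of the scheduled task is a crontab entry; a sketch (the 6 a.m. run time and the Rscript path are my assumptions, adjust to taste):

```shell
# run the download script every day at 06:00
# install with: crontab -e
0 6 * * * Rscript ~/bom-downloads/bom_download.r
```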

1.6 check the data

#### name:check the data ####
require(plyr)

setwd("~/bom-downloads")
source("bom_download.r")
dir()
source("bom_collation.r")

complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = F)
str(complete_data)

# Quick and dirty de-duplication: average over duplicated timestamps for one station
table(complete_data$name.80.)
qc <- subset(complete_data, name.80. == "Broken Hill Airport")
qc <- ddply(qc, "local_date_time_full.80.",
  summarise, apparent_temp = mean(apparent_t))

names(qc)
png("qc-diurnal-plot.png")
with(qc,
     plot(apparent_temp, type= "l")
     )
dev.off()

qc-diurnal-plot.png
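The ddply() step above handles duplicates for one station by averaging; a sketch of what the planned de-duplication script might do instead (keyed on station name plus timestamp, with toy data standing in for complete_data):

```r
# Sketch: drop exact repeat observations, keeping the first occurrence
# of each station/timestamp pair (toy data for illustration)
obs <- data.frame(
  name.80.                 = c("Broken Hill Airport", "Broken Hill Airport",
                               "Broken Hill Airport", "Cobar Airport"),
  local_date_time_full.80. = c("201302141400", "201302141400",
                               "201302141430", "201302141400"),
  apparent_t               = c(35.1, 35.1, 36.0, 33.2),
  stringsAsFactors = FALSE)

key <- paste(obs$name.80., obs$local_date_time_full.80.)
deduped <- obs[!duplicated(key), ]
nrow(deduped)  # 3 rows: the one duplicate is removed
```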

1.7 Conclusions

  • watch the data roll in
  • each daily run downloads about 3 days of observations
  • meaning duplicates will be frequent; we need to write a script to de-duplicate
  • cheers!


Posted in  extreme weather events


the-history-of-ons

  • according to http://en.wikipedia.org/wiki/Open_notebook_science
  • "The term Open Notebook Science was first used in a blog post by Jean-Claude Bradley"

  • This article http://www.infotoday.com/it/sep10/Poynder.shtml

    says Jean-Claude Bradley is an organic chemist at Drexel University in Philadelphia. As with most scientists, Bradley used to be very secretive. He kept his research under wraps until publication and frequently applied for patents on his work in nanotechnology and gene therapy.

    However, he asked himself a difficult question 5 years ago: Was his research having the kind of impact he would like? He had to conclude that the answer was “no,” and this was partly a consequence of the culture of secrecy that permeates research today. So Bradley was determined to be more open. Since his collaborators were not of the same mind, he severed his ties with them and, in 2005, he launched a web-based initiative called UsefulChem. As the name implies, the aim of the initiative was also to work in the world of useful science and, today, Bradley makes new anti-malarial compounds.

  • Other links
  • http://opensourcemalaria.org/
  • http://malaria.ourexperiment.org/

  • Mat Todd’s group at the School of Chemistry at The University of Sydney is practicing this with a Schistosomiasis notebook and a Malaria notebook
  • http://openwetware.org/wiki/Todd

Posted in  research methods


non-linear-relationships-vs-non-linear-models-vis-a-vis-curvi-linear-terms

non-linear-model-vs-non-linear-relationship



1 non-linear-relationships-vs-non-linear-models

  • I value precise language very highly
  • this is because in multi-disciplinary teams it is easy to use the same words and mean different things
  • in a recent discussion about Distributed Lag Non-linear Models I started to reflect on something that has bothered me for a while
  • back in 2005 my old mate Prof Keith Dear picked me up on using the term "non-linear model" incorrectly and explained the maths…
  • I kind of understood but promptly forgot, and have since found a lot of people use the term "non-linear model" a bit carelessly
  • yesterday I was in a discussion about comparing non-linear relationships between different studies in a meta-analysis
  • I immediately felt uncomfortable when we started to discuss these as "non-linear models"
  • so here is a quick bit of google fu (with a session at the coffee shop with Steve and Mishka) to remind me of the difference between the two

1.1 Nonlinear Regression vs. Linear Regression

A regression model is called nonlinear if the derivatives of the model with respect to the model parameters depend on one or more parameters. This definition is essential for distinguishing nonlinear from curvilinear regression: a regression model is not necessarily nonlinear just because the graphed regression trend is curved. Consider a polynomial model such as this:

1.1.1 Model 1

\(Y_{i} = \beta_{0} + \beta_{1} X_{i} + \beta_{2} X_{i}^2 + \epsilon_{i}\)
  • appears curved when y is plotted against x. It is, however, not a nonlinear model. To see this, take the derivatives of y with respect to the parameters \(\beta_{0}\), \(\beta_{1}\) and \(\beta_{2}\):
  • \(\partial y/\partial \beta_{0} = 1\)
  • \(\partial y/\partial \beta_{1} = x\)
  • \(\partial y/\partial \beta_{2} = x^2\)
  • none of these derivatives depends on a model parameter, so the model is linear. In contrast, consider the log-logistic model

1.1.2 Model 2

\(Y_{i} = d + (a - d)/(1 + e^{b \times log(x/g)}) + \epsilon\)
  • take the derivative with respect to d, for example:
\(\partial y/\partial d = 1 - 1/(1 + e^{b \times log(x/g)})\)
  • this derivative involves other parameters (b and g), hence the model is nonlinear.

1.2 Conclusions

  • It is probably best to refer to the polynomial as a "non-linear relationship" in a linear model
  • reserving "non-linear model" for things like Model 2
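The distinction shows up directly in R (a sketch on simulated data, not from the original discussion): Model 1 fits with plain lm() precisely because it is linear in the parameters, while Model 2 needs an iterative fitter like nls().

```r
set.seed(42)
x <- seq(0.1, 10, length.out = 200)

# Model 1: curvilinear but LINEAR in beta0, beta1, beta2 -> lm() works
y1 <- 1 + 2 * x - 0.3 * x^2 + rnorm(200, sd = 0.1)
fit1 <- lm(y1 ~ x + I(x^2))
coef(fit1)  # close to the true values 1, 2, -0.3

# Model 2: log-logistic, NONLINEAR in the parameters -> needs nls()
a <- 10; d <- 2; b <- 3; g <- 2
y2 <- d + (a - d) / (1 + exp(b * log(x / g))) + rnorm(200, sd = 0.05)
fit2 <- nls(y2 ~ d + (a - d) / (1 + exp(b * log(x / g))),
            start = list(a = 9, d = 1.5, b = 2.5, g = 1.8))
coef(fit2)  # close to a = 10, d = 2, b = 3, g = 2
```

Note that nls() requires starting values because, unlike lm(), there is no closed-form solution when the derivatives depend on the parameters.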


Posted in  research methods


research-protocol-we-used-for-our-bushfire-project

  • for a three-year project on bushfire smoke and health we used the following structure in a wiki

Sections:

A. Background
B. Proposals
C. Approvals
D. Budget
E. Datasets
F. Analysis
G. Literature
H. Communication
I. Correspondence
J. Meetings
K. Completion
ContactDetails
README
TODO

Conclusion

  • it worked quite well in the first year.
  • we didn’t use it much after that.
  • it is still on the ANU webserver. I sometimes refer back to it now, a couple of years later.

Posted in  research methods