Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.

ONS-SCD.png

bitbucket-has-unlimited-private-git-repositories-for-universities

  • I just got introduced to Bitbucket, an alternative to GitHub
  • I’ve used GitHub for a couple of years and have been paying for extra private repositories
  • I did not realise that Bitbucket worked with git too (I thought it was just for Mercurial)
  • it offers unlimited public and private repositories and unlimited users (to collaborate) if you have the Academic plan
  • when you sign up with your academic email address, you automatically get the unlimited Academic plan

Posted in  research methods


auto-download-bureau-meteorology-diurnal-data

Excess Heat Indices


1 auto-download-bureau-meteorology-diurnal-data

  • we’re looking at the health impacts of high temperatures at work
  • we need to see the highest temperatures during working hours
  • the Bureau of Meteorology (BOM) provides hourly data for download, but only 3 days at a time
  • so we build a script, set it on a schedule to run every day, download the data and collate the results

1.1 First the FTP server URL structure

  • the URLs are predictable: we just need the station ID, the state, and a code for whether the station is city (9) or regional (8)

1.2 Station table

Station_ID  State  City_9_or_regional_8_
94774       N      9
95719       N      8
94768       N      9
94763       N      9
94767       N      9
94910       N      8
94929       N      8
95896       N      8
94693       N      8
94691       N      8
95677       S      9
94675       S      9
94672       S      9
94866       V      9
95867       V      9
94868       V      9
94875       V      8
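To see how one row of this table maps onto a URL, here is a quick sketch using the first station above (the URL pattern is the one the download script builds):

```r
# Sketch: build the BOM URL for one station from the table above.
# Pattern: http://www.bom.gov.au/fwo/ID<State>60<code>01/ID<State>60<code>01.<Station_ID>.axf
station_id <- 94774
state      <- "N"
code       <- 9  # 9 = city, 8 = regional

product <- paste(sep = "", "ID", state, "60", code, "01")
url <- paste(sep = "", "http://www.bom.gov.au/fwo/", product, "/",
             product, ".", station_id, ".axf")
url
# "http://www.bom.gov.au/fwo/IDN60901/IDN60901.94774.axf"
```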
  • now create a script called "bom_download.r"
  • it takes the station details and pastes them into the URLs
  • downloads the files
  • stores them in a directory for each day’s downloads

1.3 R Code: bom_download.r

# read the station list and build one BOM URL per station
filename <- "~/data/ExcessHeatIndices/inst/doc/weather_stations.csv"
output_directory <- "~/bom-downloads"
setwd(output_directory)

urls <- read.csv(filename)
urls_list <- paste(sep = "", "http://www.bom.gov.au/fwo/ID",
                   urls$State,
                   "60",
                   urls$City_9_or_regional_8_,
                   "01/ID",
                   urls$State,
                   "60",
                   urls$City_9_or_regional_8_,
                   "01.",
                   urls$Station_ID,
                   ".axf")

# store each day's downloads in a directory named for the date
output_directory <- file.path(output_directory, Sys.Date())
dir.create(output_directory)

# download each station file (mode = "wb" avoids mangled line endings on Windows)
for (url in urls_list)
{
  output_file <- file.path(output_directory, basename(url))
  download.file(url, output_file, mode = "wb")
}
print("SUCCESS")
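One caveat: if the BOM server is briefly unavailable, download.file() throws an error and kills the whole loop. A hedged variant (my tweak, not part of the original script) wraps each download in try() so one bad station doesn’t stop the rest:

```r
# Sketch: fault-tolerant version of the download loop
download_safely <- function(urls_list, output_directory)
{
  for (url in urls_list)
  {
    output_file <- file.path(output_directory, basename(url))
    result <- try(download.file(url, output_file, mode = "wb"), silent = TRUE)
    if (inherits(result, "try-error")) {
      # log the failure and carry on with the remaining stations
      message("failed to download: ", url)
    }
  }
}
```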

  • now the data can be combined
  • clean up the header and drop the extraneous extra line at the bottom of each file

1.4 R Code: bom_collation.r

# this takes data in the dated directories created by bom_download.r

# first get the list of downloaded station files
filelist <- dir(pattern = "axf", recursive = TRUE)
filelist

# next keep only files from days we haven't collated yet
if (file.exists("complete_dataset.csv"))
{
  complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = FALSE)
  last_collated <- max(as.Date(complete_data$date_downloaded))
  days_downloaded <- dirname(filelist)
  filelist <- filelist[which(as.Date(days_downloaded) > as.Date(last_collated))]
}

# collate these into the complete file
for (f in filelist)
{
  print(f)
  # skip the 19-line header; read the timestamp as character to keep leading zeros
  fin <- read.csv(f, colClasses = c("local_date_time_full.80." = "character"),
    stringsAsFactors = FALSE, skip = 19)
  # drop the extraneous last line
  fin <- fin[1:(nrow(fin) - 1), ]
  fin$date_downloaded <- dirname(f)
  # split the fixed-width YYYYMMDDhhmm timestamp into its parts
  fin$local_year <- substr(fin$local_date_time_full.80., 1, 4)
  fin$local_month <- substr(fin$local_date_time_full.80., 5, 6)
  fin$local_day <- substr(fin$local_date_time_full.80., 7, 8)
  fin$local_hrmin <- substr(fin$local_date_time_full.80., 9, 12)
  fin$local_date <- paste(fin$local_year, fin$local_month, fin$local_day, sep = "-")
  # append if the complete file already exists, otherwise create it with a header
  if (file.exists("complete_dataset.csv"))
  {
    write.table(fin, "complete_dataset.csv", row.names = FALSE, sep = ",", append = TRUE, col.names = FALSE)
  } else {
    write.table(fin, "complete_dataset.csv", row.names = FALSE, sep = ",")
  }
}
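The timestamp splitting above relies on local_date_time_full.80. being a fixed-width YYYYMMDDhhmm string (which is why it is read with colClasses = "character"). A minimal check of that substr() logic on one made-up value:

```r
# Sketch: how the fixed-width timestamp gets split in bom_collation.r
# (the example value is made up for illustration)
stamp <- "201302141430"  # i.e. 2013-02-14 14:30
local_year  <- substr(stamp, 1, 4)
local_month <- substr(stamp, 5, 6)
local_day   <- substr(stamp, 7, 8)
local_hrmin <- substr(stamp, 9, 12)
local_date  <- paste(local_year, local_month, local_day, sep = "-")
local_date   # "2013-02-14"
local_hrmin  # "1430"
```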
  • so now let’s automate the process
  • make a BAT file

1.5 BAT file (windoze)

"C:\Program Files\R\R-2.15.2\bin\Rscript.exe" "~\bom-downloads\bom_download.r"
  • add this BAT file to the Scheduled Tasks in your Control Panel
  • use cron for a Linux version
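On Linux the equivalent of the scheduled task is a crontab entry; a sketch (the 6 a.m. run time and the Rscript path are my assumptions, adjust to taste):

```shell
# run the download script every day at 06:00
# install with: crontab -e
0 6 * * * Rscript ~/bom-downloads/bom_download.r
```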

1.6 check the data

#### name:check the data ####
require(plyr)

setwd("~/bom-downloads")
source("bom_download.r")
dir()
source("bom_collation.r")

complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = F)
str(complete_data)

# Quick and dirty de-duplication: average over duplicated timestamps for one station
table(complete_data$name.80.)
qc <- subset(complete_data, name.80. == "Broken Hill Airport")
qc <- ddply(qc, "local_date_time_full.80.",
  summarise, apparent_temp = mean(apparent_t))

names(qc)
png("qc-diurnal-plot.png")
with(qc,
     plot(apparent_temp, type= "l")
     )
dev.off()

qc-diurnal-plot.png
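The ddply() step above handles duplicates for one station by averaging; a sketch of what the planned de-duplication script might do instead (keyed on station name plus timestamp, with toy data standing in for complete_data):

```r
# Sketch: drop exact repeat observations, keeping the first occurrence
# of each station/timestamp pair (toy data for illustration)
obs <- data.frame(
  name.80.                 = c("Broken Hill Airport", "Broken Hill Airport",
                               "Broken Hill Airport", "Cobar Airport"),
  local_date_time_full.80. = c("201302141400", "201302141400",
                               "201302141430", "201302141400"),
  apparent_t               = c(35.1, 35.1, 36.0, 33.2),
  stringsAsFactors = FALSE)

key <- paste(obs$name.80., obs$local_date_time_full.80.)
deduped <- obs[!duplicated(key), ]
nrow(deduped)  # 3 rows: the one duplicate is removed
```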

1.7 Conclusions

  • watch the data roll in
  • each daily run downloads about 3 days of observations
  • meaning duplicates will be frequent; we need to write a script to de-duplicate
  • cheers!


Posted in  extreme weather events


the-history-of-ons

  • according to http://en.wikipedia.org/wiki/Open_notebook_science
  • "The term Open Notebook Science was first used in a blog post by Jean-Claude Bradley"

  • This article http://www.infotoday.com/it/sep10/Poynder.shtml

    says Jean-Claude Bradley is an organic chemist at Drexel University in Philadelphia. As with most scientists, Bradley used to be very secretive. He kept his research under wraps until publication and frequently applied for patents on his work in nanotechnology and gene therapy.

    However, he asked himself a difficult question 5 years ago: Was his research having the kind of impact he would like? He had to conclude that the answer was “no,” and this was partly a consequence of the culture of secrecy that permeates research today. So Bradley was determined to be more open. Since his collaborators were not of the same mind, he severed his ties with them and, in 2005, he launched a web-based initiative called UsefulChem. As the name implies, the aim of the initiative was also to work in the world of useful science and, today, Bradley makes new anti-malarial compounds.

  • Other links
  • http://opensourcemalaria.org/
  • http://malaria.ourexperiment.org/

  • Mat Todd’s group at the School of Chemistry at The University of Sydney is practicing this with a Schistosomiasis notebook and a Malaria notebook
  • http://openwetware.org/wiki/Todd

Posted in  research methods


non-linear-relationships-vs-non-linear-models-vis-a-vis-curvi-linear-terms

non-linear-model-vs-non-linear-relationship



1 non-linear-relationships-vs-non-linear-models

  • I value precise language very highly
  • this is because in multi-disciplinary teams it is easy to use the same words and mean different things
  • in a recent discussion about Distributed Lag Non-linear Models I started to reflect on something that has bothered me for a while
  • back in 2005 my old mate Prof Keith Dear picked me up on using the term "non-linear model" incorrectly and explained the maths…
  • I kind of understood but promptly forgot, and have since found a lot of people use the term "non-linear model" a bit carelessly
  • yesterday I was in a discussion about comparing non-linear relationships between different studies in a meta-analysis
  • I immediately felt uncomfortable when we started to discuss these as "non-linear models"
  • so here is a quick bit of google fu (with a session at the coffee shop with Steve and Mishka) to remind me of the difference between the two

1.1 Nonlinear Regression vs. Linear Regression

A regression model is called nonlinear if the derivatives of the model with respect to the model parameters depend on one or more parameters. This definition is essential for distinguishing nonlinear from curvilinear regression: a regression model is not necessarily nonlinear just because the graphed regression trend is curved. Consider a polynomial model such as this:

1.1.1 Model 1

\(Y_{i} = \beta_{0} + \beta_{1} X_{i} + \beta_{2} X_{i}^2 + \epsilon_{i}\)
  • appears curved when y is plotted against x. It is, however, not a nonlinear model. To see this, take the derivatives of y with respect to the parameters \(\beta_{0}\), \(\beta_{1}\) and \(\beta_{2}\):
  • \(\partial y/\partial \beta_{0} = 1\)
  • \(\partial y/\partial \beta_{1} = x\)
  • \(\partial y/\partial \beta_{2} = x^2\)
  • none of these derivatives depends on a model parameter, so the model is linear. In contrast, consider the log-logistic model

1.1.2 Model 2

\(Y_{i} = d + (a - d)/(1 + e^{b \times log(x/g)}) + \epsilon\)
  • take the derivative with respect to d, for example:
\(\partial y/\partial d = 1 - 1/(1 + e^{b \times log(x/g)})\)
  • this derivative involves other parameters (b and g), hence the model is nonlinear.

1.2 Conclusions

  • It is probably best to refer to the polynomial as a "non-linear relationship" in a linear model
  • reserving "non-linear model" for things like Model 2
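The distinction shows up directly in R (a sketch on simulated data, not from the original discussion): Model 1 fits with plain lm() precisely because it is linear in the parameters, while Model 2 needs an iterative fitter like nls().

```r
set.seed(42)
x <- seq(0.1, 10, length.out = 200)

# Model 1: curvilinear but LINEAR in beta0, beta1, beta2 -> lm() works
y1 <- 1 + 2 * x - 0.3 * x^2 + rnorm(200, sd = 0.1)
fit1 <- lm(y1 ~ x + I(x^2))
coef(fit1)  # close to the true values 1, 2, -0.3

# Model 2: log-logistic, NONLINEAR in the parameters -> needs nls()
a <- 10; d <- 2; b <- 3; g <- 2
y2 <- d + (a - d) / (1 + exp(b * log(x / g))) + rnorm(200, sd = 0.05)
fit2 <- nls(y2 ~ d + (a - d) / (1 + exp(b * log(x / g))),
            start = list(a = 9, d = 1.5, b = 2.5, g = 1.8))
coef(fit2)  # close to a = 10, d = 2, b = 3, g = 2
```

Note that nls() requires starting values because, unlike lm(), there is no closed-form solution when the derivatives depend on the parameters.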


Posted in  research methods


research-protocol-we-used-for-our-bushfire-project

  • for a three-year project on bushfire smoke and health we used the following structure in a wiki

Sections:

A. Background
B. Proposals
C. Approvals
D. Budget
E. Datasets
F. Analysis
G. Literature
H. Communication
I. Correspondence
J. Meetings
K. Completion
ContactDetails
README
TODO

Conclusion

  • it worked quite well in the first year.
  • we didn’t use it much after that.
  • it is still on the ANU webserver. I sometimes refer back to it now, a couple of years later.

Posted in  research methods