Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.

[Image: ONS-SCD logo]

Project Templates That Initialize A New Project With A Skeleton Automatically

  • I have been using John Myles White's ProjectTemplate R package for ages
  • I really like the ease with which I can get up and running with a new project
  • and the ease with which I can pick up an old project and start adding new work

Quote from John’s first post

My inspiration for this approach comes from the rails command from
Ruby on Rails, which initializes a new Rails project with the proper
skeletal structure automatically. Also taken from Rails is
ProjectTemplate’s approach of preferring convention over
configuration: the automatic data and library loading as well as the
automatic testing work out of the box because assumptions are made
about the directory structure and naming conventions that will be used.

http://www.johnmyleswhite.com/notebook/2010/08/26/projecttemplate/

  • I don't know anything about RoR, but this philosophy works really well for my R programming too

R Code

if (!require("ProjectTemplate")) install.packages("ProjectTemplate"); require("ProjectTemplate")
setwd("~/projects")
create.project("my-project")
setwd('my-project')
dir()
##  [1] "cache"       "config"      "data"        "diagnostics" "doc"        
##  [6] "graphs"      "lib"         "logs"        "munge"       "profiling"  
## [11] "README"      "reports"     "src"         "tests"       "TODO"   
##### these are very sensible default directories to create a modular
##### analysis workflow.  See the project homepage for descriptions
 
# now all you need to do whenever you start a new day 
load.project()
# and your workspace will be recreated and any new data automagically analysed in
# the manner you want

Project Administration

  • I've found that these directories do not work so well for the administration of my projects, so I put together a different set of automatic defaults
  • I've based it on the University of Manitoba Centre for Health Policy, along with some other sources I can't recall

    The full set

    # A.Background
    # B.Proposals
    # C.Approvals
    # D.Budget
    # E.Datasets
    # F.Analysis
    # G.Literature
    # H.Communication
    # I.Correspondence
    # J.Meetings
    # K.Completion
    # ContactDetails.txt
    # README.md
    # TODO.txt

R Code: my subset

AdminTemplate <- function(rootdir = getwd()){
  # create the administration skeleton under rootdir
  dir.create(file.path(rootdir, '01_planning', 'proposal'), recursive = TRUE)
  dir.create(file.path(rootdir, '01_planning', 'scheduling'))
  dir.create(file.path(rootdir, '02_budget'))
  dir.create(file.path(rootdir, '03_communication'))
  dir.create(file.path(rootdir, '04_reporting_and_meetings'))
  file.create(file.path(rootdir, 'contact_details.txt'))
  file.create(file.path(rootdir, 'README.md'))
}
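
For example, pointing it at an existing project directory (a hypothetical path):

AdminTemplate("~/projects/my-project")
dir("~/projects/my-project")
# lists the new admin directories and files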

Conclusion

  • hopefully by formalising some of these into my workflow I will find my projects easier to navigate
  • and to pick up or put down as needed

Posted in  research methods


long-term-climatology-contextual-data-for-ecological-research

  • Studies of extreme weather events such as drought require long-term climate data
  • these are available at continental scale, derived from observations from a network of weather stations that are interpolated to a surface
  • I have been working on techniques with R and online resources (the Australian Water Availability Project, AWAP) to make working with these long-term climatology datasets easier
  • The package is in development at https://github.com/swish-climate-impact-assessment/awaptools

Case Study

  • the aim is to look at seasonal rainfall means
  • the first thing is to download the data (I'm also working on an RStudio server to host these data, as a Virtual Lab)
  • data = multiple years of monthly rainfall data in a raster grid format
  • aim = combine rainfall on a seasonal basis in one grid
  • (i.e. M-J-J-A-S-O 1900, 1901, etc.) and calculate the mean of each cell
  • assumption 1 = filenames have year and month embedded, so they will be sorted in order when listed
  • assumption 2 = all months are available, from 1:12, for all years in the study period
  • notes:
  • this requires that the files are listed in the right order by name and that all months are present. It might be better to use grep on the file names and strsplit/substr to extract the month identifier more precisely; a sketch of that idea follows this list
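
A minimal sketch of that more robust selection, assuming (hypothetically) that each filename embeds the date as a YYYYMM string:

# hypothetical filenames; adjust the regex to the real AWAP pattern
cfiles <- c("rain_190005.tif", "rain_190006.tif", "rain_190101.tif")
# extract the first six-digit run and take its month part
yyyymm <- regmatches(cfiles, regexpr("[0-9]{6}", cfiles))
month <- as.integer(substr(yyyymm, 5, 6))
# select files by month value rather than by position in the listing
cfiles[month %in% 5:10]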

Results

[Figure: seasonal rainfall means]

I’m looking for collaboration on this!

quote:

probably the easiest way to do this is to use
Hadley's devtools package.  Assuming you have devtools and my package's
dependencies.  If you're using Linux or the BSD's, this should just
work.  Welcome to the good life, player.  I think this will work out
of the box on a Mac.  I have no idea if this will work on Windows; how
you strange people get anything done amazes me.  At the least,
I'm guessing you have to install Rtools first.  
You could also just source all the R scripts like 
some kind of barbarian

R Code:

# depends
require(swishdbtools)
if(!require(raster)) install.packages("raster", dependencies = T); require(raster)
if(!require(rgdal)) install.packages("rgdal", dependencies = T); require(rgdal)

# on linux can install direct, on windoze you configure Rtools
require(devtools)
install_github("awaptools", "swish-climate-impact-assessment")
require(awaptools)

homedir <- "~/data/AWAP_GRIDS/data"
outdir <- "~/data/AWAP_GRIDS/data-seasonal-vignette"
 
# first make sure there are no left over files from previous runs
#oldfiles <- list.files(pattern = '.tif', full.names=T) 
#for(oldfile in oldfiles)
#{
#  print(oldfile)
#  file.remove(oldfile)
#}
################################################
setwd(homedir)
 
# local customisations
workdir  <- homedir
setwd(workdir)
# don't change this
years <- c(1900:2014)
lengthYears <- length(years)
# change this
startdate <- "2013-01-01"
enddate <- "2014-01-31"
# do
load_monthly(start_date = startdate, end_date = enddate)
 
# do
filelist <- dir(pattern = "grid.Z$")
for(fname in filelist)
{
  #fname <- filelist[1]
  unzip_monthly(fname, aggregation_factor = 1)
  fin <- gsub(".grid.Z", ".grid", fname)
  fout <- gsub(".grid.Z", ".tif", fname)
  r <- raster(fin)
  writeRaster(r, fout, format="GTiff",  overwrite = TRUE)
  file.remove(fin)
}
 
cfiles <- list.files(pattern = "\\.tif$", full.names = TRUE)
# loop through the seasons
# NB: the filesOfSeason_i counter must be reset each time you start
 
 
for(season in c("hot", "cool"))
{
  # season <- "hot" # for labelling
  if(season == "cool")
  {
    filesOfSeason_i <- c(5,6,7,8,9,10)  
    endat <- lengthYears
  } else {
    filesOfSeason_i <- c(11,12,13,14,15,16) 
    endat <- lengthYears - 1
  }
  
  for (year in 1:endat){ 
    ## setup for checking month 
    # year  <- 1 #endat
    
    
    ## checking
    cat("####################\n\n")
    print(cfiles[filesOfSeason_i])
    
    b <- brick(stack(cfiles[filesOfSeason_i])) 
    ## calculate mean 
    m <- mean(b) 
    ## checking 
    # image(m) 
    writeRaster(m, file.path(outdir, sprintf("season_%s_%s.tif", season, year)), format = "GTiff")
    filesOfSeason_i <- filesOfSeason_i + 12
  } 
}
 
##### now we will calculate the overall average
setwd(outdir)
for(season in c("cool", "hot"))
{
  cfiles <- list.files(pattern = season, full.names=T)   
  print(cfiles)
  b <- brick(stack(cfiles)) 
  ## calculate mean 
  m <- mean(b) 
  ## checking 
  # image(m) 
  writeRaster(m, file.path(outdir, sprintf("season_%s.tif", season)), format = "GTiff")
}
 
# qc
cool <- raster("season_cool.tif")
hot <- raster("season_hot.tif")
par(mfrow = c(2,1))
image(cool)
image(hot)
 
# just summer rainfall
png("season_hot.png")
image(hot)
dev.off()

Posted in  extreme weather events Drought


yearmon-class-and-interoperability-with-excel-and-access

Toward a standard and unambiguous format for sharing Year-Month data

  • I am working in a new job where we are receiving data from a lot of different groups
  • we aim to review these datasets and then publish them for a wide audience of potential users
  • therefore usability and interoperability are key concerns
  • we received some data with Month and Year as Apr.12
  • I know this is easy to convert to a date/time class in R, but wondered what format would be better to recommend for our datasets to maximise utility downstream (especially for non-R users)
  • Apr.12 is assumed to be text in Excel, so we need something else
  • Apr-12 is assumed to be the twelfth of April this year (i.e. 12/4/2014)

In R the solution might be to use the zoo package

require(zoo)
as.yearmon("Apr.12", "%b.%y")
# [1] "Apr 2012"

# other options abound
as.yearmon("apr12", "%b%y")

# the default is YYYY-MM or similar
as.yearmon("2012-04")
as.yearmon("2012-4")

  • So I went looking at how Excel and Access deal with this
  • and found that the best option appeared to be MMM-YYYY, in terms of how these programs assume the data should look

R Code:

as.yearmon("Apr-2012", "%b-%Y")

# but you will need to specify the format, because otherwise it fails
as.yearmon("Apr-2012")
# NA

Conclusion

  • I recommend the MMM-YYYY option
  • in Excel it is conveniently assumed to be 1/04/2012
  • and if an MS Access field is set to date/time with format = mmm-yyyy, it is OK for data entry (but not for importing)
  • to import, use a short text type, then post-import change the field to date/time with format mmm-yyyy (the "." separator failed)
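
For R users writing out to the recommended format, zoo can produce the MMM-YYYY text directly (a minimal example):

require(zoo)
x <- as.yearmon(c("2012-04", "2012-05"))
format(x, "%b-%Y")
# [1] "Apr-2012" "May-2012"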

Posted in  research methods Data Documentation


gantting-like-a-hacker

Background

  • “Blogging like a Hacker” has become a paradigm for programmers who want to link their code to their blogs.
  • I’ve followed this paradigm for a while to support my scientific projects, enhancing their transparency and reproducibility.
  • I’ve started a new project where I also need to do project management and planning (following Tomas Aragon’s tutorial)
  • I propose that the same methods I use in scientific programming and blogging like a hacker can be used in “Gantting like a Hacker”
  • The title for this post is also influenced by the post over at [Geek Manager](http://blog.geekmanager.co.uk/2007/05/02/using-the-best-plan-format/).
  • That post says that “Premature Gantting” is the act of making a “huge Gantt chart (often in MS Project).”
  • Gantting like a Hacker is doing this in a scripted environment, without relying on closed-source proprietary software such as the Windoze options.
  • The community of bloggers (mostly geeks) who follow this style of blogging originated with the invention of Jekyll, unveiled in a post by Tom Preston-Werner, GitHub’s co-founder (aka mojombo).
  • This experiment uses TaskJuggler and Emacs Orgmode

Materials and Methods

  • Use Ubuntu 12.04 Long Term Support (LTS)
  • with Ruby

Code: install TaskJuggler

gem install taskjuggler

Gantt charts with Emacs Orgmode

  • I’m using an Emacs tool that drives TaskJuggler to handle the task scheduling and create a Gantt chart suitable for a Pointy-Haired Boss.
  • I hated using the Orgmode script to compile the parts of the Gantt chart, so I wrote an R script to convert a spreadsheet into an Orgmode file; a sketch of that kind of converter follows the figure.
  • the spreadsheet is organised in a fairly simple way, shown below.

[Figure: the spreadsheet layout]
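
As a minimal sketch of the idea (not my actual script, and with hypothetical columns task and effort_days standing in for the real spreadsheet layout), a converter could look like:

# read the exported spreadsheet (hypothetical columns)
tasks <- read.csv(text = "
task,effort_days
Literature review,10
Data collection,15
Analysis,20
")
# write Org headings with the Effort property that the
# Orgmode TaskJuggler exporter understands
con <- file("plan.org", "w")
writeLines("* Project plan", con)
for (i in seq_len(nrow(tasks))) {
  writeLines(sprintf("** TODO %s", tasks$task[i]), con)
  writeLines(":PROPERTIES:", con)
  writeLines(sprintf(":Effort: %dd", tasks$effort_days[i]), con)
  writeLines(":END:", con)
}
close(con)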

Results

  • and executing my script will convert this into an Emacs Orgmode file that will export to a TaskJuggler file (use C-c C-e j), and voilà!

[Figure: the resulting Gantt chart]

Conclusions

  • This simplifies the creation of Orgmode TaskJuggler files
  • A drawback is that it has to go through the Emacs export function.

Posted in  research methods


Aggregation Of Statistical Local Areas

Reproducibility

  • A subset of the data and code used for this blog post is available at https://github.com/ivanhanigan/aggregation-of-slas-or-sa2s
  • Some parts of the data I used are not available due to shared Intellectual Property
  • The spatial data and SEIFA Socioeconomic index data are publicly available from the Australian Bureau of Statistics
  • The New Groups categories were generously contributed by John Glover of the Public Health Information Development Unit, The University of Adelaide.

Introduction

The aim is to aggregate Statistical Local Areas (SLAs, recently relabelled SA2s) of the Australian Standard Geographical Classification (ASGC, recently relabelled ASGS) to achieve a greater level of privacy protection. The rules to achieve a geography amenable to statistical comparisons are:

  • similar populations (around 20,000 to 30,000)
  • homogeneous Index of Relative Socioeconomic Disadvantage
  • nested within the next level up in the ASGC/ASGS
  • so that SSDs (recently relabelled SA3) are not split.
  • the SA2s can be aggregated through a process of assigning alike areas to groups, and then reviewing/adjusting these assignments; a sketch of the assignment step follows this list
  • This document contains some suggestions for adjusting groupings to better reflect differences in the level of disadvantage, while still adhering to the rules outlined.
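
A minimal sketch of that assignment step, assuming a hypothetical data frame with one row per SA2 giving its SA3 parent, population and SEIFA score:

# hypothetical input: one row per SA2
sa2 <- data.frame(
  sa3   = c("A", "A", "A", "B", "B"),
  sa2   = c("a1", "a2", "a3", "b1", "b2"),
  pop   = c(12000, 9000, 15000, 18000, 11000),
  seifa = c(950, 960, 1050, 980, 990)
)
# work within each SA3 (so SA3s are never split), order by SEIFA so
# groups are homogeneous, and accumulate SA2s until a group reaches
# the target population
target <- 20000
sa2 <- sa2[order(sa2$sa3, sa2$seifa), ]
sa2$group <- NA
for (s in unique(sa2$sa3)) {
  g <- 1
  tally <- 0
  for (i in which(sa2$sa3 == s)) {
    sa2$group[i] <- paste(s, g, sep = "-")
    tally <- tally + sa2$pop[i]
    if (tally >= target) { g <- g + 1; tally <- 0 }
  }
}
sa2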

Results

Data Prep

  • The data were prepared as spatial files

[Figure: the prepared spatial data]

Assess proposed split in Belconnen

  • the proposed split in Belconnen looks good
  • This involves splitting Belconnen West (the old SLA group) into two regions. The first generally has a much higher proportion of individuals with a bottom-quintile SES score
  • Group 1 = Belconnen/ Charnwood/ Florey/ Higgins/ Holt/ Latham
  • Group 2 = Flynn (ACT)/ Fraser/ Melba/ Spence

[Figure: the proposed Belconnen split]

Assess the outliers

  • this can be done by identifying the component CCD SEIFA scores within the new groups
  • the SEIFA scores of the CCDs in outlier areas are compared to the others within their new groups; a sketch of this comparison follows the figures

[Figure: CCD SEIFA scores compared within new groups]

  • zoom in on crowded areas

[Figure: zoomed view of the crowded areas]
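
A minimal sketch of that comparison, assuming a hypothetical data frame ccd with one row per CCD giving its SEIFA score and its new group:

# hypothetical CCD-level data
ccd <- data.frame(
  seifa     = c(900, 1100, 1050, 980, 1020, 940),
  new_group = rep(c("group 1", "group 2"), each = 3)
)
# compare each CCD against the spread of its new group
boxplot(seifa ~ new_group, data = ccd)
points(jitter(as.integer(factor(ccd$new_group))), ccd$seifa, pch = 19)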

|    | sa2_name.x        | new_sa2_group                                                           | notes                              |
|----|-------------------|-------------------------------------------------------------------------|------------------------------------|
| 12 | Bruce             | Bruce/ Evatt/ Giralang/ Kaleen/ Lawson/ McKellar                        | higher than neighbours             |
| 14 | Campbell          | Acton/ Braddon/ Campbell/ Civic/ Reid/ Turner                           | ok                                 |
| 20 | Civic             | Acton/ Braddon/ Campbell/ Civic/ Reid/ Turner                           | ok                                 |
| 31 | Fadden            | Fadden/ Gowrie (ACT)/ Macarthur/ Monash                                 | ok                                 |
| 32 | Farrer            | Farrer/ Isaacs/ Mawson/ Pearce/ Torrens                                 | higher than neighbours bar one ccd |
| 37 | Forrest           | Forrest/ Griffith (ACT)/ Kingston - Barton/ Narrabundah/ Red Hill (ACT) | higher than neighbours             |
| 60 | Isaacs            | Farrer/ Isaacs/ Mawson/ Pearce/ Torrens                                 | higher than neighbours bar one ccd |
| 64 | Kingston - Barton | Forrest/ Griffith (ACT)/ Kingston - Barton/ Narrabundah/ Red Hill (ACT) | higher than neighbours             |
| 71 | Macarthur         | Fadden/ Gowrie (ACT)/ Macarthur/ Monash                                 | ok                                 |
| 73 | Macquarie         | Aranda/ Cook/ Hawker/ Macquarie/ Page/ Scullin/ Weetangera              | ok                                 |
| 87 | O'Malley          | Chifley/ Lyons (ACT)/ O'Malley/ Phillip                                 | higher than neighbours             |
| 89 | Page              | Aranda/ Cook/ Hawker/ Macquarie/ Page/ Scullin/ Weetangera              | ok                                 |
| 98 | Scullin           | Aranda/ Cook/ Hawker/ Macquarie/ Page/ Scullin/ Weetangera              | ok                                 |

Investigating areas: e.g. the Kingston - Barton area

  • we can zoom in on some of these areas

[Figure: the Kingston - Barton area]

Conclusions

Basically, the problem comes from public housing policies in Canberra, which distort the effect of the housing market and land values in segregating rich from poor. Essentially, there are highly advantaged suburbs with pockets of disadvantaged public housing.

Other ‘problematic’ features in this regard are:

  • proximity to ornamental lakes
  • proximity to urban green space
  • proximity to rural residential hubs (walaroo road? hall?). This is a bit of a reverse statement, but different to ‘distance from urban centre’
  • elevation, especially with a view.

Potentially the issue is going to be that this can’t be solved if you want to maintain SA2 as the base level - the distinctions are going to be at an SA1 or even mesh block level.

Posted in  spatial