Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed CC-BY. Find out more here.

[Image: ONS-SCD.png]

How to explain my current research interests?

I’m having trouble explaining my current research interests.

I’m currently working on suicide and drought, heart disease and woodsmoke, violent deaths and heatwaves, and a theoretical text on methods for rates, standardisation or adjustment in regression models.

Why? It’s complicated, but…

I have been working on a range of interrelated projects for the last few years that have revolved around the influence of climate on human health and wellbeing.
That might sound clear enough at first glance, but once we got stuck into it we struggled to find many health outcomes in the literature with really potent causal influences from climatic variables.

HUMAN HEALTH AND CLIMATE CHANGE IN OCEANIA: A RISK ASSESSMENT 2002

This all started with my involvement with the report 1 by Tony McMichael and Rosalie Woodruff. I was a research assistant and got my first taste of integrating data across population, health and environmental domains.

We did a great job, but after we’d completed that work, Colin and I reflected on how difficult it was to find those ‘low-hanging fruit’ that might be most easily analysed in this new direction of environmental epidemiology. We then met Neville Nicholls from the BoM and learned that meteorologists strongly suspected an increased risk of suicide during droughts (to the point that they were anxious about reporting unfavourable seasonal rainfall forecasts based on the SOI and El Niño weather patterns). We ended up publishing a simple paper on that topic 2.

Other health outcomes suggested themselves over the course of the following few years:

  • includes Ross River Virus in WA
  • heart disease and woodsmoke
  • violent deaths and heatwaves
  • a theoretical text on methods for rates, standardisation or adjustment in regression models

So I am struggling to get a succinct statement that reflects the current focus of my research interests. Luckily my core research interest is simpler: better understanding the dynamics of the many systems involved in human ecology.

Posted in  research methods


Historical GIS evidence of Dengue in the far southeast of Australia?

Dengue Fever (DF) and climate

DF is a mosquito-borne virus that has high public health impact and is potentially strongly influenced by climate.

There is an interesting story attached to this map from a paper by Russell et al 2009 1 that draws together documentary references to historic Dengue Fever (DF) transmission, some as far south as Gosford and Bourke in NSW.

[Image: Russell’s map of Dengue Virus]

This asserted southerly extent is much further south than that delineated in the Potential Climatic Niche model of Hales et al 2002 2 and is used as a basis to refute the veracity (and utility) of such a model.

Hales’ model determined the potential transmission zone in Australia to be constrained much further north by levels of humidity (vapour pressure). That predictive model was based on a regression of many climatic attributes across all locations of disease transmission in a global database. There is an ongoing debate between the epidemiology and entomology camps over this modelling.

A key issue regarding this contentious map is a reference used by Russell et al 1 to dengue transmission having occurred inland at Bourke (30 S) and on the coast at Gosford (33 S, 80 km north of Sydney) in the first half of the 1900s (ref 13, Lee et al 1987, and ref 14, Lumley and Taylor 1943). However, a text search of Lee et al revealed only one reference to Gosford, and that refers to the presence of the vector mosquito Aedes aegypti (AE), not to dengue transmission.

The second reference is by the entomologist Frank H Taylor, who mentions Brooklyn NSW, on page 158 (paragraph 2), as “the most southerly discovered location of AE” (the vector of DF in this area). Now, Brooklyn is on the railway on the Hawkesbury just south of Gosford, and Taylor talks about railways spreading the vector. Perhaps AE spread through NSW in conjunction with the railway water tanks that supplied steam trains (the railway to Bourke was opened in 1885). There is no clear discussion in Lumley and Taylor of DF transmission around Brooklyn.

It is possible that the references cited for this map are really just evidence of the vector distribution, not actual virus transmission. It is also likely that the southern-most border of the AE vector would not be the southern-most fringe of DF transmission. This therefore casts doubt on Russell’s map which suggests the southern limit of DF transmission to have been as far south as Gosford and Bourke in NSW (Bourke is also asserted by Russell et al as a known transmission site using these same references).

What Russell may be doing is simply connecting the two points, Brooklyn and Bourke (who knows if there is a single data point in between), and asserting that that line of southernmost AE evidence is the southerly boundary of DF transmission. Perhaps if AE got to Brooklyn it might also have got to Bourke, and Bourke being hotter than Brooklyn, DF transmission may have occurred there; but we need more evidence than Russell et al provide.

(Thanks to my epidemiological friends for scouring the historical references with a thoroughly incredulous eye).

Posted in  spatial


Occam's Razor, Einstein's Razor and Chamberlin's Complex Thought

Introduction

I just read TC Chamberlin’s paper on the Method of Multiple Working Hypotheses and was very taken by the concept of Complex Thought:

“The use of the method leads to certain peculiar habits of mind which deserve passing notice, … it develops a habit of thought analogous to the method itself, … a habit of parallel or complex thought. Instead of a simple succession of thoughts in linear order, the procedure is complex, and the mind appears to become possessed of the power of simultaneous vision from different standpoints.”

I was struck by the difference between this method and the KISS or Keep It Sensibly Simple approach I’ve been taught (also sometimes misrepresented as Keep It Simple, Stupid… which, in my view, only applies to stupid theories).

Occam’s Razor is a principle to Keep It Very Simple:

“to select among competing hypotheses that which makes the fewest assumptions and thereby offers the simplest explanation of the effect.”

Einstein’s Razor is a warning against too much simplicity, with its exhortation that we can make it as simple as possible:

“without having to surrender the adequate representation of a single datum of experience”.

Complex Thought

I love the idea that I can train myself to achieve a kind of Science Zen that unveils all kinds of complex multifactorial causal mechanisms… but I fear the Danger of Vacillation Chamberlin speaks about:

“Like a pair of delicately poised scales, every added particle on the one side or the other produces its effect in oscillation. But such a pair of scales may be altogether too sensitive to be of practical value in the rough affairs of life”.

Posted in  disentangle things


The Shane-Weiss-Reich-White.worg approach to Code Management

Introduction

I’ve been thinking a lot about workflows recently. I’m talking about the data, code, decisions etc. bound up in the flow of material going through any project in the collective program of work at the Centre where I work. The group is facing tough questions about how we do things, and why. So in my reflections I’ve reviewed some links I’d saved and present below a unified summary version, called the…

Shane-Weiss-Reich-White.worg approach

This is a synthesis I’ve put together of approaches to managing code in complex data analysis projects. It’s named after key exponents on various blogs, wikis and web Q-and-A sites.

Stackoverflow user Shane posted this excellent comment to stackoverflow:

“start off with one R file as you start a project (or a set of files like in the Bernd Weiss and Josh Reich examples), and progressively add to it (so that it grows in size) as you make discoveries.”

Bernd Weiss’ projects have:

  • analysis,
  • data and
  • document directories and
  • README.org (an Emacs org-mode file).

Bernd and Jeromy Anglim had an interesting discussion about this workflow in this post at stackexchange. Especially note that Bernd recommends that every publication, presentation, semester/class etc. has its own git repository, BUT that “there is one real downside: using the same dataset in different publications means to maintain different versions of ‘initialization code’ (define missing values, generate new variables etc.)”. To overcome this problem, Bernd decided to maintain ONE study/dataset-related repository which contains the original init-file. For each publication, presentation etc. he uses a copy of the original data-file as well as of the init-file (in R via file.copy()). Of course, whenever you create a new variable you’ll need to modify the original init-file and do a file.copy() again (which is the most annoying part of the approach).
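Bernd’s one-study-repo idea can be sketched quickly. The original workflow uses R’s file.copy(); this is just the same idea in Python terms, and every file and directory name here is a hypothetical stand-in:

```python
import shutil
from pathlib import Path

def start_publication(study_repo: str, pub_repo: str) -> None:
    """Copy the study's original data and init files into a new
    publication repository, so later edits stay local to that
    publication (the annoying part is re-copying after changes)."""
    study, pub = Path(study_repo), Path(pub_repo)
    pub.mkdir(parents=True, exist_ok=True)
    for name in ("data.csv", "init.R"):
        shutil.copy(study / name, pub / name)

# Set up a toy study repository, then spin off one publication from it.
study = Path("study-drought")
study.mkdir(exist_ok=True)
(study / "data.csv").write_text("id,score\n1,42\n")
(study / "init.R").write_text("# define missing values, new variables\n")
start_publication("study-drought", "pub-2011-suicide")
```

The point of the copy (rather than a shared file) is that each publication's repository stays self-contained and reproducible on its own.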

Josh Reich breaks projects into 4 pieces:

  • load.R,
  • clean.R,
  • func.R and
  • do.R
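Josh Reich’s four R scripts map onto a simple pipeline shape. A minimal sketch of that same split in Python terms (the data and function names are made up for illustration, not from his example):

```python
# load.R equivalent: pull the raw data in.
def load():
    return [{"temp": 41, "deaths": 7}, {"temp": 23, "deaths": None}]

# clean.R equivalent: tidy the data, here by dropping incomplete records.
def clean(rows):
    return [r for r in rows if r["deaths"] is not None]

# func.R equivalent: reusable analysis functions live here.
def deaths_per_degree(rows):
    return sum(r["deaths"] / r["temp"] for r in rows) / len(rows)

# do.R equivalent: the short driver that strings the stages together.
result = deaths_per_degree(clean(load()))
```

Keeping the driver tiny means a discovery only ever grows one of the three upstream stages, which is what makes the layout scale as the project does.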

John Myles White leads the ProjectTemplate package, which has ‘create.project(minimal = TRUE)’ to create the layout:

  • cache,
  • config,
  • data,
  • munge,
  • src, and
  • README

I’ve just added reports. If a project is a little bigger than minimal I’ll add admin, metadata, versions, etc. I contributed that idea to the ProjectTemplate discussion list… but those guys seem to mostly use the default minimal = FALSE, which creates all the possible directories, including reports. I’ll try to keep it simple and just bolt on whatever bits suit my needs as I go.
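For the record, the minimal layout plus my extra reports directory can be scaffolded by hand. This is only a sketch of the directory skeleton in Python; the real create.project(minimal = TRUE) is an R function that does more than make folders:

```python
from pathlib import Path

def create_minimal_project(root: str) -> None:
    """Scaffold the minimal ProjectTemplate-style layout, plus the
    extra 'reports' directory I bolt on."""
    base = Path(root)
    for d in ("cache", "config", "data", "munge", "src", "reports"):
        (base / d).mkdir(parents=True, exist_ok=True)
    (base / "README").touch()

create_minimal_project("my-analysis")
```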

Which Code Editor is the Best?

And finally, the meta-work holding the project together is the code editor. Despite the old joke that describes Emacs as “a great operating system, lacking only a decent editor”, this editor has killer functions for managing code. Check out worg, the Emacs Org-Mode Community. Recently proponents of worg wrote this article. Previously I’ve REALLY enjoyed NPPtoR (only available under windoof).

In the words of JD Long in response to Shane “The choice of the specific tool is more idiosyncratic and not near as important as using SOMETHING.”

Posted in  disentangle things


Outlines. A way to organise, and to think

Introduction

In a previous post I talked about the conceptual framework I draw on for organising material about complex systems. I quoted a passage by Koestler 1967 that talks about the writing process describing the use of ‘outlines’, albeit without using that term. I find this quote particularly inspires me to think about how to select the material for inclusion (and identify the material ruled out of scope - those times we decide to “chop off entire flowering branches from the tree and start growing them afresh”).

A way to organise

Outlines are a method for writing, and also the basis of a toolbox called ‘outliners’. I’ve been using the excellent free outliner Keynote-NF for years and it has been great. Recently I was inspired by this paper by Schulte et al, which very convincingly introduced me to Emacs Orgmode as a viable tool for someone without a computer science background (others had described Emacs as ‘not an editor so much as a lifestyle choice’ or ‘a good operating system just lacking a decent editor’).

A way to think

I recently had the privilege of attending a class with renowned epidemiologist Professor Nancy Krieger, who gave us a valuable insight into her workflow. She told us that for years she’d started every research project by writing out the outline and then filling out the branches, much as Koestler’s quote described. Prof Krieger then said that she doesn’t explicitly write outlines anymore; she’d been doing it for so long that it has become a deeply ingrained and natural part of her way of thinking.

Prof Krieger went on to describe how she conducts her research workflow through the outline:

  • first sketch the outline suggested by the hypotheses to be tested (data sourced, methods applied)
  • review the literature; there are very few areas of study that haven’t been looked at by someone else
  • create empty ‘table shells’ showing the form in which the information will be presented in the final paper
  • exploratory data analysis and graphics
  • primary data analysis and interpretation of the results
  • another literature review - identify the dominant story people are telling about this question
  • write the discussion, dealing with any caveats head-on: how bias may inflate or deflate the estimates

Writing workflow depends on forum

The flow of steps described above reminds me of Andrew Gelman’s point about how he writes science papers like this, but doesn’t write blogs this way… he mentions thoughts on style, audience, etc. There may be a dichotomy worth exploring between clear structure and evolving narrative.

Key features of outliner software

To conclude I’ll just mention the most interesting features I’ve found in outliner software and will discuss these in a future post:

  • Nodes: these are the building blocks
  • Parents and children: there is a strict ordering of the nodes into a hierarchy
  • Semilattice: this allows connections to be made across branches, not just through the hierarchy
  • Folding can be used to hide entire sections of the tree, unfolding to ‘drill down’ to deep levels
  • Hoisting: temporarily narrowing the view to a single branch and its children
  • Lifting and grafting: easy reshuffling of the order
  • Checked nodes: chosen nodes can be exported while the rest remain invisible
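Several of these features fall out of one small data structure. A toy sketch in Python of nodes in a hierarchy, folding to hide a subtree, and exporting only checked nodes; the class and its behaviour are my own illustration, not any real outliner’s internals:

```python
class Node:
    """One outline node: a title, child nodes, and fold/check flags."""
    def __init__(self, title, checked=False):
        self.title = title
        self.checked = checked
        self.folded = False
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def visible(self, depth=0):
        """Render the tree as indented lines, hiding the children
        of any folded node."""
        lines = ["  " * depth + self.title]
        if not self.folded:
            for c in self.children:
                lines.extend(c.visible(depth + 1))
        return lines

    def export_checked(self):
        """Flatten out only the checked nodes, in document order,
        leaving the rest invisible."""
        out = [self.title] if self.checked else []
        for c in self.children:
            out.extend(c.export_checked())
        return out

root = Node("Paper outline", checked=True)
intro = root.add(Node("Introduction", checked=True))
intro.add(Node("Background reading"))
methods = root.add(Node("Methods"))
methods.folded = True                     # fold: hide this subtree
methods.add(Node("Table shells", checked=True))
```

Note that the folded Methods subtree disappears from the rendered view, yet its checked child still comes out in the export; folding is a display concern, checking an export concern.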

Posted in  disentangle things