Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.


Outlines. A way to organise, and to think

Introduction

In a previous post I talked about the conceptual framework I draw on for organising material about complex systems. I quoted a passage by Koestler 1967 that talks about the writing process describing the use of ‘outlines’, albeit without using that term. I find this quote particularly inspires me to think about how to select the material for inclusion (and identify the material ruled out of scope - those times we decide to “chop off entire flowering branches from the tree and start growing them afresh”).

A way to organise

Outlines are a method for writing, and also the basis of a class of tools called ‘outliners’. I’ve been using the excellent free outliner Keynote-NF for years and it has been great. Recently I was inspired by this paper by Schulte et al., which very convincingly introduced me to Emacs Orgmode as a viable tool for someone without a computer science background (others had described Emacs as ‘not an editor so much as a lifestyle choice’ or ‘a good operating system just lacking a decent editor’).

A way to think

I recently had the privilege of attending a class with renowned epidemiologist Professor Nancy Krieger, who gave us a valuable insight into her workflow. She told us that for years she’d started every research project by writing out the outline and then filling out the branches, much as Koestler’s quote described. Prof Krieger then said that she didn’t explicitly write outlines anymore; she’d been doing it for so long that it was a deeply ingrained and natural part of her way of thinking.

Prof Krieger went on to describe how she conducts her research workflow through the outline:

  • first sketch the outline suggested by the hypotheses to be tested (data sourced, methods applied)
  • review the literature: there are very few areas of study that haven’t been looked at by someone else
  • create empty ‘table shells’ showing the form that the information will be presented in the final paper
  • exploratory data analysis and graphics
  • primary data analysis and interpretation of the results
  • another literature review - identify the dominant story people are telling about this question
  • write the discussion. Deal with any caveats head-on, such as how bias may inflate or deflate estimates
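
The ‘table shells’ step can be sketched in a few lines of Python. This is only my illustration of the idea, not Prof Krieger’s method: the function names `table_shell` and `render_shell`, the row and column labels, and the `__` placeholder are all made up for the example.

```python
from typing import Dict, List


def table_shell(rows: List[str], columns: List[str]) -> Dict[str, Dict[str, str]]:
    """Build an empty table: every cell exists but holds an empty string."""
    return {row: {col: "" for col in columns} for row in rows}


def render_shell(shell: Dict[str, Dict[str, str]], blank: str = "__") -> str:
    """Lay the shell out as text, showing `blank` wherever a cell is unfilled."""
    rows = list(shell)
    columns = list(next(iter(shell.values()))) if shell else []
    width = max(len(label) for label in rows + columns + [blank])
    header = " | ".join(label.ljust(width) for label in [""] + columns)
    body = [
        " | ".join(
            cell.ljust(width)
            for cell in [row] + [shell[row][col] or blank for col in columns]
        )
        for row in rows
    ]
    return "\n".join([header] + body)


# Hypothetical labels: decide the shape of the final table before any analysis.
shell = table_shell(
    rows=["Low exposure", "High exposure"],
    columns=["n", "Rate ratio", "95% CI"],
)
print(render_shell(shell))
```

The point of the shell is that the layout of the results is fixed before the data are analysed; cells are filled in later, for example with `shell["Low exposure"]["n"] = "120"`.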

Writing workflow depends on forum

The flow of steps described above reminds me of Andrew Gelman’s point that he writes science papers like this, but doesn’t write blog posts this way… he mentions thoughts on style, audience etc. There may be a dichotomy worth exploring between clear structure and evolving narrative.

Key features of outliner software

To conclude I’ll just mention the most interesting features I’ve found in outliner software and will discuss these in a future post:

  • Nodes: these are the building blocks
  • Parents and children: there is a strict ordering of the nodes into a hierarchy
  • Semilattice: this allows connections to be made across branches, not just through the hierarchy
  • Folding can be used to hide entire sections of the tree, unfolding to ‘drill down’ to deep levels
  • Hoisting: to promote and demote branches, including all child branches
  • Lifting and grafting: easy reshuffling of the order
  • Checked nodes: chosen nodes can be exported while the rest remain invisible
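
As a rough illustration of some of the features listed above, here is a minimal Python sketch of an outline tree. It is not how Keynote-NF or Orgmode work internally; the `Node` class and its method names are assumptions of mine, and cross-branch (semilattice) links and hoisting are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by their contents
class Node:
    """One heading in an outline (illustrative names throughout)."""
    title: str
    children: List["Node"] = field(default_factory=list)
    folded: bool = False    # a folded node hides its subtree when rendered
    checked: bool = False   # only checked nodes are included in an export
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child beneath this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def graft(self, new_parent: "Node") -> None:
        """Lift this branch, with all its children, and re-attach it elsewhere."""
        if self.parent is not None:
            self.parent.children.remove(self)
        new_parent.add(self)

    def render(self, depth: int = 0) -> List[str]:
        """Return indented titles, skipping anything under a folded node."""
        marker = "+ " if self.folded and self.children else "- "
        lines = ["  " * depth + marker + self.title]
        if not self.folded:
            for child in self.children:
                lines.extend(child.render(depth + 1))
        return lines

    def export_checked(self) -> List[str]:
        """Collect the titles of checked nodes in document order."""
        found = [self.title] if self.checked else []
        for child in self.children:
            found.extend(child.export_checked())
        return found


# Hypothetical outline for a paper.
root = Node("Paper")
methods = root.add(Node("Methods", checked=True))
methods.add(Node("Data sources"))
results = root.add(Node("Results"))
tables = methods.add(Node("Table shells", checked=True))

tables.graft(results)    # reshuffle: move the branch under Results
methods.folded = True    # hide everything beneath Methods

print("\n".join(root.render()))
print(root.export_checked())
```

Folding only changes what is displayed, so a checked node hidden under a folded parent would still be exported in this sketch; a real outliner may behave differently.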

Posted in  disentangle things


Dr Tom Ford disentangles climate change writing

Climate change and contemporary fiction

My friend Dr Tom Ford blogs about how climate and climate change are entangled in contemporary fiction. For a social scientist he does a really good job of disentangling esoteric and specialist environmental science knowledge from literary waffle and voodoo (sorry, that’s an in-joke… I consider myself a waffly exponent of enviro voodoo too - but more than happy to cast aspersions and sling defamatory insults around).

Tom’s bag, as far as I can tell, is to reflect on the creation and development of literary constructs used by writers to talk about climate and climate change. Is this a whole new category of postmodern and existentialist literature?

Posted in  disentangle things


The Organisation of Material

During the course of my research I have repeatedly found the organisation of material to be a challenge (I’m talking about the code, data, text and everything else related to analyses). One of the things I often struggle with is just keeping my thoughts clear and consistent between projects, through weeks and across years. I’ll try to blog about how I have decided to manage my work, and the tools I have tried and ended up using.

To start with, I’d like to share a passage taken from pages 55-57 of Arthur Koestler’s “The Ghost in the Machine”, 1967, London, Pan Books, with some rephrasing of my own.

“The vexed problem of the ‘organisation of material’; vexed because the different aspects of the problem, the welter of evidence and the welter of interpretations, are all interconnected like threads in a Persian carpet. The author is keenly aware of the pattern they form; but how can he convey that pattern if he has to unpick the threads in order to explain them one at a time? Here the problem of temporal order begins to intrude, although his mind may still be functioning in the partly or wholly non-verbal regions of images and intimations.

At last he arrives at a tentative arrangement of his material, under a series of headings and sub-headings, which he shuffles about as if they were compact building blocks. They are probably each represented by a mere jotted key-word….

…now the time has come for these intentional seeds to start growing into saplings which will branch out into sections, sub-sections, and so on: the selection of evidence to be quoted, of illustrations, comment and anecdotes, each of them necessitating further strategic choices. At each node - branching point - of the growing tree, more details are filled in, until at last the syntactic level is reached, the phrase generating machine takes over, the individual words are lined up - some effortlessly, some after a painful search, and are finally transformed into patterns of contractions of finger muscles guiding a pen: the logos has become incarnate.

But of course the process is never quite as neat and orderly as that; trees do not grow in this rigidly symmetrical way. In our schematised account, the selection of the actual words occurs only at an advanced stage of the process, after the general plan and the ordering of the material have been decided on, and the buds of the tree are ready to burst open in their proper left-to-right order. In reality, however, one branch somewhere in the middle might blossom into words, while others have as yet hardly started to grow. And while it is true that the idea precedes the actual process of verbalisation, it is also true that ideas are often airy nothings until they crystallise into verbal concepts and acquire tangible shape….

Thus our tree progresses with irregular growth and constant oscillations between levels. Transforming thought into language is not a one-way process; the sap flows in both directions, up and down the branches of the tree. The operation is further complicated and sometimes brought to the verge of a breakdown by the author’s deplorable tendency to correct, erase, chop off entire flowering branches from the tree and start growing them afresh”.

Posted in  overview


My Interests

My research interests revolve around making my data analyses easier and more reproducible. I’d like it if every step in the methodical exploration of (or prediction from) data is easily documented, reproduced, transformed, integrated and fun … (if such a thing is possible).

I conceptualise data analyses as grouped networks of the many choices and revisions an analyst makes through a complex workflow in a project, clusters of which make up the many projects, databases and codebases analysts use every day.

In my work as a data manager at a research school, I spend a lot of time linking together datasets to analyse population, health and environmental dimensions. There are complex relationships to be found, but because so many steps are required for such an analysis workflow, there are strong barriers to reproducibility. A key barrier is the difficulty of tracking and documenting the numerous generations of derivative datasets and analyses. I have developed an interest in software applications used during workflow actions and decision making to document a reproducible module of work within desperately entwined and heroically integrated analyses.
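
To make the idea of tracking generations of derivative datasets concrete, here is a small Python sketch. It is a toy of my own, not one of the workflow tools discussed on this blog: the `Lineage` class, its methods and the dataset names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Dataset:
    name: str
    made_by: str = "raw"     # the workflow step that produced this dataset
    reason: str = ""         # why the step was taken or changed
    parents: List[str] = field(default_factory=list)


class Lineage:
    """A log of datasets and the datasets each one was derived from."""

    def __init__(self) -> None:
        self.datasets: Dict[str, Dataset] = {}

    def record(self, name: str, made_by: str = "raw",
               parents: Iterable[str] = (), reason: str = "") -> None:
        """Register a dataset; every parent must already be in the log."""
        parents = list(parents)
        missing = [p for p in parents if p not in self.datasets]
        if missing:
            raise KeyError(f"unknown parent dataset(s): {missing}")
        self.datasets[name] = Dataset(name, made_by, reason, parents)

    def ancestors(self, name: str) -> List[str]:
        """Everything `name` was derived from, nearest generation first."""
        seen: List[str] = []
        queue = list(self.datasets[name].parents)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.datasets[current].parents)
        return seen


# Hypothetical example: two raw sources, one derived set, one linked set.
log = Lineage()
log.record("weather_raw")
log.record("health_raw")
log.record("weather_daily", made_by="aggregate to daily means",
           parents=["weather_raw"])
log.record("linked", made_by="join on date and region",
           parents=["weather_daily", "health_raw"],
           reason="analysis plan revised to use daily exposure")

print(log.ancestors("linked"))
```

Even a log this simple answers the question I most often struggle with months later: which upstream datasets and decisions does this derived file depend on?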

My Topics

I am interested in systems analysis, especially of environmental health systems and

  • interventions
  • catastrophes
  • explanations
  • predictions

I also focus on

  • Suicide and Drought in Southern Australia

and Atmospherics, including:

  • pollution events from bushfires, dust storms, aeroallergen peaks
  • average and extreme exposures to temperature, humidity, rainfall… and combinations of these

I also dabble in other environmental health issues such as mosquito diseases and drinking water pathogens.

My Tools

I try to do all my data integrations and analysis in R/Sweave. I am currently planning to investigate the following tools:

  • R/Sweave/make
  • packages
  • python
  • version control (git)
  • graphviz
  • ProjectTemplate
  • workflow apps: Kepler/Taverna/R AnalyticFlow/Rgraphviz
  • disentangleThings

My Intentions

In this blog I will write about my experiences as I delve into these new areas and write a PhD about them (and my research topics, knotted problems to be disentangled as we go). I am not a very adept programmer, but hopefully these tools will help me to deal with the integrated analyses I hope to do.

Posted in  overview


About My Research

I am enticed by the work out there at the moment that challenges us to use the tools of open science (open software and open access publication) to make science more reproducible, transparent and awesome. Science, 2 December 2011, Volume 334 (URL here), has a special section on replication and reproducibility. Of particular note is Roger Peng’s perspective on the limitations in our ability to evaluate published findings: reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

At the moment I am keenly aware of barriers to replicability due to the voluminous generation of data and the multitude of analysis workflow decisions that can affect the results of the kind of integrated analyses I am involved with … choices include those relating to: which health outcomes are selected? which exposure estimates? of which environmental dataset?

How population, health and environmental data are linked together is relevant to our ability to disentangle the complex relationships to be found. I’ve had a difficult time due to multiple revisions of datasets and analysis plans that come from working with multidisciplinary teams of epidemiologists, environmental scientists and biostatisticians. I studied geography and ecology in my undergraduate degree, so I am able to link multiple layers of data together in a Geographical Information System (GIS), but what I think I need is an Integrative Information System (IIS has a nice ring to it).

I try to do all my weather/air pollution/health/demography data integrations and analysis using the Reproducible Research Reporting paradigm implemented in R/Sweave. I do this so I can maintain tight control over modelling assumptions or data decisions at any point in the workflow, including multiple versions over the course of an evolving analysis plan. This allows me to ‘drill down’ into parts of the data preparation and analysis many months after the bulk of the work has been done, change key portions in response to the changed requirements of the project, and document the reason for the changes (in case of the inevitable change in requirements, see this xkcd comic strip here).

So in summary, I am interested in tools that enable analysts to deal with the issues of integrated analysis networks, and to track the many choices an analyst makes through the complex web of an analysis workflow between data, analysis, reporting and archiving activities.

Posted in  overview