Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.


How To Effectively Implement Electronic Lab Notebooks In Epidemiology

  • It is often stated in the literature that an electronic lab notebook is a core component of reproducible research
  • For example, Buck, S. (2015). Solving reproducibility. Science, 348(6242), 1403–1403. http://dx.doi.org/10.1126/science.aac8041 says:
    • ‘one of the most effective ways to promote high-quality science is to create free open-source tools that give scientists easier and cheaper ways to incorporate transparency into their daily workflow: from open lab notebooks, to software that tracks every version of a data set, to dynamic document generation.’

  • But I have struggled to operationalise the lab notebook in the epidemiology projects I work in
  • Here are some notes based on my recent readings and attempts with a new team

Modularised lab notebooks

A lab notebook seems to have a small number of components, which can be defined as:

  • Data management plan
  • Workplan
  • Worklog
  • Workflow
  • Distribution

One thing I think is important is to have levels of organisation in a hierarchy:

  • Macro level: The ‘Research Programme’ level is about the entire breadth of the projects in the group.
    • Data Management Plan: including managing the computers and a Data Inventory
    • Personal Workplan and Worklog: this is an overview of things I do, plan to do or learn along the way (this is for the high level things like planning professional development, or a holiday)
    • This is operationalised in the blog you are reading right now.
  • Meso level: The ‘Research Project’ level is about a single study, or a small group of studies based around a core dataset or concept.
    • This is the level that you might write up a manuscript for a journal, or report to a client
    • Project workplan: at this level there may be high level information about the study design, hypotheses, resources and admin for managing relationships with a variety of collaborators
    • Worklog: WS Noble http://dx.doi.org/10.1371/journal.pcbi.1000424 recommends that this be the main lab notebook for the analysts
    • He says ‘This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments that you performed. In addition to describing precisely what you did, the notebook should record your observations, conclusions, and ideas for future work’.
  • Micro level: the ‘Experiment Results’ level is about work you might do on a single day, or over a week
    • Workflow scripts: At this level each ‘experiment’ is written up in chronological order, as entries to the Worklog at the meso level
    • Noble recommends ‘create either a README file, in which I store every command line that I used while performing the experiment, or a driver script (I usually call this runall) that carries out the entire experiment automatically’…
    • and ‘you should end up with a file that is parallel to the lab notebook entry. The lab notebook contains a prose description of the experiment, whereas the driver script contains all the gory details.’
    • This is the level at which I usually think about managing the distribution side of things. I will want to pack up the results and email them to my collaborators, or decide on the one set of tables and figures to write into the manuscript for submission to a journal. If this is accepted for publication, this is the one combined package of ‘analytical data and code’ that I would consider putting up online (on GitHub) as supporting information for the paper.
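
To make the micro level concrete, here is a minimal sketch of what a driver script in the spirit of Noble's runall might look like in Python. It is only my illustration under some assumptions, not Noble's own script: the function name run_experiment, the README.txt file name and the worklog layout are all hypothetical choices. It runs each command inside the experiment directory, saves the exact command lines, and appends a dated entry to the meso-level worklog.

```python
import datetime
import shlex
import subprocess
from pathlib import Path


def run_experiment(commands, experiment_dir, worklog_path, summary):
    """Run commands in order, record them, and append a dated worklog entry.

    All names here (README.txt, the entry layout) are illustrative assumptions.
    """
    experiment_dir = Path(experiment_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    commands_file = experiment_dir / "README.txt"

    with commands_file.open("a", encoding="utf-8") as fh:
        for cmd in commands:
            # Write the command line before running it, so a failing
            # step is still recorded.
            fh.write(shlex.join(cmd) + "\n")
            fh.flush()
            # check=True stops the run at the first failing step.
            subprocess.run(cmd, cwd=experiment_dir, check=True)

    # Append a dated prose entry that points at the recorded commands.
    today = datetime.date.today().isoformat()
    entry = f"\n{today}\n{summary}\nCommands recorded in: {commands_file}\n"
    with Path(worklog_path).open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return commands_file


# Example call (the paths and the command are placeholders):
# run_experiment(
#     [["Rscript", "fit_model.R"]],
#     "results/experiment-1",
#     "results/worklog.txt",
#     "Fitted the first model; see tables in the experiment folder.",
# )
```

The worklog entry holds the prose and the README.txt holds the gory details, so the two files stay parallel in the way Noble describes.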

Posted in  disentangle Workflow tools

