Disentangle Things by Ivan Hanigan

Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY.

Setting Up A Workflow Script With Code Chunks

This post describes some ideas and techniques I use to set up a “workflow script”. I use this term for the structured combination of code, data and narrative that makes up an executable Reproducible Research Report (RRR).

A lot of these ideas are inspired by a great paper by Kieran Healy called “Choosing Your Workflow Applications”, available at https://github.com/kjhealy/workflow-paper to accompany his Emacs Starter Kit. My shortened version of his main points is:

1 Use a good code editor
2 Analyse data with scripts
3 Store your work simply and document it properly
4 Use a version control system
5 Automate backups
6 Avoid distracting gadgets

Here’s my current approach in each of these categories:

1 Use Emacs with Orgmode (and kjhealy’s drop-in set of useful defaults)
2 Scripts that use the literate programming technique of mixing Code Chunks with descriptive prose
3 John Myles White’s ProjectTemplate R package and Josh Reich’s LCFD paradigm
4 git and GitHub for version control

5 (automated backups) and 6 (avoiding gadgets) are still things I find challenging.

1 Use a good code editor

I like using Emacs with Orgmode.

I have previously tried a variety of code editors, including Tinn-R, NppToR, RStudio and Eclipse.
Emacs with Orgmode suits me the most because it has a great many features, especially its LaTeX and HTML export.
A key reference on why Emacs is so good for scientific work is Eric Schulte et al.’s “A Multi-Language Computing Environment for Literate Programming and Reproducible Research”.
Here is a link to a great Orgmode description
(this guy spends a lot of time tweaking his setup).

2 Analyse data with Scripts (stitch together code chunks)

I use scripts, but I prefer to think of them as Code Chunks stitched together with prose into Compendia.

Compendia are documents that weave together Code and Prose into an executable report
The underlying philosophy is called Reproducible Research, and the resulting documents are Reproducible Research Reports
A very useful tool is a keyboard shortcut that quickly creates a code chunk,
so you can be writing parts of the report like this: “Blah Blah Blah as shown in Figure X and Table Y”
then just hit the correct keys and WHAMM-O, there is a new chunk ready for you to write the code that creates Figure X and Table Y.
Here is how I use Emacs to achieve this (the other editors I mentioned above can do this too). The IPython Notebook does this stuff as well but calls chunks “cells” for some reason.

Emacs Code: Put this into the ~/.emacs.d/init.el file

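;; Insert an Orgmode R code chunk. Prompts for a title, which is reused in
;; the heading, the #+name, the tangle target and the comment line.
(define-skeleton chunk-skeleton
  "Info for a code chunk."
  "Title: "
  "*** " str "-code\n"
  "#+name: " str "\n"
  ;; :exports accepts code, results, both or none; :eval no stops the
  ;; chunk from running until you switch it to yes.
  "#+begin_src R :session *R* :tangle src/" str ".r :exports both :eval no\n"
  "#### name:" str " ####\n"
  "\n"
  "#+end_src\n")
;; Bind the skeleton to C-x C-\ in every buffer.
(global-set-key [?\C-x ?\C-\\] 'chunk-skeleton)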
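If you work in an editor without skeletons, a small shell function can print the same kind of chunk for pasting into, or appending to, your .org file. This is a hypothetical helper (`new_chunk` is an illustrative name, not part of Orgmode or Emacs); it writes the header with `:exports both`, one of the values Org accepts for that argument.

```shell
#!/bin/sh
# new_chunk TITLE: hypothetical helper that prints an Orgmode R code chunk
# laid out like the Emacs skeleton, ready to paste into any editor.
new_chunk() {
  title=$1
  printf '*** %s-code\n' "$title"
  printf '#+name: %s\n' "$title"
  printf '#+begin_src R :session *R* :tangle src/%s.r :exports both :eval no\n' "$title"
  printf '#### name:%s ####\n' "$title"
  printf '\n'                 # empty line where the code goes
  printf '#+end_src\n'
}
```

For example, `new_chunk load-data >> report.org` appends an empty chunk named load-data that tangles to src/load-data.r.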

Using the Emacs Shortcut

Now, whenever you type Control-x Control-\ (C-x C-\), Emacs prompts for a title and a new code chunk appears.
You’ll be typing “blah blah blah”, think “I need a figure or table here”, and just hit it.
Move into the empty section and add some code.
You can hit C-c ' to edit the chunk in a dedicated R buffer, from which (with ESS) you can send the code line by line to the R session.
Alternatively, within the main Org buffer, if the chunk’s :eval flag is set to yes, C-c C-c runs the entire chunk and returns its output (including tables) to the document.
To export the code chunks as modular scripts without the narrative prose, use C-c C-v t.
This is called “tangling”: each chunk is written out to the file named by the :tangle flag in its header.
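To make the idea of tangling concrete, here is a toy version in shell and awk. It is a simplified stand-in, not how org-babel works internally: the hypothetical `toy_tangle` copies the body of every `#+begin_src` block that has an explicit `:tangle PATH` into PATH, creating directories as needed, and ignores everything else (noweb references, `:tangle yes`, other header arguments). In Emacs you would simply use C-c C-v t.

```shell
#!/bin/sh
# toy_tangle FILE: hypothetical, much-simplified stand-in for org-babel
# tangling (C-c C-v t). Copies the body of every source block that has an
# explicit ":tangle PATH" header argument into PATH, in document order.
toy_tangle() {
  awk '
    # A chunk starts: pick up the path after ":tangle", if any.
    tolower($0) ~ /^[ \t]*#\+begin_src/ {
      out = ""
      for (i = 1; i < NF; i++)
        if ($i == ":tangle") out = $(i + 1)
      # "no" disables tangling; "yes" (tangle next to the .org file) is not handled here.
      if (out == "no" || out == "yes") out = ""
      # Create the parent directory once per target file.
      if (out != "" && !(out in made)) {
        made[out] = 1
        dir = out
        if (sub(/\/[^\/]*$/, "", dir)) system("mkdir -p \"" dir "\"")
      }
      inside = 1
      next
    }
    # A chunk ends.
    tolower($0) ~ /^[ \t]*#\+end_src/ { inside = 0; out = ""; next }
    # Body lines of a tangled chunk go to its target file.
    inside && out != "" { print > out }
  ' "$1"
}
```

Running `toy_tangle report.org` leaves the prose behind and writes only the code to the src/ files named in the headers; chunks sharing a target are concatenated in the order they appear.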

Compiling the resulting Compendium

Emacs uses LaTeX or HTML to produce the Report
I find both of these outputs very pleasing
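To compile via LaTeX (to PDF) use C-c C-e d (in Org 8 and later the export dispatcher uses C-c C-e l p).
For HTML use C-c C-e h (C-c C-e h h in Org 8 and later); for code highlighting, install htmlize.el.
These commands also evaluate every chunk whose :eval flag is set to yes, so the data are loaded and the results calculated fresh.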
AWESOME!

3 Store your work simply and document it properly

I use the ProjectTemplate R package to organise my code sections into modules.
These modules follow the Reichian LCFD (Load, Clean, Func, Do) paradigm, first described by Josh Reich on StackOverflow and encoded in the makeProject R package.
Documentation lives within the main Orgmode script.
data documentation is a whole other universe that I will deal with in a separate post
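As a rough illustration of the LCFD layout, the sketch below creates one R script per stage, each opening with a comment on its role, following Josh Reich's description (load the data, clean it, define functions, then do the analysis). `lcfd_skeleton` is a hypothetical helper; it is not what ProjectTemplate or makeProject generate, and having do.R source func.R is an assumption of this sketch.

```shell
#!/bin/sh
# lcfd_skeleton DIR: hypothetical helper that creates one R script per LCFD
# stage (load, clean, func, do), each starting with a comment on its role.
lcfd_skeleton() {
  dir=$1
  mkdir -p "$dir" || return 1
  printf '# load.R: read in the raw data (files, URLs, databases)\n' > "$dir/load.R"
  printf '# clean.R: missing values, merges, outliers and other tidying\n' > "$dir/clean.R"
  printf '# func.R: function definitions only; sourcing it has no side effects\n' > "$dir/func.R"
  # Assumption of this sketch: do.R loads the functions, then runs the analysis.
  printf '# do.R: run the analysis and produce figures and tables\nsource("func.R")\n' > "$dir/do.R"
}
```

Keeping func.R free of side effects means it can be re-sourced cheaply while you iterate on do.R, without reloading or recleaning the data.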

4 Use a version control system (git and GitHub)

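# create the project skeleton with ProjectTemplate, running R from the shell
# (newer ProjectTemplate releases may expect template = "minimal" instead of minimal = TRUE)
Rscript -e 'library(ProjectTemplate); create.project("AwesomeProject", minimal = TRUE)'
# use the shell to start a git repo inside the new project
cd AwesomeProject
git init
# and commit the TODO
git add TODO
git commit -m "first commit"
# tada!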

Emacs can now be used to manage the git repo with Magit, via the C-x g command (magit-status).
RStudio also has a really nice GUI for doing this inside its project management interface.

Using GitHub or another Git Server

You can easily set up a GitHub repo for this now, but it will be public.
The alternative is to set up your own private git server. I followed the instructions in “Running a Simple Git Server Using SSH”.
Either way, once you have set up your remote git repo, you need to add it as a remote and set up remote tracking:

Git Code:

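cd /path/to/local/git/repo
# register the remote repository under the name "origin"
git remote add origin git@github-or-other-server:myname/myproject.git
# push master and make it track origin/master (-u), so later pushes and pulls need no arguments
git push -u origin master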
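Before pointing origin at a real server, you can rehearse the whole remote set-up on one machine. In the sketch below a bare repository in a temporary directory stands in for the server (on an SSH git server you would create that bare repo on the server and use the git@host:path URL instead); the commit identity and TODO contents are placeholders. The `-u` flag on the push is what sets up remote tracking.

```shell
#!/bin/sh
# Rehearse the remote set-up on one machine: a bare repository in a temporary
# directory plays the part of the server.
set -e
work=$(mktemp -d)
cd "$work"

# On a real server you would create this bare repo (no working tree) over SSH.
mkdir -p server
git init --bare --quiet server/myproject.git

# A local project with one commit, as in the previous section.
git init --quiet AwesomeProject
cd AwesomeProject
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
echo "write the report" > TODO
git add TODO
git -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "first commit"

# Add the remote and push; -u records origin/master as the upstream of master.
git remote add origin "$work/server/myproject.git"
git push --quiet -u origin master

# Show the tracking branch; prints origin/master
git rev-parse --abbrev-ref 'master@{upstream}'
```

Once this works locally, only the URL in `git remote add` changes for the real server.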

5 Automate backups and 6 Avoid distracting gadgets

OMG backups stress me out
Ideally I would follow this advice, because “when it comes to losing your data the universe tends toward maximum irony. Don’t push it.”
But I don’t fully comply.
Instead, I generally use Dropbox for basic project management admin stuff.
I use GitHub for code projects I am happy to share, and I also pay for 10 private repos.
I set up a git server at my workplace for extra projects, but it is on a test server that is not backed up, and I am not really happy about this.
As for distracting gadgets, the current pace of innovation in software tools for this type of work means I should keep trying new things, but I have pretty much settled into a comfortable zone with the gadgets I described here.
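One low-effort option for repositories on an unbacked-up server would be to snapshot each one into a single file and copy that file somewhere that is backed up. The sketch below uses `git bundle`, offered as a suggestion rather than part of the set-up described above; `backup_repo` is a hypothetical helper name and the destination directory is an assumption.

```shell
#!/bin/sh
# backup_repo REPO DEST: hypothetical helper that snapshots every branch and
# tag of a git repository into one timestamped bundle file under DEST.
backup_repo() {
  repo=$(cd "$1" && pwd) || return 1
  mkdir -p "$2" || return 1
  # Absolute path, because git -C below runs inside the repository.
  dest=$(cd "$2" && pwd) || return 1
  bundle="$dest/$(basename "$repo")-$(date +%Y%m%d-%H%M%S).bundle"
  git -C "$repo" bundle create "$bundle" --all >/dev/null || return 1
  # Only report success if git can read the bundle back.
  git -C "$repo" bundle verify "$bundle" >/dev/null 2>&1 || return 1
  printf '%s\n' "$bundle"
}
```

A call such as `backup_repo ~/projects/AwesomeProject ~/Dropbox/backups` (hypothetical paths) could be scheduled nightly from cron; a bundle can later be cloned like a normal repository to restore it.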

Conclusions

This is how I’ve worked for a couple of years
I find it very enjoyable and mostly productive, but prone to “distraction by gadgets”.
The main thing I want to point out is the usage of Code Chunks in RRR scripts.
These things are awesome.

Posted in research methods