Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.


Reproducible Research and Managing Digital Assets, Part 2 of 3: makeProject Is Simple

This post is about a simple and effective data management framework for analysis projects: Josh Reich's LCFD (load, clean, func, do) framework. It was originally described in an answer on Stack Overflow (http://stackoverflow.com/a/1434424) and has been encoded in the makeProject R package (http://cran.r-project.org/web/packages/makeProject/makeProject.pdf).

Literature Review Approach

This series of three posts summarises some of the most useful advice I have found, based on my experience of implementing it in my own work.

This is the second post in a series of three about some evidence-based best practice approaches I have reviewed. I have read many website articles and blog posts on a variety of approaches to organising digital assets in a reproducible research pipeline. The material I have gathered, through an ongoing search and opportunistic reading, has been recommended by practitioners, which provides some weight of evidence. In addition, I have implemented some aspects of these techniques, and the reproducibility of my own work has improved greatly.

Digital Assets Management for Reproducible Research

The digital assets in a reproducible research pipeline include:

  1. Publication material (documents, figures, tables, literature)
  2. Data (raw measurements, data provided, data derived)
  3. Code (pre-processing, analysis and presentation)

How to use the makeProject package

Code:

# Choose the parent directory for your projects
setwd("~/projects")
# install.packages("makeProject")  # if not already installed
library(makeProject)
makeProject("makeProjectDemo")
# Returns:
# Creating Directories ...
# Creating Code Files ...
# Complete ...
matrix(dir("makeProjectDemo"))
# [1,] "code"
# [2,] "data"
# [3,] "DESCRIPTION"
# [4,] "main.R"
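If you want to see the same layout without installing the package, here is a minimal base-R sketch under stated assumptions. It creates a code folder, a data folder, a DESCRIPTION file and a main.R that sources the four LCFD modules, all inside a temporary directory. The helper name make_lcfd_skeleton and the contents of the generated files are hypothetical; this is not makeProject's own implementation, and the real package writes richer templates.

```r
# Hypothetical helper: a base-R approximation of the skeleton makeProject
# creates (code/, data/, DESCRIPTION, main.R). Not the package's own code.
make_lcfd_skeleton <- function(parent, name) {
  root <- file.path(parent, name)
  dir.create(file.path(root, "code"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(root, "data"), showWarnings = FALSE)

  # Placeholder DESCRIPTION holding just the project name (assumed content)
  writeLines(paste("Project:", name), file.path(root, "DESCRIPTION"))

  # One placeholder module per LCFD step: load, clean, func, do
  modules <- c("load", "clean", "func", "do")
  for (m in modules) {
    writeLines(paste0("# ", m, ".R for ", name),
               file.path(root, "code", paste0(m, ".R")))
  }

  # main.R does nothing except source the modules in order
  writeLines(c(paste("# Project:", name),
               paste0('source("code/', modules, '.R")')),
             file.path(root, "main.R"))
  invisible(root)
}

# Build the skeleton somewhere harmless and list what was created
root <- make_lcfd_skeleton(tempdir(), "makeProjectDemo")
print(dir(root))
print(dir(file.path(root, "code")))
```

Writing into tempdir() keeps the example from touching your real projects folder; point the first argument at your own project directory to use it for real.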

  • This has set up a simple and sensible structure for a data analysis: a code folder, a data folder, a DESCRIPTION file and a main.R script.
  • Let’s look at the main.R script. It is the one file used to run the whole project: it calls each of the modules, which are the R scripts in the code folder.

Code:

# Project: makeProjectDemo
# Author: Your Name
# Maintainer: Who to complain to <yourfault@somewhere.net>
 
# This is the main file for the project
# It should do very little except call the other files
 
### Set the working directory
setwd("/home/ivan_hanigan/projects/makeProjectDemo")
 
 
### Set any global variables here
####################
 
 
 
####################
 
 
### Run the code
source("code/load.R")
source("code/clean.R")
source("code/func.R")
source("code/do.R")
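To show how the four modules divide the work, here is a small, self-contained sketch of an LCFD run. The contents of each module are my own hypothetical examples, not files makeProject generates, and R's built-in mtcars dataset stands in for a real raw data file. As in main.R, the working directory is set to the project root and the modules are sourced in the order load, clean, func, do, so each step can use the objects the previous steps created.

```r
# Hypothetical LCFD demo: module contents are illustrative assumptions,
# and the built-in mtcars data stands in for real raw data.
root <- file.path(tempdir(), "lcfdDemo")
dir.create(file.path(root, "code"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)
write.csv(mtcars, file.path(root, "data", "raw.csv"), row.names = FALSE)

modules <- list(
  # load.R: only read the raw data
  load  = 'raw <- read.csv("data/raw.csv")',
  # clean.R: only tidy and derive variables from what load.R produced
  clean = c('clean <- raw[!is.na(raw$mpg), c("mpg", "wt", "cyl")]',
            'clean$cyl <- factor(clean$cyl)'),
  # func.R: only define functions, with no side effects
  func  = c('mean_mpg_by <- function(df, group) {',
            '  tapply(df$mpg, df[[group]], mean)',
            '}'),
  # do.R: only run the analysis with the objects and functions above
  do    = 'result <- mean_mpg_by(clean, "cyl")'
)
for (m in names(modules)) {
  writeLines(modules[[m]], file.path(root, "code", paste0(m, ".R")))
}

# Equivalent of main.R: set the working directory, then source in order
old_wd <- setwd(root)
for (m in c("load", "clean", "func", "do")) {
  source(file.path("code", paste0(m, ".R")))
}
setwd(old_wd)
print(result)
```

Keeping function definitions in func.R and the actual analysis in do.R means you can rerun or change the analysis without reloading or recleaning the data by hand.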

I think this is largely self-explanatory, but it still needs some demonstration. The next instalment in this three-part series will describe the ProjectTemplate approach. After that, I will demonstrate how each of the three approaches can be used.

Posted in Data Management