Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY.

Coming To Grips With Citations In Reproducible Research Reports

Introduction

Earlier this year I was pleased to stumble onto Petr Keil's ‘Simple template for scientific manuscripts in Rmarkdown’ and its GitHub repo: https://github.com/petrkeil/Blog/tree/master/2015_03_12_R_ms_template.

I was already using Rmarkdown effectively for everything I wanted except my bibliography, and this helped a lot. But I eventually found I needed to tweak the citation style. I tried a bunch of other CSL files, downloaded from https://github.com/citation-style-language/styles, but none felt just right:

american-physiological-society.csl
annals-of-the-association-of-american-geographers.csl
biomed-central.csl
ecology.csl
pnas.csl

So I hacked the CSL file

NB: the CSL file in petrkeil’s repo is an older (2012) version of the style called methods-in-ecology-and-evolution in the citation-style-language repo. The differences are not large though.

1.1.1 Example 1:

The journal article by Michener et al. (1997) and another one (Bodnar et al. 2004) appear with their full URL even though I just want their DOI.

1.1.2 Example 2:

Some recent papers (Open Science Collaboration 2015; Aiken et al. 2015; Davey et al. 2015) don’t have Volume info and I want to say [epub ahead of print].

1.1.3 Example 3:

This blog post on ‘evidence based data analysis pipeline’ by Peng (2013) is one that definitely needs the URL and date accessed.

References

Aiken, A.M., Davey, C., Hargreaves, J.R. & Hayes, R.J. (2015). Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a pure replication. International Journal of Epidemiology, dyv127. Retrieved from http://ije.oxfordjournals.org/content/early/2015/07/21/ije.dyv127.full http://www.ije.oxfordjournals.org/lookup/doi/10.1093/ije/dyv127

Bodnar, A., Castorina, R., Desai, M., Duramad, P., Fischer, S., Klepeis, N., Liang, S., Mehta, S., Naumoff, K., Noth, E.M., Schei, M., Tian, L., Vork, K.L. & Smith, K.R. (2004). Lessons learned from ‘the skeptical environmentalist’: an environmental health perspective. International journal of hygiene and environmental health, 207, 57–67. Retrieved from http://www.sciencedirect.com/science/article/pii/S1438463904702643

Davey, C., Aiken, A.M., Hayes, R.J. & Hargreaves, J.R. (2015). Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a statistical replication of a cluster quasi-randomized stepped-wedge trial. International Journal of Epidemiology, dyv128. Retrieved from http://ije.oxfordjournals.org/content/early/2015/07/21/ije.dyv128.full http://www.ije.oxfordjournals.org/lookup/doi/10.1093/ije/dyv128

Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. & Stafford, S.G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7, 330–342. Retrieved from http://www.scopus.com/inward/record.url?scp=0030616825\&partnerID=8YFLogxK

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716–aac4716. Retrieved from http://www.sciencemag.org/cgi/doi/10.1126/science.aac4716

Peng, R.D. (2013). Implementing Evidence-based Data Analysis: Treading a New Path for Reproducible Research. Simply statistics. Retrieved July 26, 2015, from http://simplystatistics.org/2013/09/05/implementing-evidence-based-data-analysis-treading-a-new-path-for-reproducible-research-part-3/

<!-- first replace the existing <macro name="access"> with this one -->
<macro name="access">
  <choose>
    <if variable="DOI"/>
    <!--don't use if there is a DOI-->
    <else>
      <choose>
        <if variable="URL">
          <group delimiter=" " prefix=" ">
            <group>
              <text variable="URL"/>
            </group>
            <group prefix="[" suffix="]" delimiter=" ">
              <date variable="accessed">
                <date-part name="day"/>
                <date-part name="month" prefix=" " suffix=" " form="short"/>
                <date-part name="year"/>
              </date>
            </group>
          </group>
        </if>
      </choose>
    </else>
  </choose>
</macro>
<!-- then add this -->
<macro name="date">
  <choose>
    <if variable="issued">
      <choose>
        <if type="article-journal">
          <date variable="issued">
            <date-part name="year"/>
          </date>
        </if>
        <else>
          <date variable="issued">
            <date-part name="year"/>
          </date>
        </else>
      </choose>
    </if>
    <else>
      <text term="no date" prefix="[" suffix="]"/>
    </else>
  </choose>
</macro>
<!-- and add this -->
      <else-if type="article-journal">
        <choose>
          <if variable="issue volume" match="any">
            <text macro="title" suffix=" "/>
            <text variable="container-title" suffix=" " form="short" font-style="italic" strip-periods="true"/>
            <text variable="volume"/>
            <text variable="page" prefix=": "/>
            <text macro="date" prefix=", " suffix="."/>
          </if>
          <else>
            <choose>
              <if variable="DOI">
                <text macro="title" suffix=" "/>
                <text variable="container-title" suffix=" " form="short" font-style="italic"/>
                <group prefix="(" suffix=").">
                  <!-- literal label wanted in Example 2 -->
                  <text value="epub ahead of print"/>
                  <date variable="issued">
                    <date-part name="month" prefix=" " suffix=" "/>
                    <date-part name="day" suffix=", "/>
                    <date-part name="year"/>
                  </date>
                </group>
                <text variable="DOI" prefix=" doi: "/>
              </if>
              <else>
                <text variable="container-title" suffix=". " form="short" font-style="italic"/>
              </else>
            </choose>
          </else>
        </choose>
      </else-if>

Now reference this modified CSL file in the YAML header of the Rmd file instead of mee.csl.

1.2 NEW References

Aiken, A.M., Davey, C., Hargreaves, J.R. & Hayes, R.J. (2015).Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a pure replication International Journal of Epidemiology (epub ahead of print July 2015). doi: 10.1093/ije/dyv127

Bodnar, A., Castorina, R., Desai, M., Duramad, P., Fischer, S., Klepeis, N., Liang, S., Mehta, S., Naumoff, K., Noth, E.M., Schei, M., Tian, L., Vork, K.L. & Smith, K.R. (2004).Lessons learned from ‘the skeptical environmentalist’: an environmental health perspective. International journal of hygiene and environmental health 207: 57–67, 2004.

Davey, C., Aiken, A.M., Hayes, R.J. & Hargreaves, J.R. (2015).Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a statistical replication of a cluster quasi-randomized stepped-wedge trial International Journal of Epidemiology (epub ahead of print July 2015). doi: 10.1093/ije/dyv128

Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. & Stafford, S.G. (1997).Nongeospatial metadata for the ecological sciences Ecological Applications 7: 330–342, 1997. http://www.scopus.com/inward/record.url?scp=0030616825\&partnerID=8YFLogxK

Open Science Collaboration. (2015).Estimating the reproducibility of psychological science Science 349: aac4716–aac4716, 2015.

Peng, R.D. (2013). Implementing Evidence-based Data Analysis: Treading a New Path for Reproducible Research. Simply statistics. http://simplystatistics.org/2013/09/05/implementing-evidence-based-data-analysis-treading-a-new-path-for-reproducible-research-part-3/ [26 Jul. 2015]



A quick review of a quick guide to organizing computational biology projects

The organisation of material is a particularly vexatious topic. For a data analysis project it is very important that the set of folders and files is logical and intuitive, as well as well documented. The oft-heard exhortation by computer scientists to their users to ‘Read The F-ing Manual’ (RTFM) is perennial, and it is rooted in a fundamental difficulty: readers rarely have the time required to read and digest the detailed information therein.

I missed out so I thought I’d put my notes up here for reference:

  1. core guiding principle is simple: Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why
  2. your future self may find it difficult to understand your current work.
  3. Noble’s law: ‘Everything you do, you will probably have to do over again’
  4. store all of the files relevant to one project in common root directory
  5. The exception to this rule is data or code used in multiple projects; these are standalone projects in their own right
  6. Within a given project, use a top-level organization that is logical, then chronological at the next level, and then logical again below that
  7. Core folders are data, results, doc (versus Berndt Weiss’ dat, ana, doc)
  8. Chronological order? ‘tempting to apply a similar, logical organization… this approach is risky, because the logical structure of your final set of experiments may look drastically different from the form you initially designed. This is particularly true under the results directory, where you may not even know in advance what kinds of experiments you will need to perform’
/projectname (eg msms)/
    /doc/
        /ms-analysis.html 
        /paper/
            /msms.tex
            /msms.pdf
    /data/
        /YYYY-MM-DD/
            /yeast/
                /README
                /yeast.sqt
            /worm/
                /README
                /worm.sqt
    /src/
        /ms-analysis.c
    /bin/
        /parse-sqt.py
    /results/
        /notebook.html 
        /YYYY-MM-DD-1/
            /runall
            /split1/
            /split2/
        /YYYY-MM-DD-2/
            /runall

  1. Use a driver script to automate creation of a directory structure
  2. maintain a chronologically organized lab notebook (I have been calling this a work ‘log’ sensu Scott Long’s 2008 ‘Workflow book’)
  3. create either a README file, or a command line driver script (he calls this runall, but see also main.R sensu the Reichian LCFD model)
  4. you should end up with a file that is parallel to the lab notebook entry. The lab notebook contains a prose description of the experiment, whereas the driver script contains all the gory details
  5. Version Control. ‘Nuff said! But how to build capacity with GitHub when all my colleagues seem so confused by it?
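
The first point in the list above (a driver script that creates the directory structure) could be sketched in Python as below. This is only a minimal sketch under my own assumptions: the function name create_project_skeleton, the README text and the single dated folder under results are hypothetical, not part of Noble's guide. The demo runs in a temporary directory so nothing is left behind.

```python
import tempfile
from datetime import date
from pathlib import Path


def create_project_skeleton(root, project, today=None):
    """Create doc, data, src, bin and results folders for one project.

    Hypothetical helper: data and results get a dated YYYY-MM-DD subfolder.
    """
    today = today or date.today()
    stamp = today.isoformat()  # e.g. 2015-07-26
    base = Path(root) / project
    folders = [
        base / "doc" / "paper",
        base / "data" / stamp,
        base / "src",
        base / "bin",
        base / "results" / stamp,
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    # A README in each dated data folder records where the data came from.
    readme = base / "data" / stamp / "README"
    if not readme.exists():
        readme.write_text(f"Data added on {stamp}. Describe the source here.\n")
    return base


# Demo in a throwaway directory, using the 'msms' project name from the tree above.
with tempfile.TemporaryDirectory() as tmp:
    base = create_project_skeleton(tmp, "msms", date(2015, 7, 26))
    created = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))
    print(created)
```

Passing the date explicitly keeps the demo repeatable; in day-to-day use the default of today's date is probably what you want.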



Naming Conventions For Computer Files

I’m working on a new Data Management Plan for a research group who merge data from air pollution, meteorology, population census and health outcome datasets. The folder organisation is pretty much under control now, but the file names are challenging.

I’m searching for inspiration and was recommended this conversation: https://github.com/minisciencegirl/studyGroup/issues/20

File Organization and Naming:
...
I want to include details of what the settings were or what dataset I
started out with. Rather than saving a file name a mile long
"FlightHomLog2_av1_Euc_ArrayEucl_AvgLink", there must be many
different better ways to organizing my file space.

There is a bunch of good advice already here, and I recommend the slides https://github.com/Reproducible-Science-Curriculum/rr-organization1/tree/master/slides/naming-slides and the two PLOS articles, but I wanted to pull out the two things I think are important to get right:

  • ordering in lists
  • substring chunks that can be extracted

I think that the substring chunks are explained well in the slides linked above (in summary: use ‘_’ or ‘-’ to split the string), but I think that the ordering problem needs more thought.
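
To make the substring idea concrete, here is a minimal Python sketch that pulls the chunks back out of a file name. The function names split_filename and split_period are hypothetical, and I am assuming that ‘_’ separates chunks while ‘-’ only appears inside a chunk (for example a year range in the last chunk).

```python
from pathlib import Path


def split_filename(filename):
    """Return the list of '_'-separated chunks and the file extension."""
    path = Path(filename)
    return path.stem.split("_"), path.suffix


def split_period(period):
    """Split a period chunk such as '2010-2012' into (start, end)."""
    start, _, end = period.partition("-")
    # A single year like '2010' is treated as starting and ending in that year.
    return start, end or start


chunks, extension = split_filename("asn_fnqr_weather_robson_2010.csv")
print(chunks, extension)

range_chunks, _ = split_filename("asn_fnqr_veg_seedling_robson_2010-2012.csv")
print(split_period(range_chunks[-1]))
```

This only works cleanly when no chunk contains the separator itself or a space, which is a good reason to agree on the separators before naming any files.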

Tidy Data:

This all reminds me of words from Hadley Wickham about tidy data, and the order in which columns should be arranged in tabular data. The principles are similar I think.

A good ordering makes it easier to scan the raw values. One way of
organizing variables is by their role in the analysis: are values
fixed by the design of the data collection, or are they measured
during the course of the experiment? Fixed variables describe the
experimental design and are known in advance. Computer scientists
often call fixed variables dimensions, and statisticians usually
denote them with subscripts on random variables. Measured variables
are what we actually measure in the study. Fixed variables should come
first, followed by measured variables, each ordered so that related
variables are contiguous. Rows can then be ordered by the first
variable, breaking ties with the second and subsequent (fixed)
variables. 

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). Retrieved from http://www.jstatsoft.org/

One way that we did this:

Colleagues and I came up with the following protocol for an ecology and biodiversity database:

  1. Project name (optional sub-project name)
  2. Data type (such as experimental unit, observational unit, and/or measurement methods)
  3. Geographic location (State, Country)
  4. Temporal frequency and coverage (annual or seasonal tranches)

Tidy data generalisable concepts are dimensions and variables

The concept of dimensions and variables can be useful here, especially for deciding on file names. Dimensions are fixed or change slowly, while variables change more quickly. For example, the project name is ‘fixed’: it does not change across the files. The sub-project name does change, just more slowly (say there may be 2-3 different sub-projects within a project). Then there may be a set of data types, and these ‘change’ more quickly than the sub-project name (by change I mean that there are more of them). Finally, the geographic and temporal variables might change quickest of all.

So a general rule for the order of things can be stated: the more fixed variables should come first (those things that don’t change, or don’t change much), followed by the more fluid variables (things that change more across the project). List elements are then ordered so that groups of similar things are always contiguous and vary sequentially within clusters.
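
A small Python sketch of this rule follows. The helper build_filename, its argument names and the example records are my own hypothetical choices for illustration; the point is only that joining elements from most fixed to most fluid makes a plain alphabetical sort keep related files together.

```python
def build_filename(project, site, data_type, location, period, ext="csv"):
    """Join name elements from most fixed (left) to most fluid (right)."""
    return "_".join([project, site, data_type, location, period]) + "." + ext


# Hypothetical records, deliberately listed out of order.
records = [
    ("asn", "fnqr", "weather", "robson", "2011"),
    ("asn", "fnqr", "soil", "robson", "2012"),
    ("asn", "fnqr", "weather", "capetrib", "2010"),
    ("asn", "fnqr", "soil", "ddc", "2013"),
    ("asn", "fnqr", "weather", "robson", "2010"),
]

# An ordinary alphabetical sort, as a file browser would do.
sorted_names = sorted(build_filename(*record) for record in records)
for name in sorted_names:
    print(name)
```

After sorting, the soil files sit together, then the weather files, and within each location the years run in sequence. Putting the year first instead would interleave the data types.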

Perhaps an example would be easier to understand. Here is a set of file names that we constructed for one of our ecological field sites (project) and plots (sub-project or measurement location):

Note that we had also agreed on a controlled vocabulary of data types and their acronyms before starting this.

| Filename                                                            | Title                                                                                                                                 |
|---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------|
| asn_fnqr_soil_charact_robson_2011.csv                               | Soil Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2011                                                              |
| asn_fnqr_soil_pit_robson_2012.csv                                   | Soil Pit Data, Water Content and Temperature, Far North Queensland Rainforest SuperSite, Robson Creek, 2012                           |
| asn_fnqr_veg_seedling_robson_2010-2012.csv                          | Seedling Survey,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2012                                                  |
| asn_fnqr_veg_seedling_transect_coord_robson_2010-2012.csv           | Seedling Survey,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2012                                                  |
| asn_fnqr_core_1ha_robson_2014.csv                                   | Soil Pit Data, Soil Characterisation, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha plot, 2014                   |
| asn_fnqr_fauna_biodiversity_ctbcc_2012.csv                          | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, 2012                                      |
| asn_fnqr_fauna_biodiversity_ctbcc_2013.csv                          | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, 2013                                      |
| asn_fnqr_fauna_biodiversity_ctbcc_capetrib_2014.csv                 | Avifauna Monitoring, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2014                                                |
| asn_fnqr_fauna_biodiversity_ctbcc_lu11a_2014.csv                    | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU11A, 2014                               |
| asn_fnqr_fauna_biodiversity_ctbcc_lu7a_2014.csv                     | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU7A, 2014                                |
| asn_fnqr_fauna_biodiversity_ctbcc_lu7b_2014.csv                     | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU7B, 2014                                |
| asn_fnqr_fauna_biodiversity_ctbcc_lu9a_2014.csv                     | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU9A, 2014                                |
| asn_fnqr_fauna_biodiversity_ctbcc-lu11a_2009-2011.csv               | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU11A, 2009-2011                          |
| asn_fnqr_fauna_biodiversity_ctbcc-lu7a_2009-2011.csv                | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU7A, 2009-2011                           |
| asn_fnqr_fauna_biodiversity_ctbcc-lu9a_2009-2011.csv                | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU9A, 2009-2011                           |
| asn_fnqr_fauna_biodiversity_habitat codes_ctbcc-lu11a_2009-2011.pdf | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU11A, 2009-2011                          |
| asn_fnqr_fauna_biodiversity_habitat codes_ctbcc-lu9a_2009-2011.pdf  | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU9A, 2009-2011                           |
| asn_fnqr_fauna_biodiversity_habitat_codes_ctbcc-lu7a_2009-2011.pdf  | Vertebrate Fauna Biodiversity Monitoring, Far North Queensland Rainforest SuperSite, CTBCC, LU7A, 2009-2011                           |
| asn_fnqr_fauna_birds_capture_robson_2011-2014.csv                   | Bird Capture Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2011-2014                                                 |
| asn_fnqr_fauna_birds_robson_2010-2014.csv                           | Bird Survey Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2014                                                  |
| asn_fnqr_fauna_invert_moth_robson_2009.csv                          | Moth Inventory at Canopy and Ground Level, Far North Queensland Rainforest SuperSite, Robson Creek, 2009                              |
| asn_fnqr_fauna_invert_moth_robson_2010.csv                          | Moth Inventory at Canopy and Ground Level, Far North Queensland Rainforest SuperSite, Robson Creek, 2010                              |
| asn_fnqr_fauna_invert_moth_robson_2011.csv                          | Moth Inventory at Canopy and Ground Level, Far North Queensland Rainforest SuperSite, Robson Creek, 2011                              |
| asn_fnqr_fauna_invert_robson_25ha_2013                              | Invertebrate Fauna Survey, Far North Queensland Rainforest SuperSite, Robson Creek, 25 Ha Plot, 2013                                  |
| asn_fnqr_geo_tracks_100m_grid_robson_2010-2013.kml                  | Base Geographical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2013                                           |
| asn_fnqr_geo_tracks_robson_2010-2013.kml                            | Base Geographical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2013                                           |
| asn_fnqr_geo_tracks_robson_2010-2013.mdb                            | Base Geographical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2013                                           |
| asn_fnqr_geo_tracks_trees_robson_2010-2013.kml                      | Base Geographical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2013                                           |
| asn_fnqr_soil_cosmos_robson_2011.csv                                | Soil Sampling for Calibration of Cosmic Ray Soil Moisture Sensor, Far North Queensland Rainforest SuperSite, Robson Creek, 2011       |
| asn_fnqr_soil_pit_robson_2012.csv                                   | Soil Pit Data, Soil Characterisation, Far North Queensland Rainforest SuperSite, Robson Creek, 2012                                   |
| asn_fnqr_soil_pit_robson_2013.csv                                   | Soil Pit Data, Water Content and Temperature, Far North Queensland Rainforest SuperSite, Robson Creek, 2013                           |
| asn_fnqr_soil_properties_ddc_2013.csv                               | Soil Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2013                                                 |
| asn_fnqr_soil_properties_ddc_2014.csv                               | Soil Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2014                                                 |
| asn_fnqr_soil_properties_robson_2014.csv                            | Soil Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2014                                                              |
| asn_fnqr_stream_chem_robson_201310.csv                              | Water Chemistry Data, Far North Queensland Rainforest SuperSite, Robson Creek, 201310-201311                                          |
| asn_fnqr_stream_chem_robson_201310-201405.csv                       | Water Chemistry Data, Far North Queensland Rainforest SuperSite, Robson Creek, 201310-201405                                          |
| asn_fnqr_stream_chem_robson_201311.csv                              | Water Chemistry Data, Far North Queensland Rainforest SuperSite, Robson Creek, 201310-201311                                          |
| asn_fnqr_stream_chem_std_methods_robson_2013.pdf                    | Water Chemistry Data, Far North Queensland Rainforest SuperSite, Robson Creek, 201310-201311                                          |
| asn_fnqr_stream_phys-chem_diagram_robson_2013.pdf                   | Stream Physico-Chemical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 201304-201305                                 |
| asn_fnqr_stream-phys-chem_robson_201304-201305.csv                  | Stream Physico-Chemical Data,  Far North Queensland Rainforest SuperSite, Robson Creek, 201304-201305                                 |
| asn_fnqr_veg_cwd_robson_core_1ha_2012.csv                           | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_dbh-h_capetrib_crane_plot_2001.csv                     | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2001                               |
| asn_fnqr_veg_dbh-h_capetrib_crane_plot_2005.csv                     | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2005                               |
| asn_fnqr_veg_dbh-h_capetrib_crane_plot_2010.csv                     | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2010                               |
| asn_fnqr_veg_dbh-h_robson_25ha_2009-2015.csv                        | Vascular Plant Data  10 cm DBH,  Far North Queensland Rainforest SuperSite, Robson Creek, 25 ha Plot, 2009-2015                    |
| asn_fnqr_veg_fruit_robson_25ha_2011-2015.csv                        | Fruit Phenology, Far North Queensland Rainforest SuperSite, Robson Creek, 25 ha Plot, 2011-2015                                       |
| asn_fnqr_veg_gentry_mid-stratum_robson_core_1ha_2012.csv            | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_gentry_sub-stratum_herbs_robson_core_1ha_2012.csv      | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_gentry_sub-stratum_shrubs_robson_core_1ha_2012.csv     | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_lai_robson_core_1ha_2012.pdf                           | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_leaf_phys_capetrib_20120831.csv                        | Leaf Level Physiology, Chemistry and Structural Traits, Far North Queensland Rainforest SuperSite, Cape Tribulation, Crane Site, 2012 |
| asn_fnqr_veg_leaf_phys_master_aci_curves_robson_2012.csv            | Leaf Level Physiology, Chemistry and Structural Traits, Far North Queensland SuperSite, Robson Creek, 2012                            |
| asn_fnqr_veg_leaf_phys_master_ai_curves_robson_2012.csv             | Leaf Level Physiology, Chemistry and Structural Traits, Far North Queensland SuperSite, Robson Creek, 2012                            |
| asn_fnqr_veg_seedling_species_robson_2010-2012.csv                  | Seedling Survey,  Far North Queensland Rainforest SuperSite, Robson Creek, 2010-2012                                                  |
| asn_fnqr_veg_species_capetrib_crane_plot_2001.csv                   | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2001                               |
| asn_fnqr_veg_species_capetrib_crane_plot_2005.csv                   | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2005                               |
| asn_fnqr_veg_species_capetrib_crane_plot_2010.csv                   | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 1 ha Crane Plot, 2010                               |
| asn_fnqr_veg_species_robson_2012.csv                                | Vegetation Species List, Far North Queensland Rainforest SuperSite, Robson Creek, 2012                                                |
| asn_fnqr_veg_struct_robson_core_1ha_2012.csv                        | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_centre_images_2012.zip          | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_centre_images_2014.zip          | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_ne_corner_images_2012.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_ne_corner_images_2014.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_nw_corner_images_2012.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_nw_corner_images_2014.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_se_corner_images_2012.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_se_corner_images_2014.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_sw_corner_images_2012.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_struct_robson_core_1ha_sw_corner_images_2014.zip       | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_structure_robson_core_1 ha.csv                         | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_veg_vascular plant list_robson_core_1 ha_2012.csv          | Vascular Plant Data, Far North Queensland Rainforest SuperSite, Robson Creek, Core 1 ha, 2012                                         |
| asn_fnqr_water_properties_robson_2013.csv                           | Stream Physico-Chemical Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2013                                           |
| asn_fnqr_water_properties_robson_2014.csv                           | Stream Physico-Chemical Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2014                                           |
| asn_fnqr_weather_capetrib_2006.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2006                                               |
| asn_fnqr_weather_capetrib_2007.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2007                                               |
| asn_fnqr_weather_capetrib_2008.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2008                                               |
| asn_fnqr_weather_capetrib_2009.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2009                                               |
| asn_fnqr_weather_capetrib_2010.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2010                                               |
| asn_fnqr_weather_capetrib_2011.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2011                                               |
| asn_fnqr_weather_capetrib_2012.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2012                                               |
| asn_fnqr_weather_capetrib_2013.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2013                                               |
| asn_fnqr_weather_capetrib_2014.csv                                  | Weather Station Data, Far North Queensland Rainforest SuperSite, Cape Tribulation, 2014                                               |
| asn_fnqr_weather_ddc_2008.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2008                                      |
| asn_fnqr_weather_ddc_2009.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2009                                      |
| asn_fnqr_weather_ddc_2010.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2010                                      |
| asn_fnqr_weather_ddc_2011.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2011                                      |
| asn_fnqr_weather_ddc_2012.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2012                                      |
| asn_fnqr_weather_ddc_2013.csv                                       | Weather Station Data, Far North Queensland Rainforest SuperSite, Daintree Discovery Centre, 2013                                      |
| asn_fnqr_weather_robson_2010.csv                                    | Weather Station Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2010                                                   |
| asn_fnqr_weather_robson_2011.csv                                    | Weather Station Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2011                                                   |
| asn_fnqr_weather_robson_2012.csv                                    | Weather Station Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2012                                                   |
| asn_fnqr_weather_robson_2013.csv                                    | Weather Station Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2013                                                   |
| asn_fnqr_weather_robson_2014.csv                                    | Weather Station Data, Far North Queensland Rainforest SuperSite, Robson Creek, 2014                                                   |

Posted in  disentangle


Open Notebook Blogging Vs Twitter? Or Can I Do Both?

  • I was browsing someone’s blog and noticed a link to this interesting talk, “Citation & Productivity Benefits from Open Science”, by https://github.com/BillMills
  • The slides (in markdown) are here: https://github.com/BillMills/practicalOpenScience/blob/gh-pages/outline.md
  • The slide on “Open Communication” interested me, as I have turned my open notebook blog up to 11 recently, with a couple of posts a week for the last few weeks
  • The recommendation to “Blog Early And Blog Often” resonated here!
  • I was intrigued by this comment:
  
used well, twitter can be a useful tool for frequent communication
paradigms as a distributor and aggregator of links to the content in
the other three bullet points. It can be really tough to stay on top
of everyone's blog, everyone's issue tracker and everything else; by
pushing links to our followers every time we have a new RFC out and
vice versa, we greatly simplify this process. See this example.
  

/images/twits.png

Warning, Danger

Stepping Away From Twitter
June 12, 2015
I recently noticed an interesting pattern.
On days when I’m on Twitter, my ability to focus and 
get good work done falls off a cliff.

It’s not just “when you read Twitter first-thing in the day”, 
something I’ve heard people discuss. 
It was that I was on Twitter at all.

But there are clear benefits, right?

  • So I had to take a step back from my recent explorations of linking up my social media and blogging interests
  • Looking over what I am trying to achieve by posting regularly about my work, and the need I feel to make sure interested people can find it, I began to despair
  • But then I kept digging around and saw a lot of people I admire linking up Twitter with their scientific communication

So what?

Posted in  disentangle


Complexity of Graphs Obfuscates, but Visual Grouping Helps Disentangle Things

Diagrams have long been used in science to describe the relationships (edges) between things (nodes). Tools from mathematics and geometry have been applied to the problem of laying out a diagram so that it uses space efficiently. It is desirable to minimise the gaps between the nodes and to keep the lines from overlapping too much, because the complexity of a graph obfuscates the details that we are trying to show. Visual grouping helps to disentangle the relationships.

As an example, the relationship between drought and suicide is a complex system where the effects are indirect. The focus is on a chain of intermediary causal factors. These questions are usually explored in the context of many other factors that describe human biological variables and the socio-economic milieu.

In this post I utilise the R package DiagrammeR to construct a causal directed acyclic graph (DAG) of the putative effects of a set of selected causal factors from both natural and social capital theories.
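
Because the diagram is meant to be acyclic, it can be worth checking the edge list for cycles before drawing anything. The following is a minimal base-R sketch, not part of DiagrammeR: the edge list is restated by hand from the code in this post, and `is_acyclic` is a hypothetical helper name. It works in the spirit of Kahn's algorithm by repeatedly stripping away nodes that have no incoming edges.

```r
# Edge list restated by hand from the DiagrammeR code in this post (from -> to)
edges <- data.frame(
  from = c("depression", "anxiety", "stress", "stress",
           "decreased community support", "migration",
           "employment", "employment", "debt",
           "drought", "drought",
           "declined agricultural productivity",
           "declined agricultural productivity",
           "declined agricultural productivity",
           "decreased food security", "drought"),
  to   = c("suicide", "suicide", "anxiety", "depression",
           "anxiety", "decreased community support",
           "stress", "debt", "stress",
           "declined agricultural productivity", "decreased food security",
           "decreased food security",
           "anxiety", "employment",
           "anxiety", "migration"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every edge that starts at a node with no
# incoming edges, and repeat. If all edges disappear the graph is acyclic;
# if edges remain but no such node exists, the remainder contains a cycle.
is_acyclic <- function(edges) {
  remaining <- edges
  repeat {
    if (nrow(remaining) == 0) return(TRUE)
    sources <- setdiff(remaining$from, remaining$to)
    if (length(sources) == 0) return(FALSE)
    remaining <- remaining[!remaining$from %in% sources, , drop = FALSE]
  }
}

is_acyclic(edges)
```

If a later edit to the causal model introduces a feedback loop (say, an edge from "suicide" back to "stress"), this check returns FALSE and the diagram is no longer a DAG.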

The following code produces the graph /images/suicide-drought.png

Or this interactive version /viewhtml21f36d8c5d7d/index.html

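library(DiagrammeR)
# NB: this uses the older DiagrammeR interface (create_nodes, create_edges,
# combine_nodes, combine_edges); later releases renamed these functions.
#### First create the outcome
# Graphviz spells the attribute 'color', as in the other clusters below
nodes_outcome <- create_nodes(nodes = c("suicide", "depression", "anxiety"),
                        label = TRUE,
                        color = "black")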

edges_outcome <- create_edges(from = c('depression','anxiety'),
                        to =   c("suicide", "suicide")
                        )

graph_outcome <- create_graph(nodes_df = nodes_outcome,
                       edges_df = edges_outcome)
# just test this out
## render_graph(graph_outcome)
                        
#### now the social capital factors  
nodes_social <- 
  create_nodes(nodes =  c("stress", "decreased community support", "migration"),
               label = TRUE,
               color = "blue")

edges_social <- create_edges(from =  c("stress", "stress", "decreased community support",
                          "migration"),
                        to =   c("anxiety", "depression", "anxiety",
                          "decreased community support")
                        )
graph_social <- create_graph(nodes_df = nodes_social,
                       edges_df = edges_social)
# render_graph(graph_social)

#### now the financial capital factors
nodes_financial <- 
  create_nodes(nodes = c("employment", "debt"),
               label = TRUE,
               color = "green")

edges_financial <- create_edges(from = c("employment", "employment", "debt"),
                        to =   c("stress", "debt", "stress")
                        )
graph_financial <- create_graph(nodes_df = nodes_financial,
                       edges_df = edges_financial)
# render_graph(graph_financial)


#### now the natural capital factors
nodes_natural <- 
  create_nodes(nodes = c("drought", "declined agricultural productivity", "decreased food security"),
               label = TRUE,
               color = "red")

edges_natural <- create_edges(from = c("drought", "drought", "declined agricultural productivity",
                          "declined agricultural productivity", "declined agricultural productivity",
                          "decreased food security",
                          "drought"),
                        to =   c("declined agricultural productivity", "decreased food security", "decreased food security",
                          "anxiety", "employment",
                          "anxiety",
                          "migration")
                        )
graph_natural <- create_graph(nodes_df = nodes_natural,
                       edges_df = edges_natural)
## render_graph(graph_natural)

# use create_graph on separate nodes and edges data frames, one for each cluster
# then access the dot codes for each
gr0 <- graph_outcome$dot_code
gr1 <- graph_social$dot_code
gr2 <- graph_financial$dot_code
gr3 <- graph_natural$dot_code
# then replace the graph with subgraph
gr0 <- gsub("digraph", "subgraph cluster0", gr0)
gr1 <- gsub("digraph", "subgraph cluster1", gr1)
gr2 <- gsub("digraph", "subgraph cluster2", gr2)
gr3 <- gsub("digraph", "subgraph cluster3", gr3)
# and then combine the subgraphs into one graph
gr_out <- sprintf("digraph{\n%s\n\n %s\n%s\n%s\n}", gr0, gr1, gr2, gr3)
cat(gr_out)
grViz(gr_out)
# If Graphviz is installed (e.g. on Linux), render a PNG with a shell command.
# Write the dot code to a file first, swapping single quotes for double quotes.
sink("suicide-drought.dot")
cat(gsub("'",'"', gr_out))
sink()
system("dot -Tpng suicide-drought.dot -o suicide-drought.png")
# Interactive
nodes <- combine_nodes(nodes_outcome, nodes_social, nodes_natural, nodes_financial)
edges <- combine_edges(edges_outcome, edges_social, edges_natural, edges_financial)
  
# Render graph
graph <- create_graph(nodes_df = nodes,
                      edges_df = edges)
  
render_graph(graph, output = "visNetwork")    
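
The grouping trick above relies on a Graphviz convention: a subgraph is only drawn as a boxed group when its name starts with "cluster". The sketch below shows the same idea in base R with hand-written DOT statements, and adds a label to each box. It is an illustration under stated assumptions rather than the code used for the figure: `make_cluster` is a hypothetical helper, and only two of the four groups are shown.

```r
# Hypothetical helper: wrap DOT statements in a labelled cluster subgraph.
# The "cluster" prefix on the subgraph name is what makes Graphviz draw a box.
make_cluster <- function(id, label, statements) {
  sprintf("subgraph cluster%d {\n  label = \"%s\";\n  %s\n}",
          id, label, paste(statements, collapse = "\n  "))
}

# Two of the groups from this post, written directly as DOT edge statements
outcome <- make_cluster(0, "outcome",
                        c("depression -> suicide;", "anxiety -> suicide;"))
social  <- make_cluster(1, "social capital",
                        c("stress -> anxiety;", "stress -> depression;"))

# Combine the clusters into one directed graph, as done with gr0..gr3 above
dot <- sprintf("digraph {\n%s\n%s\n}", outcome, social)
cat(dot)
```

The resulting string can be passed to grViz() or written to a .dot file in the same way as gr_out.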

Posted in  disentangle