Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licenced with CC-BY. Find out more Here.

ONS-SCD.png

The perfect is the enemy of the good

According to Wikipedia (https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good), this phrase was popularised by Voltaire, and Shakespeare expressed a similar idea in King Lear: “striving to better, oft we mar what’s well”

This phrase is a favourite of several people I respect and admire. On the other hand it has always vaguely troubled me. I think there may be two dimensions of this phrase that are worth pondering.

One: “the good” is a quality of the work

The first dimension is better expressed in King Lear, and on the wiki page, where it is suggested the meaning is “that we might never complete a task if we have decided not to stop until it is perfect.” This is correct, but it is not an absolutely faithful interpretation of the phrase “striving to better”, which to my mind does not necessarily lead to the position “will not stop until perfect”. If one strives for perfection but admits that this is beyond the scope of one’s ambition, then one will eventually stop when “it” is “good enough”, and that state will be better if one were striving for perfection than if one were aiming for “satisfactory”.

Two: “the good” is the impact the work may have on the world around us

The second dimension resonates more clearly with me, and that is that “the good” is a public good, an impact that we feel would make the world a better place. To aim for a satisfactory action or intervention on the world around us would then be the goal, and we should try to achieve this as soon as possible, without delaying our work with minutiae or trivia that might be that extra 1 percent that we would hope elevates our work further than satisfactory: one extra rung up the ladder toward perfection.

Stirzaker’s “simplicity cycle”

The image below is copied from page 175 of Stirzaker, R. (2010). Out of the Scientist’s Garden: A Story of Water and Food. Canberra, Australia: CSIRO. I have flipped it around so the axes are reversed from his original. It shows a couple of trajectories we can head along in terms of striving for enhancements, and the effect this has on how “good” the results are.

/images/stirzacker.jpg

Seriousness aside

I used this image in my last project as a rallying call to the team to focus on striving for perfection while also balancing this against our aim to have an impact for the good. My colleagues pointed out that the flipped over version is easily turned into a stickman, à la XKCD!

/images/stirzacker2.jpg

Posted in  disentangle writing


UPDATE to post 'GIS Issues when R is Used for Transforming Coordinate Systems'

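library(sp)     # SpatialPointsDataFrame, CRS
library(rgdal)  # make_EPSG, writeOGR

# lookup table of EPSG codes and proj4 strings
# (assumption: this is how 'epsg' was created in the original post)
epsg <- make_EPSG()

infile <- "western_sydney_passive_samplers_site_details.csv"
indir <- "data_derived"

outdir <- indir
# swap the extension; the dot is escaped and anchored so only the suffix changes
outfile <- file.path(outdir, sub("\\.csv$", ".shp", infile))

dat <- read.csv(file.path(indir, infile), as.is = TRUE)
str(dat)

# build the points from the long/lat columns, in GDA94 (EPSG:4283)
dat2 <- SpatialPointsDataFrame(data.frame(x = dat$long, y = dat$lat), dat,
                                proj4string = CRS(epsg$prj4[epsg$code %in% '4283'])
                                )

# dsn is the .shp path; layer is the file name without its extension
writeOGR(dat2, dsn = outfile,
         layer = sub("\\.shp$", "", basename(outfile)), driver = 'ESRI Shapefile'
         )
# fix the prj file by overwriting it with the one from spatialreference.org
download.file("http://spatialreference.org/ref/epsg/4283/prj/",
              sub("\\.shp$", ".prj", outfile), mode = "wb"
              )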
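The rgdal package used above was retired from CRAN in 2023, so here is a rough sketch of the same write step using the sf package instead. This is a swapped-in approach, not the method from the original post. The site names and coordinates are invented for illustration, and the output goes to a temporary file rather than data_derived. Whether the .prj that sf writes still needs replacing for a given GIS is not tested here.

```r
library(sf)

# Hypothetical sampler sites (made-up names and coordinates, for illustration only)
dat <- data.frame(site = c("A", "B"),
                  long = c(150.90, 151.00),
                  lat  = c(-33.80, -33.90))

# Build point geometries from the long/lat columns in GDA94 (EPSG:4283),
# keeping the original coordinate columns as attributes
pts <- st_as_sf(dat, coords = c("long", "lat"), crs = 4283, remove = FALSE)

# Write to a fresh temporary shapefile so nothing existing is overwritten
outfile <- tempfile(fileext = ".shp")
st_write(pts, outfile, quiet = TRUE)

# The shapefile driver writes a .prj sidecar when a CRS is set
prjfile <- sub("\\.shp$", ".prj", outfile)

# Read the file back and inspect the coordinate reference system that was stored
st_crs(st_read(outfile, quiet = TRUE))
```

Comparing the contents of `prjfile` with the version from spatialreference.org would show whether the download step is still needed.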

/images/samplers_sydney.png

Posted in  Data Operation spatial


Release checklist references

Some good reading and best-practice advice


/images/Huff2016.png

  • Stanisic, L., Legrand, A., & Danjean, V. (2015). An Effective Git And Org-Mode Based Workflow For Reproducible Research. ACM SIGOPS Operating Systems …. Retrieved from http://dl.acm.org/citation.cfm?id=2723881

/images/Stanisic2015.png

Posted in  disentangle


Repeatability vs replication - Definitional variations

In my previous post I bemoaned the variability in definitions of reproducibility and replication.
The definition of reproducibility in particular concerns me as I finalise my thesis on ‘Reproducible Research Pipelines’. However, I have also noticed a confusion of Repeatability and Replication in various authors’ definitions that is worth noting. In CASSEY, P., & BLACKBURN, T. M. (2006). Reproducibility and Repeatability in Ecology. BioScience, 56(12), 958. http://bioscience.oxfordjournals.org/content/56/12/958.full they agree with the definition of Peng 2011 that Reproducibility is to re-calculate the same result from the same data, and they use Repeatability as a synonym for Peng’s Replication.

According to Cassey and Blackburn (2006), reproducibility is the case that:

from the information presented in the study, a third party could
replicate (sic) the reported results identically.

This definition distinguishes reproducibility from repeatability which is when

a third party must be able to perform a study using identical methodological protocols 
and analyze the resulting data in an identical manner

Which is what Peng terms ‘replicability’. This leads me to conclude that:

Cassey and Blackburn conflate Repeatability with Replicability.

However, another author gives a very confused and overlapping view: Ellison, A. (2010). Repeatability and Transparency in Ecological Research. Ecology. https://dash.harvard.edu/bitstream/handle/1/3123279/Ellison_Repeatability.pdf?sequence=2 Accessed 12 Jan 2016.

To add further confusion, I have dug up the latest dictionary of epidemiology and find that whilst this agrees with Reproducibility and Replication, it treats Repeatability as a synonym of Reproducibility!

  • Porta, Miquel S. Dictionary of Epidemiology (6th Edition). New York, NY, USA: Oxford University Press, USA, 2014. ProQuest ebrary. Web. 24 January 2016. Copyright © 2014. Oxford University Press, USA. All rights reserved.

Repeatability (Syn: reproducibility): The value below which the
absolute difference between two single test results may be expected to
lie with a probability of 95%, when the results are obtained by the
same method and equipment from identical test material in the same
setting by the same operator within short intervals of time. A test or
measurement is repeatable if the results are identical or closely
similar each time it is conducted. 1-3,5-9,91 See also measurement,
terminology of; reliability.

Replication: The execution of an observational or experimental study
more than once so as to confirm the findings, increase precision, and
obtain a closer estimation of sampling error. Exact replication should
be distinguished from consistency of results on replication. Exact
replication is often possible in the physical sciences, but in the
health, life, and social sciences consistency of results on
replication is often the best that can be
attained. 1,2,6,25,39,42,91,206-208,270,273,533 Consistency of results
on replication is perhaps the most important consideration in
judgements of CAUSALITY.

Reproducibility: see REPEATABILITY.

Whatever happens, never cite Drummond 2009!

It is unclear whether Drummond’s self-published conference paper was peer reviewed (or reviewed by the conference committee), but the following quote is an unsupported assertion and should be ignored.

Reproducibility requires changes; replicability avoids them.

Drummond, C. (2009). Replicability is not Reproducibility: Nor is it Good Science. Proceedings of the Evaluation Methods for Machine Learning Workshop 26th International Conference for Machine Learning. Retrieved January 25, 2016, from http://cogprints.org/7691/7/icmlws09.pdf

Indeed in 2012 Drummond (in another self-published, working paper without evidence of peer review) did a backflip and said:

Reproducibility requires that the experiment originally carried out be duplicated 
as far as is reasonably possible. The aim is to minimize the difference from 
the first experiment including its flaws, to produce
independent verification of the result as reported

Drummond, C. (2012). Reproducible research: A dissenting opinion. Unpublished draft. Retrieved October 9, 2015, from http://cogprints.org/8675/

Posted in  disentangle


Reproducibility vs replication - Definitional variations

There is confusion between the definitions of reproducibility, repeatability and replicability. I strongly feel we need to tackle that head on and come to an agreed definition. I prefer Peng 2011:

  • Reproducibility is using the same data and getting the exact same result.
  • Replication is getting a new sample and doing the analysis again and getting a similar result.
Peng, R. D. (2011). Reproducible research in computational
science. Science, 334(6060), 1226–1227. doi:10.1126/science.1213847
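
To make the distinction concrete, here is a minimal sketch in base R. The population, the sample size and the `analyse` function are all invented for illustration; they are not from Peng's paper. Reproduction reruns the same analysis on the same data, while replication draws a new sample and repeats the analysis.

```r
# Hypothetical population, invented purely for illustration
set.seed(42)
population <- rnorm(10000, mean = 50, sd = 10)

# The analysis under study: here simply the sample mean
analyse <- function(x) mean(x)

# The "original study": one sample and one estimate
original_sample <- sample(population, 200)
original_result <- analyse(original_sample)

# Reproduction: same data, same analysis, so exactly the same result
reproduced_result <- analyse(original_sample)
is_reproduced <- identical(original_result, reproduced_result)

# Replication: a new sample, same analysis, so a similar but not identical result
new_sample <- sample(population, 200)
replicated_result <- analyse(new_sample)
difference <- abs(original_result - replicated_result)
```

Under these assumptions `is_reproduced` is `TRUE`, while `difference` is small but not zero, which is the sense in which a replication is only "similar".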

There are many people using these interchangeably or around the opposite way. For example Drummond got it round the wrong way in ‘Drummond, C., 2009. Replicability is not reproducibility: nor is it good science’ http://cogprints.org/7691/7/icmlws09.pdf and then reverted it in ‘Reproducible Research: a Dissenting Opinion’. http://cogprints.org/8675/1/ReproducibleResearch.pdf (Check out Peng’s reaction: http://simplystatistics.org/2012/11/15/reproducible-research-with-us-or-against-us-3/)

This blog post also gets the definitions around the wrong way: http://jermdemo.blogspot.com.au/2012/12/the-reproducible-research-guilt-trip.html (even though it is quite entertaining to read and has this great picture… not sure what the picture means???)

/images/thinker2.jpg

Sure, OK, it is fine that people define things differently to one another but:

The single biggest problem in communication 
is the illusion that it has taken place.
George Bernard Shaw quotes from BrainyQuote.com

We rely on a common definition to ensure we are talking about the same thing.

/images/communication-bnewell.png

Source: Newell, B. (2012). Simple models, powerful ideas: Towards effective integrative practice. Global Environmental Change, 22(3), 776–783. http://dx.doi.org/10.1016/j.gloenvcha.2012.03.006

It is regrettable that in Ecology (my favourite discipline) there seems to be quite a wide gap between various authors’ definitions. In CASSEY, P., & BLACKBURN, T. M. (2006). Reproducibility and Repeatability in Ecology. BioScience, 56(12), 958. http://bioscience.oxfordjournals.org/content/56/12/958.full they agree with the definition of Peng 2011. However, another author gives a very confused and overlapping view:


because that context changes through time and space, it is virtually
impossible to reproduce precisely or quantitatively any single
experimental or observational field study in ecology. Yet many
ecological studies can be repeated. In particular, ecological
synthesis – the assembly of derived datasets and their subsequent
analysis, re-analysis, and meta-analysis – should be easy to repeat
and reproduce

Ellison, A. (2010). Repeatability and Transparency in Ecological Research. Ecology. https://dash.harvard.edu/bitstream/handle/1/3123279/Ellison_Repeatability.pdf?sequence=2 Accessed 12 Jan 16

In another interesting approach, Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The Economics of Reproducibility in Preclinical Research. PLOS Biology, 13(6), e1002165. http://dx.doi.org/10.1371/journal.pbio.1002165 chose instead to adopt a definition of irreproducibility:

that encompasses the existence and propagation of one or more errors,
flaws, inadequacies, or omissions (collectively referred to as errors) 
that prevent replication of results

This leaves us to assume that the opposite of this is reproducibility, although they avoid defining it themselves. Looking back at the two heads in the picture above… it is interesting to ponder how some people would receive the signal of Freedman et al., who have defined the opposite of the thing that is the object of their discussion, rather than the thing itself!

Let’s all agree with Peng and Cassey/Blackburn and move on already!

Posted in  disentangle