Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.


What Do Scientists Who Write Metadata Use To Do It? And Why?

  • The extent to which scientists write metadata is probably lower than it ought to be
  • The level of metadata written during science projects could generally be described as “bare minimum”: the minimum needed for oneself to come back to the work and understand what one did
  • It sometimes seems that even this bare minimum for oneself is not often kept
  • I argue that the reasons for less-than-adequate metadata can be understood by looking at:
  • 1) the culture of the scientist's disciplinary background, via training;
  • 2) the tools available; and
  • 3) institutional requirements to produce metadata (about the data or about access to the data).
  • In my ongoing series of blog posts I am exploring the tools available.
  • In this post I just want to start the discussion about disciplinary culture and institutional requirements.

Discipline Culture

  • I trained in Geography in the age of GIS, and this community uses metadata a lot
  • This is due to the prevalence of the digital map (a collection of layers), which is a derived data output
  • You need to know the source of all the layers
  • The first law of GIS is “garbage in, garbage out”
  • I was trained in the ANZLIC metadata standard from the start
  • ArcGIS has a tool called ArcCatalog which makes metadata easy to create and view
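The point about needing the source of every layer can be made concrete. The following is a minimal Python sketch, not how ArcCatalog or the ANZLIC standard actually record lineage: the `Layer` class, its `source` field and the example layers are all hypothetical. It walks a derived map back to its raw input layers and flags any whose source was never written down, which is exactly where “garbage in, garbage out” bites.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """A map layer; a derived layer lists the layers it was built from."""
    name: str
    source: str = ""  # hypothetical field: provenance of a raw layer
    inputs: list["Layer"] = field(default_factory=list)


def all_sources(layer: Layer) -> set[str]:
    """Walk the derivation tree and report the source of every raw input layer."""
    if not layer.inputs:
        # A raw layer with no recorded source is the "garbage in" risk.
        return {f"{layer.name}: {layer.source or 'UNKNOWN'}"}
    found: set[str] = set()
    for parent in layer.inputs:
        found |= all_sources(parent)
    return found


# Hypothetical example: a map derived from one documented and one undocumented layer.
roads = Layer("roads", source="State roads agency")
tracts = Layer("census_tracts")
access_map = Layer("access_map", inputs=[roads, tracts])
print(sorted(all_sources(access_map)))
# → ['census_tracts: UNKNOWN', 'roads: State roads agency']
```

Recursing through `inputs` means a map built from other derived maps still reports every raw layer underneath it.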

Institutional Requirements

  • The Australian Research Council (ARC) and the National Health and Medical Research Council (NHMRC) say they are going to require more metadata (and even data deposit)
  • Restrictions on data access make it necessary to describe at least the metadata around provision agreements, licence and allowable access
  • Management needs to value metadata as a research output (at present, alongside a peer-reviewed paper, metadata pales in comparison)
  • My old boss used to say “Work Not Published Is Work Not Done”.
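For restricted data, the access metadata mentioned above can at least be checked mechanically. This is a minimal sketch, assuming a metadata record is a plain dictionary and using hypothetical field names (`provision_agreement`, `licence`, `allowable_access`) rather than those of any particular standard; it lists which of the access fields are missing or blank.

```python
# Hypothetical field names for the access metadata described above.
REQUIRED_ACCESS_FIELDS = ("provision_agreement", "licence", "allowable_access")


def missing_access_fields(record: dict[str, str]) -> list[str]:
    """Return the access fields that are absent or blank in a metadata record."""
    return [name for name in REQUIRED_ACCESS_FIELDS
            if not record.get(name, "").strip()]


# Hypothetical record for a restricted dataset.
record = {
    "title": "Example restricted dataset",
    "licence": "No redistribution without written permission",
    "allowable_access": "",
}
print(missing_access_fields(record))
# → ['provision_agreement', 'allowable_access']
```

A check like this is the kind of thing an institution could run at data deposit time, so that missing access information is caught before the data are shared.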

This reminds me of Approaches and Barriers to Reproducible Research

  • In 2011 BiostatMatt (Matt Shotwell) published a survey (Shotwell and Alvarez 2011) of biostatisticians in the Vanderbilt University Medical Center (VUMC) Department of Biostatistics, to assess:
  • the prevalence of fully scripted data analyses
  • the prevalence of literate programming practices

To assess the perceived barriers to reproducible research, they also asked:

What is the biggest obstacle to always reproducibly scripting your work?

| Barrier                                                     | Staff | Faculty |
|-------------------------------------------------------------+-------+---------|
| No significant obstacles.                                   |     8 |      10 |
| I haven't learned how.                                      |     0 |       0 |
| It takes more time.                                         |     7 |       7 |
| It makes collaboration difficult (e.g. file compatibility). |     4 |       2 |
| The software I use doesn't facilitate reproducibility.      |     0 |       0 |
| It's not always necessary for my work to be reproducible.   |     2 |       0 |
| Other                                                       |     2 |       1 |
|-------------------------------------------------------------+-------+---------|
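A quick tally puts the table in proportion. This sketch assumes each respondent picked exactly one answer, so the column sums (23 staff, 20 faculty) are respondent counts; the labels are shortened from the table. On that assumption, about 42% of the 43 respondents reported no significant obstacle, and about a third said scripting takes more time.

```python
# Counts transcribed from the survey table (Shotwell and Alvarez 2011): (staff, faculty).
responses = {
    "No significant obstacles": (8, 10),
    "I haven't learned how": (0, 0),
    "It takes more time": (7, 7),
    "It makes collaboration difficult": (4, 2),
    "Software doesn't facilitate reproducibility": (0, 0),
    "Not always necessary to be reproducible": (2, 0),
    "Other": (2, 1),
}

# Assumes one answer per respondent, so column sums are respondent counts.
staff_total = sum(staff for staff, _ in responses.values())        # 23
faculty_total = sum(faculty for _, faculty in responses.values())  # 20
grand_total = staff_total + faculty_total                          # 43

for barrier, (staff, faculty) in responses.items():
    share = 100 * (staff + faculty) / grand_total
    print(f"{barrier:<45} {share:5.1f}%")
```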

So what about the Approaches and Barriers to Me Writing Metadata?

With a sample size of one, I asked myself these questions:

| Q                                                  | A                                                                         |
|----------------------------------------------------+---------------------------------------------------------------------------|
| Do I fully document data (to a metadata standard)? | Occasionally, using DDI for high-value raw inputs and final products      |
| Do I employ data documentation practices?          | I occasionally use a tool I created to write minimal metadata             |
| What are the main barriers?                        | Takes more time; the software doesn't facilitate it; not always necessary |

Conclusions

  • The tools need to help write metadata
  • Institutions need to require metadata

References

  • Shotwell, M.S. and Alvarez, J.M. 2011. Approaches and Barriers to Reproducible Practices in Biostatistics. http://biostatmatt.com/uploads/shotwell-interface-2011.pdf

Posted in Data Documentation

