- The extent to which scientists write metadata is probably lower than it ought to be
- The level of metadata written during science projects could generally be described as “bare minimum”: the minimum needed for oneself to come back to and understand what one did
- It sometimes seems that even this bare minimum for oneself is often not kept
- I argue that the reasons for less-than-adequate metadata can be understood by looking at
- 1) the culture of the scientist's disciplinary background, instilled via training
- 2) the tools available and
- 3) institutional requirements to produce metadata (both about the data and about access to the data)
- In my ongoing series of blog posts I am exploring the tools available.
- In this post I just wanted to start the discussion about discipline culture and institutional requirements.
Discipline Culture
- I trained in Geography in the age of GIS and this community uses metadata a lot
- This is due to the prevalence of the digital map (a collection of layers), which is a derivative data output
- We need to know the source of all the layers
- The first law of GIS is “garbage in, garbage out”
- I was trained in the ANZLIC standard from the start
- ArcGIS has a tool called ArcCatalog which makes metadata easy to create and view
Institutional Requirements
- The ARC and NHMRC say they are going to require more metadata (and even data deposit)
- Restrictions on data access make it necessary to describe at least the metadata around provision agreements, licence, allowable access
- A supportive management level that values metadata as a research output (although alongside a peer-reviewed paper, metadata pales in comparison)
- My old boss used to say “Work Not Published Is Work Not Done”.
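As a rough illustration of the access-related items listed above (provision agreements, licence, allowable access), here is a minimal sketch in Python. The class, the field names and the example values are all hypothetical; they do not follow ANZLIC, DDI or any other metadata standard, and this is not the tool mentioned later in the post.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class AccessMetadata:
    """Hypothetical bare-minimum record describing how a dataset may be used."""
    title: str
    source: str               # who provided the data
    provision_agreement: str  # agreement under which the data were supplied
    licence: str              # licence governing reuse
    allowable_access: str     # who is allowed to access the data


def missing_fields(record: AccessMetadata) -> list[str]:
    """Return the names of fields left blank, so gaps are visible early."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Illustrative values only; none of these describe a real dataset.
record = AccessMetadata(
    title="Example dataset",
    source="Example data custodian",
    provision_agreement="",
    licence="CC BY 4.0",
    allowable_access="Project team only",
)

print(json.dumps(asdict(record), indent=2))
print("Missing:", missing_fields(record))
```

Even a record this small makes a gap obvious: here the provision agreement has not been written down yet.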
This reminds me of Approaches and Barriers to Reproducible Research
- In 2011 BiostatMatt (Matt Shotwell) published a survey of biostatisticians at the VUMC Dept. of Biostatistics to assess:
- the prevalence of fully scripted data analyses
- the prevalence of literate programming practices
To assess the perceived barriers to reproducible research they also asked:
What is the biggest obstacle to always reproducibly scripting your work?
| Barrier                                                    | Staff | Faculty |
|------------------------------------------------------------+-------+---------|
| No significant obstacles.                                  | 8     | 10      |
| I haven't learned how.                                     | 0     | 0       |
| It takes more time.                                        | 7     | 7       |
| It makes collaboration difficult (e.g. file compatibility) | 4     | 2       |
| The software I use doesn't facilitate reproducibility.     | 0     | 0       |
| It's not always necessary for my work to be reproducible.  | 2     | 0       |
| Other                                                      | 2     | 1       |
|------------------------------------------------------------+-------+---------|
So what about the Approaches and Barriers to Me Writing Metadata?
With a sample size of one I asked myself these questions:
| Question                                           | Answer                                                                |
|----------------------------------------------------+-----------------------------------------------------------------------|
| Do I fully document data (to a metadata standard)? | Occasionally, using DDI for high-value raw inputs and final products  |
| Do I employ data documentation practices?          | I use a tool I created to write minimal metadata occasionally         |
| What are the main barriers?                        | Takes more time; software doesn't facilitate it; not always necessary |
Conclusions
- The tools need to help write metadata
- The institution needs to require metadata
References
- Shotwell, M.S. and Alvarez, J.M. 2011. Approaches and Barriers to Reproducible Practices in Biostatistics. http://biostatmatt.com/uploads/shotwell-interface-2011.pdf