Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licenced with CC-BY. Find out more here.


Using the R EML software to mitigate risks in Morpho and Metacat data publishing

Introduction

  • Over the last few months I have used software called Metacat as a Data Portal and Repository. Metacat is server software which has been developed by the Knowledge Network for Biocomplexity (KNB).
  • Metacat conforms to the Ecological Metadata Language (EML) Standard (https://knb.ecoinformatics.org/#external//emlparser/docs/index.html).
  • KNB also develop another software package called Morpho, which ecologists use to document their data (https://knb.ecoinformatics.org/#tools/morpho).
  • Morpho can be used to send the data and metadata documents to be published on a Metacat portal.
  • KNB’s software is used internationally by the Data Observation Network for Earth (DataONE) nodes, the United States Long Term Ecological Research (US LTER) network and the International Long Term Ecological Research (ILTER) network.
  • Additionally, the Australian Long Term Ecological Research Network Data Portal (www.ltern.org.au/knb/), Australian SuperSites Network and Australian Centre for Ecological Analysis and Synthesis used the same underlying technology to publish data packages.
  • The Metacat system is great as a data repository but unfortunately (in my experience) the Morpho software package has repeatedly hampered data processing and increased the risk of inadvertently publishing data with errors.
  • My colleagues and I work around these problems using a variety of ‘fixes’, one for each problem.
  • Fortunately there is an alternative to Morpho in the R statistical software environment called the R-EML package (https://github.com/ropensci/EML). This provides a library of functions used in the R language to generate and parse EML files.
  • This new workflow mitigates some of the risks of the Morpho software by ensuring the data-related steps of the workflow are conducted in the R environment for statistical computing.
  • However, some issues remain, in that this requires a fairly specialised computing environment with various Linux libraries configured appropriately.

Results

  • I generate EML metadata using R-EML in the workflow shown in the figure below.

[Figure: workflow for generating EML metadata in R]
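
A minimal sketch of the generate-and-validate step might look like the following. It is not the full workflow in the figure. It assumes a recent (2.x) release of the rOpenSci EML package, whose list-based interface may differ from the version available when this post was written; the person, title and file name are hypothetical placeholders.

```r
# Minimal sketch: build, write, validate and re-read an EML record.
# Assumes the rOpenSci EML package (2.x, list-based interface) is installed.
library(EML)

# Hypothetical placeholder person; replace with the real data creator.
creator <- list(individualName = list(givenName = "Jane", surName = "Example"))

# An EML document is represented as a nested list mirroring the schema.
my_eml <- list(dataset = list(
  title   = "Example vegetation survey",  # placeholder title
  creator = creator,
  contact = creator
))

# Write the metadata to an XML file (the path is a placeholder).
eml_file <- file.path(tempdir(), "example_eml.xml")
write_eml(my_eml, eml_file)

# Check the file against the EML schema before sending anything to Metacat.
valid <- eml_validate(eml_file)
print(valid)

# Parse the file back into R to inspect what would be published.
doc <- read_eml(eml_file)
print(doc$dataset$title)
```

Validation of this kind only checks the structure of the metadata against the schema; it does not check the data themselves, so it complements rather than replaces the data-checking steps done elsewhere in R.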

Posted in  morpho data documentation

