1. Background

1.1 Short description

The implementation case aims at fulfilling requirements for curation, cataloguing and provenance.
The targeted usages are:

The catalogue is used for discovery (finding items of interest), contextualisation (determining relevance and quality) and access (connecting users, datasets, software and resources to achieve the user's end objective).
Items described in catalogues include datasets, systems and resources for observation and processing, observation events and results (e.g. samples), documents, persons and research objects.
Provenance and curation functions rely on the catalogue as a back-end repository, either as input or as output.
Provenance relates to contextualisation. It provides functions that write, update and read the catalogue, complementing discovery and access with services that determine the relevance and quality of the items described in catalogues.
Since provenance is well covered in other implementation cases (mostly IC_2, but also IC_6 and IC_9), the current implementation case will collaborate with them on requirements and fulfil those requirements so as to demonstrate a couple of provenance functions. To be listed (Barbara): 2 functions related to dataset provenance.
Curation relates to the data management processes required to ensure the availability of digital assets (datasets, software): media migration to ensure physical readability, redundant copies to ensure availability, appropriate security and privacy measures to ensure reliability, and appropriate catalogue content maintenance to ensure discovery, contextualisation and access to these digital assets.
The current implementation case fulfils requirements for a couple of curation functions: (a) automated media migration of datasets to ensure continued availability and readability; (b) discovery of a curated dataset along with the appropriate curated software and operating environment.
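
Curation function (a) can be illustrated with a minimal sketch. It assumes that a dataset is a plain file and that media migration amounts to copying the file to a new storage location and checking that the copy is bit-identical. The function names (`sha256_of`, `migrate_with_fixity_check`), the choice of SHA-256 and the shape of the returned record are assumptions made for illustration only; they are not part of the implementation case.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity_check(source: Path, target_dir: Path) -> dict:
    """Copy a dataset file to a new location and verify the copy.

    Returns a small record that a catalogue could store as curation
    metadata (hypothetical structure).
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    checksum_before = sha256_of(source)
    shutil.copy2(source, target)
    checksum_after = sha256_of(target)
    if checksum_before != checksum_after:
        # Remove the corrupt copy so it is never mistaken for a valid one.
        target.unlink()
        raise IOError(f"fixity check failed for {source}")
    return {"source": str(source), "target": str(target), "sha256": checksum_after}
```

The returned record is the kind of information that could be written back to the catalogue, so that the new location and checksum of a migrated dataset remain discoverable.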

1.2 Contact

| Background | Contact Person | Organization | Contact email |
| --- | --- | --- | --- |
| <Choose one of the following roles: [RI-ICT, RI-Domain, ICT, e-Infrastructure]> | <Full name> | <Organization of the contact person> | <Email> |
| RI (Use Case proposer, Agile Group leader) | Thomas Loubrieu | IFREMER | Thomas.loubrieu@ifremer.fr |
| RI | Keith Jeffery | | Keith.Jeffery@keithgjefferyconsultants.co.uk |
| RI | Christian Pichot, Andre Chanzy | INRA (ANAEE) | christian.pichot@paca.inra.fr, andre.chanzy@avignon.inra.fr |
| ICT | Marco Rorro, Giovanni Morelli (note: the persons who managed CKAN for EUDAT would be perfect here) | EUDAT | M.rorro@cineca.it, g.morelli@cineca.it |
| RI | Damien Boulanger | IAGOS | damien.boulanger@obs-mip.fr |
| RI | Maggie Hellstrom | ICOS | margareta.hellstrom@nateko.lu.se |
| RI | Barbara Magagna, Johannes Peterseil | LTER | Barbara.magagna@umweltbundesamt.at, Johannes.peterseil@umweltbundesamt.at |

1.3 Use case type

Implementation case
Conditions:

Implications:

1.4 Scientific domain and communities

Scientific domain

To be relevant in the ENVRIPLUS context, the implemented functions must be validated by at least 2 RIs, preferably in 2 different spheres (bio, liquid, solid, gas):

Community

Behavior

The connected behaviours are:
Data acquisition community:

Data Service provision community:

2. Detailed description

Objective and Impact

Catalogue
The catalogue aims to provide functions that cut across RIs for editing and discovering the following items:

Action 1: Persons and documents will be described and federated in pre-existing e-infrastructures, to be defined (e.g. ORCID, …), so as to fulfil requirements for the provenance and curation functions.
Action 2: Dataset descriptions will be federated by harvesting the dataset catalogue of each RI (in whatever 'standard' metadata format) into a single entry point. The metadata format of this entry point is to be chosen among DC, DCAT, INSPIRE/ISO19115, GeoNetwork, CKAN and CERIF, so as to fulfil requirements for the provenance and curation functions.
Action 3: Editing and discovery functions for observation systems, events and results (including collected samples) will be implemented by a combination of RI-specific tools and federated tools (e.g. for editing), so as to fulfil requirements for the provenance and curation functions.

Challenges

The main challenge is the involvement of the RIs, from the definition of the functions to the adoption of the solution.

Detailed scenarios

In the context of the 3 above actions:

  1. Define the curation and provenance functions to be provided, and identify the related requirements on the catalogue (format and access API).
  2. Define the catalogue requirements for discovery and access.
  3. Define the metadata profile and access API.
  4. Implement the centralized or federated solution.

As in Agile practice, these steps can be iterated, with a new iteration whenever new requirements are identified or a new RI is supported.

Technical status and requirements

E-infrastructures that manage catalogues of persons and documents already exist. They are available through standard interfaces and cut across RIs.
Catalogues of datasets are generally provided by the RIs, and their content is available through standard interfaces. Some off-the-shelf tools and standards are available to implement the catalogue of datasets (DC, DCAT, INSPIRE/ISO19115, GeoNetwork, CKAN, CERIF). ENVRIPLUS needs to federate them by using the richest available 'standard' and providing mappings to the others.
Catalogues of observation systems, events or samples may exist in the RIs, but they are seldom or never accessible through standard interfaces. Some RIs lack proper tools to manage this information, which is nevertheless critical for the quality and traceability of scientific results.
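
The idea of federating on one 'standard' and providing mappings to the others can be sketched as a simple field crosswalk. The example below is only an illustration under stated assumptions: the source records are taken to be flat dictionaries keyed by simple Dublin Core element names, the target property names are DCAT-style labels chosen for the example, and the names `DC_TO_COMMON` and `to_common_profile` are hypothetical. The actual ENVRIPLUS metadata profile is still to be chosen.

```python
from typing import Dict, List, Tuple

# Illustrative crosswalk: source field name -> property name in the common
# profile. Both sides are assumptions made for this example.
DC_TO_COMMON: Dict[str, str] = {
    "title": "dct:title",
    "description": "dct:description",
    "creator": "dct:creator",
    "identifier": "dct:identifier",
    "subject": "dcat:keyword",
}


def to_common_profile(record: dict, crosswalk: Dict[str, str]) -> Tuple[dict, List[str]]:
    """Translate one source record into the common profile.

    Returns the mapped record and the sorted list of source fields that the
    crosswalk does not cover, so that gaps in the mapping stay visible.
    """
    mapped: dict = {}
    unmapped: List[str] = []
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            unmapped.append(field)
    return mapped, sorted(unmapped)
```

Reporting the unmapped fields alongside the result is a deliberate choice in this sketch: it shows where the richest format carries information that a poorer one cannot express, which is the kind of evidence needed when selecting the common standard.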

Implementation plan and timetable

Documents and persons
E-infrastructures that manage catalogues of persons and documents already exist.
The implementation case will define a list of official, sustainable person and document repositories that RIs should use to describe their resources, and will define mappings to and from the ENVRIPLUS catalogue metadata standard (once chosen).
Expected result in October 2016
Datasets
The implementation case will identify the dataset catalogues in the RIs and analyse their machine-to-machine interfaces for harvesting purposes. A single tool will harvest them centrally. Their metadata will then need to be converted from the local RI format to that of the ENVRIPLUS central catalogue, as described above.
Expected result in October 2017
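
The harvest-then-convert flow for datasets could look like the sketch below. It makes no assumption about the real harvesting protocol: each RI catalogue is represented by a stand-in function returning raw records, and each conversion by a stand-in function producing a record with an `identifier` key. The function name `harvest_all` and the `source_ri` field are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def harvest_all(
    sources: Dict[str, Callable[[], Iterable[dict]]],
    converters: Dict[str, Callable[[dict], dict]],
) -> List[dict]:
    """Harvest every RI catalogue and convert records to the central format.

    `sources` maps an RI name to a function returning its raw records.
    `converters` maps the same RI name to a function converting one raw
    record to the central format. Both stand in for real
    machine-to-machine interfaces.
    """
    central: Dict[Tuple[str, str], dict] = {}
    for ri_name, fetch in sources.items():
        convert = converters[ri_name]
        for raw in fetch():
            record = convert(raw)
            # Keep track of where each record came from.
            record["source_ri"] = ri_name
            # Key on (RI, identifier) so a re-harvested record replaces
            # the earlier copy instead of creating a duplicate.
            central[(ri_name, record["identifier"])] = record
    return list(central.values())
```

Keying the central store on the pair of RI name and identifier is one possible design choice. It avoids assuming that identifiers are unique across RIs, at the cost of not detecting the same dataset published by two RIs.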
Observation systems, events or samples
An integrated system will show observation systems, events and collected samples from 2 or 3 RIs in the liquid (EMSO, ARGO), solid (EPOS) and gas (ICOS) spheres.
Tools will be provided to easily edit the descriptions, for RIs that do not yet have their own system.
As before, this will require mapping the metadata describing systems, events and samples at each RI to the common ENVRIPLUS metadata standard.
Expected result in October 2018

Expected output and evaluation of output

Documents and persons
Number of RIs actually using the chosen person and document e-infrastructure to identify their resources.
Datasets
Number of RIs whose dataset descriptions are available in the federated system.
Number of users of the federated dataset catalogue (inside or outside the RI).
Observation systems, events or samples
Number of observation systems whose events and results are actually available in the federated catalogue.
Number of users of the catalogues in support of activities in the RIs.

External Links

  1. IC_8 notebook: https://envriplus.manageprojects.com/projects/wp9-service-validation-and-deployment-1/notebooks/659