ENVRIplus Theme 2:
Requirements information gathering exercise
ICOS (Integrated Carbon Observation System)
RI representative(s):
ICOS Carbon Portal & Lund University
ICOS Ecosystem Thematic Centre & University of Tuscia
ICOS Ocean Thematic Centre & University of Bergen
This version from January 27, 2016.
3. Cataloguing
CASE 1: Atmospheric Thematic Center
The ATC has chosen not to provide specific answers to this part.
CASE 2: Ecosystem Thematic Center (Dario Papale, University of Tuscia)
1) Do you use catalogues or require using catalogues for the following items?
a) Observation system
Yes: descriptions of all the sensors, including their position, characteristics, and calibration. We use a system named BADM, an international standard in the community, in which each piece of information is a variable.
b) Data processing system
Yes, we need to catalogue the processing system. The system is not yet defined; at the moment, code versioning and the collection and organization of parameters are the starting point.
c) Observation event and collected sample
Yes, similar to the observation system, and also using BADM.
d) Data processing event
Probably yes, in coordination with the ICOS CP
e) Data product
As point d
f) Paper or report product
As point d
g) Research objects or feature interest (e.g. site, taxa, ...)
Not so relevant
h) Services (processing, discovery, access, retrieval, publishing, visualization, etc.)
No, it is a task for the ICOS Carbon Portal.
For EACH used or required catalogue, consider the following questions:
2) Item descriptions:
a) Which fields do you use for describing items?
They are variable dependent. For an instrument there are model, serial number, and position, plus a number of additional sensor-specific fields.
b) Which standards do you apply for these fields (format or standard model)?
We use the BADM system, which accepts several input formats (Excel, web, app) that are imported into a database structure.
c) Do you use controlled vocabularies for these fields? If so, please cite the vocabulary service providers.
Yes, but they are impossible to cite: they depend on the variable and parameter, with hundreds of options altogether.
d) Do you maintain a cross-link or inter-links between:
i) Catalogue items (e.g. between observation system - observation event - result dataset)?
Yes, the observation system and events are used in calculating the results.
ii) Fields for item description and actual items (e.g. between dataset fields - dataset access services or between sample fields - label on the sample)?
Yes, the BADM system is designed this way: it is hierarchical, so all the information relevant to the same group (sensor, sample, etc.) is linked.
e) Which repositories/software do you use to manage your metadata?
Scripts in C#, SQL database
3) Inputs:
a) Human inputs: Do you provide/need facilities for editors/reviewers to maintain the metadata in the catalogues (e.g. forms, validation workflow, etc.)? If so, please describe them briefly.
Currently yes, because we are in a test phase. We are not sure whether we can build the system to run completely independently and automatically. We check the error reports from the import system and investigate the causes so that we can explain them better to the data providers.
b) Machine inputs: Do you use/ need automated harvesting to populate your catalogues? If so, which protocol do you use (e.g. csw, oai-pmh, other, specific)?
I think it is not relevant for ETC.
c) How do you manage duplicates? i.e. Do you apply governance rules in a network of catalogues, do you use unique identifiers, or take other actions?
When exactly the same data are submitted, they are imported and classified as duplicates. If even a single parameter differs, they are considered a new version and the previous version is retired.
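The rule described in this answer can be illustrated with a small Python sketch. It is not the ETC import system (which the answers describe as C# scripts with an SQL database); the class name `Catalogue`, the status strings, and the choice to compare only against the current version are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Hypothetical in-memory stand-in for a metadata catalogue."""
    # key -> list of parameter dicts; the last entry is the current version
    versions: dict = field(default_factory=dict)
    # keys of submissions that were classified as duplicates
    duplicates: list = field(default_factory=list)

    def submit(self, key, params):
        history = self.versions.setdefault(key, [])
        if history and history[-1] == params:
            # Identical to the current version: record it as a duplicate.
            self.duplicates.append(key)
            return "duplicate"
        # Any differing parameter creates a new version; earlier ones are retired.
        history.append(dict(params))
        return "new" if len(history) == 1 else "new_version"

    def current(self, key):
        return self.versions[key][-1]

    def retired(self, key):
        return self.versions[key][:-1]
```

Comparing whole parameter dictionaries keeps the sketch short; a real system would more likely compare normalized values or checksums.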
4) Human outputs:
a) What specific feature is provided/required in your web discovery function (multi-criteria search, graphical selection components (e.g. map, calendar), facets, keyword or natural language)?
Not relevant for ETC
b) Do you evaluate the accessibility, quality, and usage of your catalogue by using a dashboard or value-added products?
Not relevant for ETC
If so, do you provide/need any of the following (please describe shortly as applicable):
i) Popularity or Usage feedback?
ii) Any other synthesis, indicators or statistics?
c) Is the catalogue freely readable or do you apply a specific authorization scheme? If you are applying a specific authorization scheme, please cite the authentication system (SSO) and the authorization rules.
Not relevant for ETC
5) Machine outputs:
a) Do you provide/need machine interfaces for accessing your catalogues? If so, which protocols are implemented?
They are in a simple ASCII format.
b) Do you need to fit in Applicable Regulations requirements (e.g. INSPIRE) or embedding frameworks (GEOSS, OBIS)? If so, please cite the regulation, applicable interface requirements, impacts (format, performance, access policy, etc.) on your catalogues.
Not relevant for ETC.
CASE 3: Ocean Thematic Center (Benjamin Pfeil, University of Bergen)
1) Do you use catalogues or require using catalogues for the following items?
c) Observation system
Yes
d) Data processing system
For parts of the data so far; for all of it by 2016, when funding for the OTC comes in.
e) Observation event and collected sample
For parts of the data so far; for all of it by 2016, when funding for the OTC comes in.
f) Data processing event
For parts of the data so far; for all of it by 2016, when funding for the OTC comes in.
g) Data product
Yes: SOCAT and GLODAP, in which part of the ICOS OTC data is included.
h) Paper or report product
Just for parts of the data that are included in SOCAT or GLODAP
i) Research objects or feature interest (e.g. site, taxa, ...)
We are not sure what is meant, but a standardized vocabulary is currently being implemented.
j) Services (processing, discovery, access, retrieval, publishing, visualization, etc.)
For parts of the data so far; for all of it by 2016, when funding for the OTC comes in.
For EACH used or required catalogue, consider the following questions:
6) Item descriptions:
a) Which fields do you use for describing items?
No answer provided.
b) Which standards do you apply for these fields (format or standard model)?
No answer provided.
c) Do you use controlled vocabularies for these fields? If so, please cite the vocabulary service providers.
Yes, NERC Vocabulary Server
d) Do you maintain a cross-link or inter-links between:
i) Catalogue items (e.g. between observation system - observation event - result dataset)?
So far only for parts of the data, but this will be implemented for all OTC data.
ii) Fields for item description and actual items (e.g. between dataset fields - dataset access services or between sample fields - label on the sample)?
Not sure what is meant
e) Which repositories/software do you use to manage your metadata?
So far we use PANGAEA, which uses 4D (http://www.4d.com).
7) Inputs:
a) Human inputs: Do you provide/need facilities for editors/reviewers to maintain the metadata in the catalogues (e.g. forms, validation workflow, etc.)? If so, please describe them briefly.
So far we use the SOCAT automation, which will be adjusted to ICOS OTC needs. All changes (QC comments, flags, metadata edits) are tracked.
b) Machine inputs: Do you use/ need automated harvesting to populate your catalogues? If so, which protocol do you use (e.g. csw, oai-pmh, other, specific)?
Not yet, but it will most likely be implemented in the future (OAI-PMH).
c) How do you manage duplicates? i.e. Do you apply governance rules in a network of catalogues, do you use unique identifiers, or take other actions?
We use unique cruise identifiers (expocodes) for each data set, but we plan to use more advanced identifiers for each sensor (e.g. SensorML). SensorML will enable us to track changes for other variables as well, e.g. temperature and salinity, where other communities perform QC. Duplicates are currently sorted out by contacting the PI to get the latest version, but feedback from other communities that handle non-CO2 parameters is often not reflected. In the future we will have automated processes that track changes directly.
8) Human outputs:
a) What specific feature is provided/required in your web discovery function (multi-criteria search, graphical selection components (e.g. map, calendar), facets, keyword or natural language)?
In the future: multi-criteria search, spatial and temporal subsetting, facets, standardized vocabularies and presets.
b) Do you evaluate the accessibility, quality, and usage of your catalogue by using a dashboard or value-added products?
Yes, already for SOCAT; for ICOS OTC soon.
If so, do you provide/need any of the following (please describe shortly as applicable):
i) Popularity or Usage feedback?
Not yet
ii) Any other synthesis, indicators or statistics?
Yes for SOCAT: web feedback and web statistics.
c) Is the catalogue freely readable or do you apply a specific authorization scheme? If you are applying a specific authorization scheme, please cite the authentication system (SSO) and the authorization rules.
No, not yet.
9) Machine outputs:
a) Do you provide/need machine interfaces for accessing your catalogues? If so, which protocols are implemented?
Only for the parts of the OTC data that are available through SOCAT; for those we provide OAI-PMH, DIF and ISO.
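As an illustration of what harvesting over OAI-PMH involves, the following Python sketch builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `from`, `until` and `set` arguments are defined by the OAI-PMH protocol; the base URL and the function name `list_records_url` are hypothetical, and the sketch only builds the URL without contacting any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments defined by the protocol.
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
url = list_records_url("https://example.org/oai", from_date="2016-01-01")
```

A harvester would fetch this URL, parse the XML response, and follow any `resumptionToken` it contains to retrieve the remaining records.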
b) Do you need to fit in Applicable Regulations requirements (e.g. INSPIRE) or embedding frameworks (GEOSS, OBIS)? If so, please cite the regulation, applicable interface requirements, impacts (format, performance, access policy, etc.) on your catalogues.
Not relevant
CASE 4: Carbon Portal view (Margareta Hellström, Lund University)
1) Do you use catalogues or require using catalogues for the following items?
a) Observation system
The components of the ICOS observation systems will be recorded & documented in the metadata catalogue, yes. The aim is that all relevant information (down to instrument types and serial numbers) will be accessible.
b) Data processing system
No, at this time there are no plans to use catalogues for storing this type of information.
c) Observation event and collected sample
Most ICOS measurements are continuous, but there are also periodic or singular measurements made. Lists of physical samples will be created, with all associated metadata. The results of sample analyses will be treated as data objects.
d) Data processing event
No, at this time there are no plans to use catalogues for storing this type of information.
e) Data product
Yes. All ICOS data products will be included in a catalogue.
f) Paper or report product
Yes. All ICOS-produced reports and articles will be catalogued. ICOS also aims to build up a registry (catalogue) listing all publications that are at least partly using ICOS data.
g) Research objects or feature interest (e.g. site, taxa, ...)
A registry of all ICOS observation stations is being set up.
h) Services (processing, discovery, access, retrieval, publishing, visualization, etc.)
There are plans to create a catalogue listing all computer-accessible services, including REST-based APIs and similar.
For EACH used or required catalogue, consider the following questions:
NOTE: Here we are currently only providing one example:
DATA PRODUCTS
2) Item descriptions:
a) Which fields do you use for describing items?
Data objects will primarily be described using Dublin Core v1.1 fields: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
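A minimal Python sketch of how a record could be checked against these fifteen fields follows. The element names come from the list above; the function `split_fields`, the lower-case key convention, and the sample record are assumptions for illustration, not the Carbon Portal's actual data model.

```python
# The fifteen elements of the Dublin Core Metadata Element Set, v1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def split_fields(record):
    """Separate a record's fields into Dublin Core and non-Dublin Core ones."""
    known = {k: v for k, v in record.items() if k.lower() in DC_ELEMENTS}
    unknown = sorted(k for k in record if k.lower() not in DC_ELEMENTS)
    return known, unknown


# Invented sample record, used only to exercise the function.
sample = {
    "Title": "Example flux time series",
    "Creator": "Example station team",
    "Date": "2016-01-27",
    "serial_number": "not a Dublin Core element",
}
known, unknown = split_fields(sample)
```

Fields that land in `unknown` would need a home in a richer schema, for example the ISO 19115 compliance mentioned under b).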
b) Which standards do you apply for these fields (format or standard model)?
Generally, ICOS strives to be compliant with ISO 19115 for its metadata holdings. The catalogues of data objects will at least expose the fields listed in the Dublin Core Metadata Element Set, Version 1.1.
c) Do you use controlled vocabularies for these fields? If so, please cite the vocabulary service providers.
This is under discussion.
d) Do you maintain a cross-link or inter-links between:
i) Catalogue items (e.g. between observation system - observation event - result dataset)?
Yes, at least with respect to observational data products. Here there will be clear links from a (published) dataset to the observation station, allowing further lookup all the way to the instrumentation used.
ii) Fields for item description and actual items (e.g. between dataset fields - dataset access services or between sample fields - label on the sample)?
No answer provided (unclear question).
e) Which repositories/software do you use to manage your metadata?
ICOS is currently building up its own metadata ontology and associated database, based in part on OWL and RDF.
3) Inputs:
a) Human inputs: Do you provide/need facilities for editors/reviewers to maintain the metadata in the catalogues (e.g. forms, validation workflow, etc.)? If so, please describe them briefly.
Yes, all catalogues will have interfaces for human interaction with the metadata - including forms for data input and editing.
b) Machine inputs: Do you use/ need automated harvesting to populate your catalogues? If so, which protocol do you use (e.g. csw, oai-pmh, other, specific)?
A large portion of metadata about data objects will be automatically harvested throughout the data life cycle, and uploaded to the catalogue (either event-triggered or via periodic updates). The exact protocols to be used are not yet decided.
c) How do you manage duplicates? I.e., do you apply governance rules in a network of catalogues; do you use unique identifiers; or take other actions?
In the case of “true” duplicates (identical copies, stored at different locations), these will be given PIDs, with a pointer to the “primary object” kept in the replicate object’s handle registry metadata entry as well as in the central ICOS data object metadata database. If a “content copy” (same informational content, but a different file or numeric format) is created, it will be treated as a separate object, given a PID and registered - again noting the PID of the “primary object”. (Updated on 2016-01-27.)
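The two cases can be sketched in Python as follows. A single in-memory dictionary stands in for both the handle registry metadata and the central metadata database; the `pid-N` identifier format and all class and method names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Entry:
    pid: str
    kind: str                 # "primary", "replica" or "content_copy"
    location: str
    fmt: str
    primary_pid: Optional[str] = None   # pointer to the "primary object"


class Registry:
    """Hypothetical stand-in for a PID registry plus metadata database."""

    def __init__(self):
        self._ids = count(1)
        self.entries = {}

    def _register(self, kind, location, fmt, primary_pid=None):
        pid = f"pid-{next(self._ids)}"   # placeholder, not a real handle
        self.entries[pid] = Entry(pid, kind, location, fmt, primary_pid)
        return pid

    def add_primary(self, location, fmt):
        return self._register("primary", location, fmt)

    def add_replica(self, primary_pid, location):
        # "True" duplicate: identical copy at another location, same format.
        fmt = self.entries[primary_pid].fmt   # KeyError if primary unknown
        return self._register("replica", location, fmt, primary_pid)

    def add_content_copy(self, primary_pid, location, fmt):
        # Same informational content in a different file or numeric format.
        if primary_pid not in self.entries:
            raise KeyError(primary_pid)
        return self._register("content_copy", location, fmt, primary_pid)
```

In both cases the new object gets its own PID and keeps a pointer to the primary object, so any copy can be traced back to its source.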
4) Human outputs:
a) What specific feature is provided/required in your web discovery function (multi-criteria search, graphical selection components (e.g. map, calendar), facets, keyword or natural language)?
The ICOS Carbon Portal discovery function will support advanced searches based on measurement theme (atmospheric, ecosystem and ocean); measurement station (name, country, geographic location); observed or derived parameter (name, unit, ...); parameter type (concentration, flux, meteorological, ...) and date & time interval.
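A multi-criteria search over these attributes could look roughly like the Python sketch below. The `DataObject` fields mirror the criteria named in the answer, but the class, the `search` function, and the exact-match and interval-overlap semantics are assumptions, not the Carbon Portal implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataObject:
    theme: str            # e.g. "atmospheric", "ecosystem", "ocean"
    station: str
    country: str
    parameter: str
    parameter_type: str   # e.g. "concentration", "flux", "meteorological"
    start: datetime
    end: datetime


def search(objects, theme=None, station=None, country=None,
           parameter=None, parameter_type=None, start=None, end=None):
    """Return objects matching every criterion that is not None."""
    exact = {"theme": theme, "station": station, "country": country,
             "parameter": parameter, "parameter_type": parameter_type}
    hits = []
    for obj in objects:
        if any(v is not None and getattr(obj, k) != v for k, v in exact.items()):
            continue
        # Keep objects whose time coverage overlaps the requested interval.
        if start is not None and obj.end < start:
            continue
        if end is not None and obj.start > end:
            continue
        hits.append(obj)
    return hits
```

Criteria left as `None` are ignored, so the same function serves both single-criterion and combined queries.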
b) Do you evaluate the accessibility, quality, and usage of your catalogue by using a dashboard or value-added products?
We will be tracking the usage of the catalogue by e.g. storing all searches in a database. In addition, a user feedback form is being implemented.
If so, do you provide/need any of the following (please describe shortly as applicable):
i) Popularity or Usage feedback?
Summaries of selected usage metrics will be available on the ICOS web site.
ii) Any other synthesis, indicators or statistics?
Annual reports illustrating more comprehensive summaries of data product and service usage will be provided.
c) Is the catalogue freely readable or do you apply a specific authorization scheme? If you are applying a specific authorization scheme, please cite the authentication system (SSO) and the authorization rules.
It is the intention that all catalogues will be freely readable to all interested parties.
5) Machine outputs:
a) Do you provide/need machine interfaces for accessing your catalogues? If so, which protocols are implemented?
Yes, the databases will support SPARQL queries returning results as e.g. JSON or RDF triples.
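For illustration, the Python sketch below prepares a SPARQL query that asks for JSON results. The endpoint URL, the function names, and the use of Dublin Core predicates in the query are assumptions (the ICOS ontology was still under construction when this was written); the `query` parameter and the `application/sparql-results+json` media type come from the SPARQL 1.1 specifications. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"   # hypothetical endpoint


def build_query(subject, limit=10):
    """Build a SELECT query for objects with a given dc:subject value."""
    # Naive escaping of backslashes and quotes for the string literal.
    safe = subject.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?obj ?title WHERE {\n"
        "  ?obj dc:title ?title ;\n"
        f'       dc:subject "{safe}" .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request asking for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(build_query("ocean"))
```

Passing `req` to `urllib.request.urlopen` would send the query; a real client would also need error handling and stricter literal escaping than shown here.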
b) Do you need to fit in Applicable Regulations requirements (e.g. INSPIRE) or embedding frameworks (GEOSS, OBIS)? If so, please cite the regulation, applicable interface requirements, impacts (format, performance, access policy, etc.) on your catalogues.
Yes. ICOS is working on becoming fully INSPIRE and GEOSS compliant.