ENVRI

Services for the Environmental Community

 

 

 

Using the Reference Model in

ICOS Carbon Portal Design

 

 

Document identifier: <Document name>
Date: 02/06/2014 4
Activity: WP3
Lead Partner: CU
Document Status: [DRAFT]
Dissemination Level: PUBLIC|RESTRICTED|CONFIDENTIAL
Document Link: <link to the website>

 

ABSTRACT

 

 

 


  1. Copyright notice

Copyright © Members of the ENVRI Collaboration, 2011. See www.ENVRI.eu for details of the ENVRI project and the collaboration. ENVRI (“Common Operations of Environmental Research Infrastructures”) is a project co-funded by the European Commission as a Coordination and Support Action within the 7th Framework Programme. ENVRI began in October 2011 and will run for 3 years. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. The work must be attributed by attaching the following reference to the copied elements: “Copyright © Members of the ENVRI Collaboration, 2011. See www.ENVRI.eu for details of the ENVRI project and the collaboration”. Using this document in a way and/or for purposes not foreseen in the license requires the prior written permission of the copyright holders. The information contained in this document represents the views of the copyright holders as of the date such views are published.

  1. Delivery Slip

 

 | Name | Partner/Activity | Date
From | | |
Reviewed by | Moderator: / Reviewers: | |
Approved by | | |

  1. Document Log

Issue | Date | Comment | Author/Partner
1.0 | | |
2.0 | | |
3.0 | | |
4.0 | | |

 

  1. Application area

This document is a formal deliverable for the European Commission, applicable to all members of the ENVRI project, beneficiaries and Joint Research Unit members, as well as its collaborating projects.

  1. Document amendment procedure

Amendments, comments and suggestions should be sent to the authors.

 

  1. Terminology

A complete project glossary is provided at the following page: http://www.ENVRI.eu/glossary.

ENVRI Reference Model terminology is provided at the wiki site: http://www.envri.eu/rm.   

  1. PROJECT SUMMARY

 

Frontier environmental research increasingly depends on a wide range of data and advanced capabilities to process and analyse them. The ENVRI project, “Common Operations of Environmental Research Infrastructures”, is a collaboration in the ESFRI Environment Cluster, with support from ICT experts, to develop common e-science components and services for their facilities. The results will speed up the construction of these infrastructures and will allow scientists to use the data and software from each facility to enable multi-disciplinary science.

 

The focus is on developing common capabilities, including software and services, for the environmental e-infrastructure communities. While the ENVRI infrastructures are very diverse, they face common challenges including data capture from distributed sensors, metadata standardisation, management of high volume data, workflow execution and data visualisation. The common standards, deployable services and tools developed will be adopted by each infrastructure as it progresses through its construction phase.

 

Two use cases, led by the most mature infrastructures, will focus the development work on separate requirements and solutions for data pre-processing of primary data and post-processing toward publishing.

 

The project will be based on a common reference model created by capturing the semantic resources of each ESFRI-ENV infrastructure. This model and the development driven by the test-bed deployments result in ready-to-use systems which can be integrated into the environmental research infrastructures.

 

The project puts emphasis on synergy between advanced developments, not only among the infrastructure facilities, but also with ICT providers and related e-science initiatives. These links will facilitate system deployment and the training of future researchers, and ensure that the inter-disciplinary capabilities established here remain sustainable beyond the lifetime of the project.

 

  1. EXECUTIVE SUMMARY

 

The Integrated Carbon Observatory System, ICOS, is built to enable research to understand the greenhouse gas budgets and perturbations.

 


TABLE OF CONTENTS

1 Introduction
1.1 Purpose
1.2 Application area
1.3 Terminology
2 Analysis of ICOS Research Infrastructure
2.1 Analysis of ICOS Research Infrastructure from Science Viewpoint
2.1.1 ICOS RI Roles
2.1.2 ICOS RI Communities Behaviours
2.1.3 ICOS RI Workflow
2.2 Analysis of ICOS Research Infrastructure from Information Viewpoint
2.2.1 Overview
2.2.2 Data lifecycle seen from the ENVRI reference model
2.2.3 Information Objects
2.2.4 Information Actions
2.3 Analysis of ICOS Research Infrastructure from Computational Viewpoint
2.3.1 Core computational objects
2.3.2 Thematic Centres
2.3.3 Measurement Station Networks
2.3.4 Carbon Portal
2.3.5 Core bindings
2.3.6 Compound Bindings
3 Conclusion
3.1 <...>
Appendix
A How to read the Model (Computational Viewpoint)
A.1 A note about implementation
B How to use the Model (Computational Viewpoint)
4 References

1        Introduction

1.1    Purpose

This document provides a requirements analysis and design advice for the ICOS research infrastructure, using the ENVRI Reference Model as the analysis tool.

 

The Integrated Carbon Observatory System, ICOS, is built to enable research to understand the greenhouse gas budgets and perturbations. ICOS RI provides the long-term observations required to understand the present state and predict future behaviour of the global carbon cycle and greenhouse gas emissions. Linking research, education and innovation promotes technological development related to greenhouse gases.

 

 

1.2    Application area

 

1.3    Terminology

A complete project glossary is provided in the ENVRI glossary:

http://www.ENVRI.eu/glossary/ .

 

The Reference Model glossary is provided in the wiki:

http://confluence.envri.eu:8090/display/ERM/Appendix+B+Terminology+and+Glossary

2.1      Analysis of ICOS Research Infrastructure from Science Viewpoint

The ICOS Research Infrastructure (ICOS RI) is built to provide the long-term observations required to understand the present state and predict the future behaviour of climate, the global carbon cycle and greenhouse gas emissions.

The objectives of ICOS RI are to:

  • Track carbon fluxes in Europe and adjacent regions by monitoring the ecosystems, the atmosphere and the oceans through integrated networks.
  • Provide the long-term observations required to understand the present state and predict future behaviour of the global carbon cycle and greenhouse gas emissions.
  • Monitor and assess the effectiveness of carbon sequestration and/or greenhouse gas emission reduction activities on global atmospheric composition levels, including attribution of sources and sinks by region and sector.

 

Figures 2.1.1 and 2.1.2 show the annotations of the ICOS RI organisational structure using the Reference Model (RM) terminology from the Science Viewpoint. From this analysis, the community roles and behaviours can be identified, and the workflow can be understood.

 

 

 

 

Figure 2.1.1: Annotation of ICOS Organisational Structure (1) Using Terminology of the Reference Model Science Viewpoint

Figure 2.1.2: Annotation of ICOS Organisational Structure (2) Using Terminology of the Reference Model Science Viewpoint

2.1.1 ICOS RI Roles

Table 2.1.1 lists the roles identified in the ICOS Research Infrastructure, their descriptions, and the corresponding role names defined by the Reference Model.

Table 2.1.1: Roles in ICOS RI and Role Names in the Reference Model

Roles Instances in ICOS RI

Descriptions

RM SV_Roles Names

ICOS General Assembly

 

  • Policy or Decision Maker

Scientific Advisory Board

(SAB)

 

  • Policy or Decision Maker

ICOS RI Committee

It is an advisory body to the Director of ICOS ERIC and decides on strategies concerning the Carbon Portal [1].

  • Policy or Decision Maker

Director General

 

  • Policy or Decision Maker

Head Office/Headquarters

(HO)

The ICOS RI Head Office will have three main task groups [4]:

  1. Managing the ICOS ERIC legal entity
  2. Strategic scientific and technical planning, coordination and integration
  3. Community building, outreach, promotion and training
  • Policy or Decision Maker

Carbon Portal (CP)

The Carbon Portal shall provide a "one-stop shop" for ICOS data products. It is envisioned as a place where all data produced within the ICOS station network can be discovered and accessed, and where the scientific community can post elaborated data products that are obtained from ICOS data [2].

  • Data Curation Subsystem
  • Data Access Subsystem
  • Service Provider

Potentially, CP may also be:

  • Service Registry
  • PID Generator
  • PID Registry
  • Semantic Mediator

Connected projects and international networks

Provide data to ICOS RI

  • Data Originator
  • Global networks GEOSS
  • Greenhouse gas flux assessment International programs

Consume the data provided by ICOS RI

  • Data Consumer

The Central Analytical Laboratory (CAL)

CAL ensures the accuracy of observational data, thorough quality control and routine testing of air sampling material. It provides reference gases for calibration of in-situ measurements performed at the continuous monitoring stations. It also analyses air samples collected at the monitoring stations. CAL is hosted by Germany [4].

  • Data Curator

 

The Atmospheric Thematic Centre (ATC)

ATC is responsible for continuous and discontinuous air sampling, instrument development/servicing, data processing and storage. A central place is needed to ensure that all data are treated with the same algorithms and properly archived for the long term, that the ICOS atmospheric stations can receive permanent support for optimal operation during their lifetime, and that new sensors can be smoothly implemented in the network in the future. ATC is coordinated and hosted by France, with the Nordic Hub and Mobile Lab hosted by Finland [3].

  • Data Curator
  • Data Curation Subsystem
  • Storage Administrator
  • Storage
  • Data Originator

The Ecosystem Thematic Centre (ETC)

ETC coordinates the ICOS Ecosystem Network, providing assistance with instruments and methods, and testing and developing new measurement techniques and associated processing algorithms. It also ensures a high level of data standardization, uncertainty analysis and database services in coordination with the ICOS Carbon Portal. ETC is coordinated and hosted by Italy, together with Belgium and France [4].

The Ocean Thematic Centre (OTC)

OTC will coordinate the measurement of the carbon cycle in the oceans within ICOS. It will provide support to the ICOS marine network in the form of information and technical backup on state-of-the-art instrumentation and analytical methods. It will provide data storage and processing techniques, quality control, and network-wide integration of data into useful products, such as maps of CO2 fluxes, carbon transport, and the assessment of ocean acidification [4].

Monitoring Station Assemblies (MSA)

MSAs discuss technical and scientific matters, and services concerning their component, to further develop and improve ICOS and its networks. MSAs work together with ATC, ETC and OTC, but also have an independent role [4].

MSA members are scientific and technical experts from the monitoring stations of the Member countries that constitute the basis of ICOS ERIC. All Atmospheric station PIs, Ecosystem station PIs and Ocean station PIs are members of the respective MSAs.

  • Environmental Scientist
  • (Measurement Model) Designer

Station Principal Investigators (SPI)

 

  • Data Curator

Atmospheric Stations

They are established to continuously measure the variability in greenhouse gas (CO2, CH4, N2O) concentrations due to regional and global fluxes [5].

  • Sensor
  • Sensor network
  • Technician
  • Measurer
  • Data collector
  • Data Acquisition Subsystem

Ecosystem Stations

They are built for monitoring the functioning of land ecosystems and the exchange of energy and greenhouse gases between the ecosystems and the atmosphere [3].

Ocean Ships and Stations

Marine ICOS will provide the long-term oceanic observations required to understand the present state and predict future behaviour of the global carbon cycle and climate-relevant gas emissions [3].

Users of ICOS data products:

Researchers;

International and national Operational Centres assimilating atmospheric composition data;

Policymakers and stakeholders involved in negotiating carbon reduction policies;

Carbon trading communities;

Regional authorities and carbon inventory agencies;

Private land owners and industrial contributors of greenhouse gas emissions;

The general public interested in greenhouse gas emissions and global climate change.

Commercial users

Others

 

  • Scientist or Researcher
  • Policy or Decision Maker
  • Private Sector (Industry investor or consultant)
  • General Public, Media or Citizen (Scientist)

 

2.1.2 ICOS RI Communities Behaviours

Table 2.1.2 maps the ICOS roles to the five ENVRI common communities. Analysing the key responsibilities of each role yields the mapping to the community behaviours defined in the Reference Model.

Table 2.1.2: Mapping of ICOS RI roles into the ENVRI Common Communities and Identifying the Community Behaviours

 

Roles Instances in ICOS RI

Key Responsibilities

RM SV_Community Behaviours

Data Acquisition Community

  • National Measurement Networks
    • Atmospheric Stations
    • Ecosystem Stations
    • Oceanic Ships and Stations
  • Perform measurements according to ICOS standards
  • Collect data and send to Thematic Centres
  • Can have non-ICOS functionality & responsibilities, e.g., they may also
    • Collect other types of data
    • Perform their own data analysis (*not* official ICOS!)
    • Operate their own web sites
  • Instrument Configuration
  • Instrument Calibration
  • Data Collection
  • Monitoring Station Assemblies (MSAs)

(See role descriptions)

  • Design of Measurement Model

Data Curation Community

  • Station Principal Investigators (SPIs)
  • Perform quality checks
    • In near real time (for some systems)
    • After (pre-) processing at Thematic Centres
    • Before “final” datasets are “published”
  • Data Quality Checking
  • Central Facilities
    • Ecosystem Thematic Centre
    • Atmospheric Thematic Centre
    • Ocean Thematic Centre
  • Compose and maintain procedures and protocols for measurements
  • Create “publishable” data sets
  • Keep own competence up to date
  • Maintain their own websites
    • Info on measurements
    • Near Real-Time data visualization
  • Data processing info (for SPIs, mainly)
  • Serve as experts
    • For stations within ICOS RI network
    • For external partners (if resources allow)
  • Data Preservation
  • Data Product Generation
  • Data Replication
  • The Central Analytical Laboratory

(See role descriptions)

  • (Instrument) Calibration
  • Data Quality Checking
  • Connected projects and international networks

(See role descriptions)

 

  • ICOS Carbon Portal

Organize and ensure back-up storage and long-term archiving of published ICOS data sets

  • Data Replication
  • Data Preservation

Data Publication Community

  • ICOS Carbon Portal
  • Generate and provide effective tools to publish, discover, access and retrieve ICOS observation data according to user needs
  • Offer user-friendly, web-based access to products elaborated from ICOS data
  • Establish interfaces with other relevant data portals
  • Ensure basic semantic interoperability by maintaining a full copy of the standard metadata and data description documents (ontologies) held at the ICOS TCs, including the compilation of the vocabularies in use within ICOS
  • Coordinate regular publication of the ensemble of the ICOS data, with the TCs and the ICOS community of PIs
  • Organize the traceability of downloaded ICOS data, including the application of persistent unique identifiers for citation purposes
  • Record relevant bibliometric information and establish indicators about the use of ICOS data
  • Data Publication
  • Data Discovery & Access
  • Semantic Harmonisation
  • Data Citation

Data Service Provision Community

  • Central Facilities
    • Ecosystem Thematic Centre
    • Atmospheric Thematic Centre
    • Ocean Thematic Centre
    • Analytical Laboratory
  • Process data (and analyze some samples)

 

  • ICOS Carbon Portal
  • Define and implement advanced web services and procedures for web-based data visualization, retrieval and processing
  • Encourage, coordinate, facilitate and ensure the operational provision of elaborated products and synthesis efforts based on ICOS data
  • Service Description
  • Service Coordination
  • Service Composition

Data Usage Community

  • ERIC Head Office

Organise general ICOS outreach actions on the basis of the scientific material (advanced data plots and visuals) provided by the Carbon Portal [6].

 

  • Director General
  • ICOS RI Committee
  • ICOS Council
  • Scientific Advisory Board (SAB)
  • General Assembly

(See role descriptions)

 

  • Global networks GEOSS
  • Greenhouse gas flux assessment International programs

(See role descriptions)

 

  • Users of ICOS data products

(See role descriptions)

 

  • ICOS Carbon Portal

Implement a common user registration authentication system for ICOS that allows usage tracking

  • User Profile Management
  • User Behaviour Tracking

 

Note

By ODP/RM definition, a computational system can play a passive role in a community. For example, the ICOS Carbon Portal is regarded as a role in the following communities: Data Curation, Data Publication, Data Service Provision and Data Usage.

 

Note

The five common subsystems and their objectives, as defined by the Reference Model, are depicted in Figure 2.1.3.

Figure 2.1.3: ENVRI five common subsystems and their objectives

Definitions

  • Data Acquisition Community, who collect raw data and bring (streams of) measurements into a system;
  • Data Curation Community, who curate the scientific data, maintain and archive them, and produce various data products with metadata;
  • Data Publication Community, who assist data publication, discovery and access;
  • Data Service Provision Community, who provide various services, applications and software/tools to link and recombine data and information in order to derive knowledge;
  • Data Usage Community, who make use of data and service products, and transfer knowledge into understanding.
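As a rough illustration of how the community mapping in Table 2.1.2 could be held in machine-readable form, the following Python sketch records a subset of the roles per community and looks up the communities in which a role participates. The dictionary is a simplification made for this example (governance bodies and connected projects are omitted, and role names are shortened); it is not an official ICOS or ENVRI vocabulary.

```python
# Community membership transcribed from Table 2.1.2. Role names are shortened
# for illustration and are not an official ICOS or ENVRI vocabulary.
COMMUNITY_ROLES = {
    "Data Acquisition": [
        "National Measurement Networks",
        "Monitoring Station Assemblies",
    ],
    "Data Curation": [
        "Station Principal Investigators",
        "Thematic Centres",
        "Central Analytical Laboratory",
        "ICOS Carbon Portal",
    ],
    "Data Publication": ["ICOS Carbon Portal"],
    "Data Service Provision": [
        "Thematic Centres",
        "Central Analytical Laboratory",
        "ICOS Carbon Portal",
    ],
    "Data Usage": [
        "ERIC Head Office",
        "Users of ICOS data products",
        "ICOS Carbon Portal",
    ],
}


def communities_of(role):
    """Return the communities in which a role participates, in table order."""
    return [c for c, roles in COMMUNITY_ROLES.items() if role in roles]


def roles_in_several_communities():
    """Return the roles that appear in more than one community."""
    all_roles = {r for roles in COMMUNITY_ROLES.values() for r in roles}
    return sorted(r for r in all_roles if len(communities_of(r)) > 1)


print(communities_of("ICOS Carbon Portal"))
print(roles_in_several_communities())
```

The lookup reproduces the observation in the note above that the Carbon Portal plays a role in the Data Curation, Data Publication, Data Service Provision and Data Usage communities.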

 

 

 

 

2.1.3 ICOS RI Workflow

We have explicitly defined the ICOS community roles and described their behaviours. In this subsection, we examine how those roles interact with each other (through ICOS RI) and collaboratively fulfil the community objectives. We present the result of the analysis as a workflow diagram that depicts the activities and processes conducted by the roles, and the direction of the control and object flows from one role to another.

 

Figure 2.1.4 presents the information obtained from the ICOS team: (a) gives an overview of computation and data flow in the ICOS RI, (b) provides the details of the ICOS data lifecycle, and (c) describes the DOI assignment process. From this information, we derive the key community processes and the workflow shown in Figure 2.1.5.

(a) An overview of the proposed data-flow in ICOS (April 2014)

(b) ICOS Data Lifecycle: lifecycle of the NRT data, lifecycle of the L2 data, and lifecycle of the L3 data

(c) ICOS PIDs (DOIs) assigning process. By the Reference Model definition, DOIs are one type of Persistent Identifier (PID). Panel (c) shows that a DOI system will be established within the Carbon Portal to assign DOIs to the L0 data generated at the Stations, the L1 and L2 data produced at the Thematic Centres, and the L3 data processed at the Carbon Portal.

Figure 2.1.4: Analysis of ICOS Requirements

Figure 2.1.5: ICOS RI Community Process. Each column corresponds to one ICOS RI community role. A black dot represents the starting point of the workflow, and a black dot with a circle represents an ending point of the workflow. Each box in a role column represents a process performed by that role. An arrow indicates the direction of the (control/object) flow between processes.

Figure 2.1.5 describes the workflow and the key community processes from data collection to data access. The workflow starts when each Station 1) “collects the L0 data” and 2) “stores the L0 data”. At this point, each Station may request the Carbon Portal to 3) “generate PIDs (DOIs) for L0 data”. With the PIDs available, each Station will 4) “add PIDs (DOIs) to L0 data” and 5) “add and store metadata for L0 data”. Then the station Principal Investigators (PIs) will 6) “check quality of L0 data”. Thereafter, the L0 data will be delivered to the Thematic Centres.

Each Thematic Centre will 7) “store L0 data and metadata” and 8) “archive L0 data and L0 metadata”. Each Thematic Centre also 9) “enables the visualisation of (the L0) data”, to allow 10) end users to “view (the L0) data from Thematic Centres websites”. After 7), the Thematic Centres also 11) “pre-process L0 data to generate L1, L2 data”, 12) “store L1, L2 data”, 13) request the Carbon Portal to “generate PIDs (DOIs) for L1, L2 data”, 14) “add PIDs (DOIs) to L1, L2 data”, and 15) “add and store metadata for L1, L2 data”. At this point, station PIs may need to 16) “check quality of L1, L2 data”. After that, the Thematic Centres will 9) “enable the visualisation of (the L1) data” and 10) allow end users to “view (the L1) data from Thematic Centres websites”.

Meanwhile, a copy of the dataset will be sent to the Carbon Portal. The Carbon Portal will 17) “archive L1, L2 data and L1, L2 metadata”, 18) “store L2 data”, 19) “enable search & discovery of L2 metadata”, and 20) “enable download and visualisation of L2 data”. This will enable end users to 21) “view the L2 data from www.icos-carbon-portal.eu”. The Carbon Portal will also 22) “track statistics” of any usage of the data.

With the L2 data stored, after 18), the Carbon Portal also 23) “processes L2 data to generate L3 Data”, 24) “stores L3 data”, 25) “generates PIDs (DOIs) for L3 data”, 26) “adds PIDs (DOIs) to L3 data”, 27) “adds and stores metadata to L3 data”, and 28) “archives L3 data and L3 metadata”. Meanwhile, the Carbon Portal also 19) “enables search & discovery of (the L3) metadata” and 20) “enables download and visualisation of (the L3) data”, which will enable end users to 21) “view (the L3) data from www.icos-carbon-portal.eu”. Again, any usage of the ICOS data will be 22) “tracked” by the Carbon Portal.
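The hand-over between roles described above can be sketched in a few lines of Python. This is only an illustrative model, not the ICOS implementation: the class and function names, the `derived_from` metadata key and the `example-pid-…` identifier format are all assumptions made for this example, and several workflow steps (archiving, visualisation, search) are left out for brevity.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DataSet:
    level: str                                    # "L0", "L1", "L2" or "L3"
    pid: str = ""
    metadata: dict = field(default_factory=dict)
    log: list = field(default_factory=list)       # (role, process) pairs, in order


class CarbonPortal:
    """Hypothetical stand-in for the Carbon Portal PID service and usage tracking."""

    def __init__(self):
        self._next = count(1)
        self.usage = []

    def generate_pid(self, dataset):
        # Placeholder identifier; a real DOI would be issued by a registration agency.
        dataset.pid = f"example-pid-{next(self._next):04d}"
        dataset.log.append(("Carbon Portal", f"generate PID for {dataset.level} data"))
        return dataset.pid

    def track(self, user, dataset):
        self.usage.append((user, dataset.pid))


def station_collect(cp):
    """Steps 1-6: the Station collects L0 data, obtains a PID, adds metadata."""
    ds = DataSet("L0")
    ds.log.append(("Station", "collect and store L0 data"))
    cp.generate_pid(ds)
    ds.metadata["level"] = "L0"
    ds.log.append(("Station", "add PID and metadata to L0 data"))
    ds.log.append(("Station PI", "check quality of L0 data"))
    return ds


def thematic_centre_process(cp, l0, level):
    """Steps 11-16: a Thematic Centre derives L1 or L2 data from L0 data."""
    out = DataSet(level, metadata={"level": level, "derived_from": l0.pid})
    out.log.append(("Thematic Centre", f"pre-process L0 data to generate {level} data"))
    cp.generate_pid(out)
    out.log.append(("Station PI", f"check quality of {level} data"))
    return out


def portal_process(cp, l2):
    """Steps 23-27: the Carbon Portal derives L3 data from stored L2 data."""
    l3 = DataSet("L3", metadata={"level": "L3", "derived_from": l2.pid})
    l3.log.append(("Carbon Portal", "process L2 data to generate L3 data"))
    cp.generate_pid(l3)
    return l3


cp = CarbonPortal()
l0 = station_collect(cp)
l2 = thematic_centre_process(cp, l0, "L2")
l3 = portal_process(cp, l2)
cp.track("researcher@example.org", l3)   # step 22: usage tracking
```

Keeping a per-dataset log of (role, process) pairs mirrors the columns and boxes of the workflow diagram, which makes it easy to check that each process is attributed to the role that performs it.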

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2.2      Analysis of ICOS Research Infrastructure from Information Viewpoint

The Information Viewpoint is represented by Information Objects, Information Actions and data states [7].

2.2.1 Overview

Observation and measurement stations are established at sites that fulfil specific requirements, so a specification of the investigation design may exist.

The measurement is done with defined devices, arranged in defined geometries and according to defined setups. All those characteristics are kept together in the measurement description, which is planned to be stored in metadata and/or in a data provenance description.

The measurements at the observation stations produce measurement results (L0 data) as they come out of the sensors. Usually these data are not calibrated and are expressed in non-physical units, e.g. in millivolts. These raw data are not published.

For further steps, those data are persisted (stored for a longer period).

Many data handling operations are carried out on persistent data, changing their state. ICOS defines different levels of data: L0, NRT, L1, L2 and L3. The Information Viewpoint of the RM specifies different data states according to the action that has been applied. When adapting the Reference Model to the ICOS data levels, we have to consider that the actions can be applied in different orders and can be performed on different data states. By flagging the actions applied to the data, it is possible to describe the different data levels, as shown in the following cross-table. The actual mapping is made explicit in the data lifecycle (Section 2.2.2) and in the tables of Sections 2.2.3 and 2.2.4.

 

 

Measurement results: L0. Persistent data: NRT, L1, L2, L3.

Action applied | L0 | NRT | L1 | L2 | L3
First automated quality check | | x | x | x | x
Conversion to proper unit (pre-processed) | | x | x | x | x
Manual QAQC | | | x | x | x
Gaps identified (quality flags) | | | x | x | x
Gaps filled (processed) | | | | x | x
Averaged data | | | | x | x
Metadata stored | | p | p | p | p
Identifier | p | p | p | p | p
Provenance stored | | p | p | p | p
User tracked | | x | x | x | x
Backup | | x/TC x/CP | x/CP | x/CP | x/CP
Published | | x/TC | x/TC | x/CP | x/CP
Post-processed (post) | | | | | x

x… applied

p… planned to be applied
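The cross-table can also be expressed as a small lookup structure, which makes the idea of describing a data level by its flagged actions concrete. The Python sketch below is an illustration only: the variable and function names are invented for this example, and the TC/CP location suffixes of the Backup and Published columns are reduced to a plain "x".

```python
# Actions in the order of the cross-table columns.
ACTIONS = [
    "First automated quality check", "Conversion to proper unit", "Manual QAQC",
    "Gaps identified", "Gaps filled", "Averaged data", "Metadata stored",
    "Identifier", "Provenance stored", "User tracked", "Backup", "Published",
    "Post-processed",
]

# One status per action, in ACTIONS order:
# "x" = applied, "p" = planned to be applied, "" = not applied.
ROWS = {
    "L0":  ["", "", "", "", "", "", "", "p", "", "", "", "", ""],
    "NRT": ["x", "x", "", "", "", "", "p", "p", "p", "x", "x", "x", ""],
    "L1":  ["x", "x", "x", "x", "", "", "p", "p", "p", "x", "x", "x", ""],
    "L2":  ["x"] * 6 + ["p"] * 3 + ["x", "x", "x", ""],
    "L3":  ["x"] * 6 + ["p"] * 3 + ["x", "x", "x", "x"],
}


def actions_with(level, status):
    """Return the actions that have the given status for a data level."""
    return [a for a, s in zip(ACTIONS, ROWS[level]) if s == status]


def levels_with(action, status="x"):
    """Return the data levels for which an action has the given status."""
    i = ACTIONS.index(action)
    return [level for level, row in ROWS.items() if row[i] == status]


print(actions_with("L0", "p"))
print(levels_with("Gaps filled"))
```

Queries such as `levels_with("Manual QAQC")` then answer directly which levels have passed a given step.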

 

NRT (Near Real Time) data are stored at the Thematic Centre (TC). They pass a first-level automated quality check and a first processing step that converts units from voltage [mV] to concentration [ppm]; after a minimum time delay, they are ready to be downloaded by “special” users (researchers) interested in very fast access.

 

L1 data are stored at the Thematic Centre. They have undergone automated quality checks and have been checked by the Principal Investigator of the observation station (“manual” QAQC processing). This process will likely result in “gaps”, where some parameters are given a “missing value” to indicate that the data are unreliable (by associating quality flags). L1 data are published at the Thematic Centre and are therefore ready to be downloaded by selected researchers.

 

L2 data are gap-filled data, where missing values are replaced with interpolated or otherwise modelled data (based, e.g., on functions of meteorological parameters). They are consolidated and averaged data, at half-hourly or hourly frequency.
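To make the step from flagged L1 data to gap-filled, averaged L2 data concrete, the following Python sketch fills gaps by linear interpolation and then averages fixed-size blocks. Linear interpolation is just one simple choice made for this example; the text above also allows modelled values. The sample values and the assumed 10-minute sampling interval (three samples per half hour) are hypothetical.

```python
def fill_gaps(values):
    """Linearly interpolate None entries that lie between two valid values.

    Gaps at the very start or end of the series are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def block_average(values, size):
    """Average consecutive blocks of `size` samples (e.g. 3 x 10 min = 30 min)."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


# Hypothetical L1 series in ppm; None marks a value flagged as missing.
l1 = [400.0, None, 402.0, 403.0, None, None, 406.0, 407.0, 408.0]
l2 = block_average(fill_gaps(l1), 3)
print(l2)
```

In practice the gap-filling method would be recorded in the metadata, so that users can tell measured values from filled ones.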

 

L3 data are elaborated data products, also called post-processed data. These would typically be data sets containing the outcome of different types of modelling, such as inverse modelling (using GHG concentration data from ICOS as part of the inputs), which gives "maps" showing sources and sinks of greenhouse gases distributed in time and space, or fluxes and other parameters calculated with ecosystem or vegetation models (using ICOS meteorological, solar radiation and GHG data from specific ecosystem sites as input and/or for validation). Other types of L3 products are also possible, for example products based on combining remote sensing data with ICOS data.

 

NRT, L1, L2 and L3 data are archived (backed up) in data storage at the Carbon Portal (CP). NRT and L1 data are published at the TC, while L2 and L3 data are published at the CP. The usage of all levels of the persisted data is tracked by the CP, as users interested in the data are asked to register (at least a valid e-mail address is required, probably together with an indication of the intended use). Persistent identifiers are planned to be assigned at each data level. Metadata are planned to be stored at all levels, to keep track of the information needed for the data sets to be usable and interpretable. This ranges from information about the observations themselves (instruments, calibrations, potential issues) and about the data processing (quality level, gap-filling method, flux evaluation methodology, date and time of TC processing) to information about the data sets (revision, author, contact person, DOI, etc.). At the moment, the metadata standard to be followed is not fixed. Published data will be queryable via the CP, which will provide a metadata search service backed by a metadata repository.
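Since the metadata standard is not yet fixed, the following Python sketch should be read only as one possible arrangement of the metadata items named above, together with a minimal search over an in-memory repository. The class names, field names and sample values are assumptions made for this example and do not follow any particular metadata standard.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ObservationInfo:
    instruments: list = field(default_factory=list)
    calibrations: list = field(default_factory=list)
    potential_issues: list = field(default_factory=list)


@dataclass
class ProcessingInfo:
    quality_level: str = ""
    gapfilling_method: str = ""
    flux_evaluation_methodology: str = ""
    tc_processing_time: str = ""


@dataclass
class DatasetInfo:
    revision: str = ""
    author: str = ""
    contact_person: str = ""
    doi: str = ""


@dataclass
class MetadataRecord:
    observation: ObservationInfo
    processing: ProcessingInfo
    dataset: DatasetInfo


def search(repository, **criteria):
    """Return the records whose processing or dataset fields match all criteria."""
    hits = []
    for record in repository:
        flat = {**asdict(record.processing), **asdict(record.dataset)}
        if all(flat.get(key) == value for key, value in criteria.items()):
            hits.append(record)
    return hits


# Hypothetical repository content, for illustration only.
repository = [
    MetadataRecord(
        ObservationInfo(instruments=["analyser A"]),
        ProcessingInfo(quality_level="L1"),
        DatasetInfo(revision="1", author="Station PI"),
    ),
    MetadataRecord(
        ObservationInfo(instruments=["analyser A"]),
        ProcessingInfo(quality_level="L2", gapfilling_method="linear interpolation"),
        DatasetInfo(revision="2", author="Thematic Centre"),
    ),
]

l2_records = search(repository, quality_level="L2")
```

Grouping the fields by observation, processing and data set follows the three kinds of information listed in the paragraph above.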

 

Experience shows that the meaning of the data, that is, the conceptual model behind the data, may change over time as science proceeds. If such data are to be processed together, they have to be mapped to a common conceptual model. We therefore suggest building a local conceptual model for each Thematic Centre and for the ICOS Carbon Portal; these models will then be used together to perform the mapping.
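A minimal sketch of such a mapping, in Python, is given below. The local variable names ("co2", "CO2_conc", "Fc") and the common concept names are invented for this example; the actual local and common conceptual models would have to be defined by the Thematic Centres and the Carbon Portal.

```python
# Hypothetical local vocabularies mapped to a hypothetical common model.
LOCAL_TO_COMMON = {
    "ATC": {
        "co2": "carbon_dioxide_concentration",
        "ch4": "methane_concentration",
    },
    "ETC": {
        "CO2_conc": "carbon_dioxide_concentration",
        "Fc": "carbon_dioxide_flux",
    },
}


def to_common(centre, record):
    """Rename the keys of a local record to the common conceptual model."""
    mapping = LOCAL_TO_COMMON[centre]
    unmapped = [key for key in record if key not in mapping]
    if unmapped:
        raise KeyError(f"no mapping for {unmapped} in the local model of {centre}")
    return {mapping[key]: value for key, value in record.items()}


atc_record = to_common("ATC", {"co2": 401.2})
etc_record = to_common("ETC", {"CO2_conc": 399.8, "Fc": -2.1})
```

After the mapping, records from different centres share the same keys and can be processed together; terms without a mapping are reported rather than silently dropped.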

 

The first illustration gives an overview of the life cycle of ICOS data, seen from the information viewpoint. It reflects the state of ICOS as it is already (more or less) implemented and representable by the current information viewpoint of the ENVRI RM.

 

The usage tracking action, and the information objects it requires, are so far missing from the Reference Model, because the information about data genesis that matters to users can also be obtained through data provenance tracking, without storing information about users and their intentions. Because the ICOS Head Office wants to introduce authentication and authorization to the ICOS portal to enable usage tracking, the necessary elements should be added to the data lifecycle illustration in the Reference Model. The addition of persistent unique identifiers, the metadata collection, and the metadata query to support data search and discovery are planned but not yet implemented. The second illustration adds (in green shades) the information needed for usage tracking and for metadata, completing the picture of the planned Carbon Portal.

2.2.2 Data lifecycle seen from the ENVRI reference model

We provide the analysis of the ICOS data lifecycle in Figure 2.2.1, which is an instantiation of the Dynamic Schemata specified in the ENVRI Reference Model Information Viewpoint [8].

 


Figure 2.2.1: ICOS Data Lifecycle


2.2.3 Information Objects

Table 2.2.1: Mapping of ICOS data object instances to the RM Information Objects

| ENVRI RM IV_Objects | ICOS instance | existing/planned |
|---|---|---|
| Specification of investigation design | Specification of site requirements | existing |
| Measurement description | Specification of ICOS TC measurement or observation | existing |
| Measurement result | L0 data | existing |
| Persistent data, data state: raw | NRT data | existing |
| Persistent data, data state: QA assessed | L1 | existing |
| Persistent data, data state: finally reviewed | L2 | existing |
| Persistent data, data state: published | Published at TC | existing |
| Persistent data, data state: published | Published at CP | planned |
| Backup | ICOS RI Archive | existing |
| QA notation | Specification for automated quality check | existing |
| QA notation | Flag for gaps | existing |
| Unique identifier | PID? | planned |
| Data Provenance | Provenance information | planned |
| Metadata, state: raw | Raw metadata | planned |
| Metadata, state: registered | Registered metadata | planned |
| Metadata, state: published | Published metadata | planned |
| Metadata catalogue | Metadata registry | planned |
| not yet implemented | Usage statistics | planned |
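
As an illustration only, the persistent-data states in Table 2.2.1 can be written down as a small state model. The sketch below assumes that the states are passed through in the order in which the table lists them; that linear ordering, and the names `DataState`, `ICOS_INSTANCE` and `next_state`, are assumptions made for this example and are not part of the ENVRI RM or of any ICOS software.

```python
from enum import Enum


class DataState(Enum):
    """Persistent-data states as listed in Table 2.2.1."""
    RAW = "raw"
    QA_ASSESSED = "QA assessed"
    FINALLY_REVIEWED = "finally reviewed"
    PUBLISHED = "published"


# ICOS instances that Table 2.2.1 associates with each state.
ICOS_INSTANCE = {
    DataState.RAW: "NRT data",
    DataState.QA_ASSESSED: "L1",
    DataState.FINALLY_REVIEWED: "L2",
    DataState.PUBLISHED: "Published at TC (existing) / Published at CP (planned)",
}

# Assumption: states are traversed in the order of their definition.
_ORDER = list(DataState)


def next_state(state):
    """Return the state that follows `state`, or None after the last one."""
    position = _ORDER.index(state)
    if position + 1 < len(_ORDER):
        return _ORDER[position + 1]
    return None


state = DataState.RAW
while state is not None:
    print(f"{state.value}: {ICOS_INSTANCE[state]}")
    state = next_state(state)
```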

 

 

 

 

 

 

2.2.4 Information Actions

Table 2.2.2: Mapping of ICOS data object instances to the RM Information Action Types

| ENVRI RM IV_Action Types | ICOS instance | existing/planned |
|---|---|---|
| specify investigation design | specify site requirements | existing |
| specify measurement or observation | specify specific TC measurement or observation | existing |
| perform measurement or observation | perform ICOS measurement or observation | existing |
| store data | store ICOS data | existing |
| check quality | automated quality checking | existing |
| check quality | manual quality checking | existing |
| carry out backup | archive ICOS data | existing |
| publish data | publish data at TC | existing |
| publish data | publish data at CP | planned |
| process data | preprocessing (conversion) | existing |
| process data | building averages | existing |
| process data | gapfilling | existing |
| process data | postprocess | existing |
| assign unique identifier | assign PIDs? | planned |
| add metadata | add ICOS metadata | planned |
| register metadata | register ICOS metadata | planned |
| publish metadata | publish ICOS metadata | planned |
| query data | query ICOS data | planned |
| do data mining | do data mining | planned |
| annotate action | annotate action | planned |
| query provenance | query provenance | planned |
| not yet implemented | track usage | planned |

 

 

 

 

 

 

2.3      Analysis of ICOS Research Infrastructure from Computational Viewpoint

The Integrated Carbon Observation System (ICOS; http://www.icos-infrastructure.eu/) is a distributed research infrastructure involving a number of key facilities and services:

Figure 2.3.1: An overview of the proposed data-flow in ICOS (April 2014)

From the computational perspective, each of the core ICOS facilities (principally the thematic centres and the Carbon Portal) is responsible for providing a number of infrastructure functions. Within the infrastructure as a whole, there exist a number of interactions within and between these facilities that must be modelled; modelling these interactions will (a) ensure that key use-cases have been accounted for and (b) provide a basis for component-wise comparison with other related infrastructure projects.

The deconstruction of the ICOS research infrastructure adheres to the terminology defined in Appendix A; the methodology of the (initial) modelling of the ICOS infrastructure is based on the principles defined in Appendix B.

The current version of the ENVRI Reference Model is deemed to be a minimal model, in that it concentrates on the critical data pipelines between parts of an infrastructure. As a result, some aspects of the ICOS specification (such as the data visualisation capabilities of the Carbon Portal) are not properly accounted for in the descriptions below, but could be added later.

2.3.1 Core computational objects

The first observation to be made about the ICOS computational infrastructure is that there are multiple curation sites; each of the thematic centres stores data, as does the Carbon Portal (via its backend services). This means that there are multiple instances of   data curation objects , with different associated interaction models.

Figure 2.3.2: A catalogue of ICOS computational objects by site

Several important computational objects required by each major 'site' in ICOS are described below along with the bindings for which interaction models will need to be specified.   This is a provisional survey, subject to further information about and refinement of the roles of each of the thematic centres and the Carbon Portal.

2.3.2 Thematic Centres

All three thematic centres (Atmospheric, Ecosystem and Ocean) possess instances of the same computational objects regardless of differences in their instrument networks and how they acquire observations from those networks. There may, however, be a different number of instances of a given object, and their binding behaviours may be different (for example, the ATC and ETC may have different numbers of instrument controllers, and may use different interaction models for a configure instrument binding).

The thematic centres are responsible for producing almost all scientific data within ICOS, and are able to operate autonomously from the ICOS Carbon Portal. However for the purposes of identifying key computational services that must be hosted by various sites, the assumption is that all external requests are filtered through the Carbon Portal; thus there may be instances of services (particularly related to data access) for which computational objects are not listed below because it is assumed that their function will be carried out by the Carbon Portal for all interactions originating from the Portal.

2.3.2.1 Acquisition

Field laboratory: Each thematic centre provides an environment for deploying and calibrating instruments in its measurement network. This is done mostly manually by scientists and technicians in the field, however, so most interactions with instruments are physical rather than virtual.

Bindings: calibrate   [atmospheric/ecosystem/ocean] instrument, update   [atmospheric/ecosystem/ocean] registry.

Acquisition service   : Each thematic centre will have one or more acquisition services to handle the acquisition of data from measurement networks and ensure that L0 data is recorded within data stores within the centre; as with the field laboratory, some of this service's functionality may be carried out manually.

Bindings: configure   [atmospheric/ecosystem/ocean] controller, prepare   [atmospheric/ecosystem/ocean] transfer, update   [atmospheric/ecosystem/ocean] registry.

2.3.2.2 Curation

Catalogue service   : Each thematic centre will catalogue its L0, L1 and L2 data and store associated metadata. A catalogue service will be required at each thematic centre to catalogue the centre's complete data corpus and preserve the link between metadata and data.

Bindings: archive   [L0/L1/L2] data, collect L0   [atmospheric/ecosystem/ocean] data, derive [L1/L2] [atmospheric/ecosystem/ocean] data,   export [L0/L1] [atmospheric/ecosystem/ocean]   data, query   [atmospheric/ecosystem/ocean] data, query   [atmospheric/ecosystem/ocean] resource.

Data store controller   : Each thematic centre stores L0, L1 and L2 data as well as associated metadata. Data store controllers will be necessary for all types of data stored, with different interaction models for data entry and access for different types of data store.

Bindings: archive [L0/L1/L2] data, collect L0   [atmospheric/ecosystem/ocean] data,   derive [L1/L2] [atmospheric/ecosystem/ocean] data,   export [L0/L1] [atmospheric/ecosystem/ocean] data, process [L0/L1] [atmospheric/ecosystem/ocean] data, query   [atmospheric/ecosystem/ocean] resource.

Data transfer service   : Each thematic centre is responsible for serving L0 and L1 data on request, as well as sending any L2 data generated to the Carbon Portal for preservation. Each thematic centre may provide multiple data transfer services to handle different classes of data request, or have one data transfer service to manage all transfers.

Bindings: prepare [L0/L1/L2] [atmospheric/ecosystem/ocean] data archival,   prepare L0 [atmospheric/ecosystem/ocean] data collection, prepare [L0/L1] [atmospheric/ecosystem/ocean] data export,   prepare   [L0/L1] [atmospheric/ecosystem/ocean] data staging, prepare [L1/L2] [atmospheric/ecosystem/ocean] result transfer.  

2.3.2.3 Processing

Coordination service   : Each thematic centre is capable of deriving L1 data from L0 data, and L2 data from L1 data. The role of the coordination service is to manage the processing of data by arranging the staging of data onto processing resources and the reclamation of results; each thematic centre should have a coordination service to coordinate the derivation of new datasets.

Bindings: coordinate   [atmospheric/ecosystem/ocean] process,   prepare   [L0/L1] [atmospheric/ecosystem/ocean] data staging, prepare [L1/L2] [atmospheric/ecosystem/ocean] result transfer, request [atmospheric/ecosystem/ocean] process.

Experiment laboratory: Each thematic centre provides an environment for systematic processing of L0 data in order to produce L1 data, as well as L2 data from L1 data; an experiment laboratory encapsulates the functions required to describe and request computational processes that can be used to derive higher-level datasets from lower-level ones. As such, each thematic centre can be said to possess at least one experiment laboratory.

Binding: request   [atmospheric/ecosystem/ocean] process.

Process controller   : Each thematic centre performs systematic processing of acquired data. A process controller provides an interface to a process, allowing data to be staged, processed, and the results stored. There should be process controllers present at each thematic centre representing their respective data derivation processes.

Bindings:   coordinate   [atmospheric/ecosystem/ocean] process,   derive [L1/L2] [atmospheric/ecosystem/ocean] data,   process [L0/L1] [atmospheric/ecosystem/ocean] data.
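
The division of labour between a coordination service and a process controller, as described above, might be sketched as follows. All class and method names are hypothetical, and the averaging step merely stands in for whatever derivation a thematic centre actually runs; nothing here reflects an existing ICOS implementation.

```python
class ProcessController:
    """Interface to one process: stage data, run it, hand back results."""

    def __init__(self, process):
        self._process = process   # callable applied to each staged record
        self._staged = []
        self._results = []

    def stage(self, records):
        self._staged = list(records)

    def run(self):
        self._results = [self._process(record) for record in self._staged]

    def retrieve_results(self):
        return list(self._results)


class CoordinationService:
    """Arranges staging, execution and the reclamation of results."""

    def __init__(self, controller):
        self._controller = controller

    def derive(self, lower_level_records):
        self._controller.stage(lower_level_records)
        self._controller.run()
        return self._controller.retrieve_results()


def average(window):
    # Stand-in derivation: one averaged value per window of raw readings.
    return sum(window) / len(window)


coordinator = CoordinationService(ProcessController(average))
print(coordinator.derive([[1.0, 3.0], [2.0, 4.0]]))  # [2.0, 3.0]
```

If processing happens in place within a data store, the staging and retrieval steps collapse and only `run` does real work.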

2.3.2.4 Community

PID service   : ICOS will require a mechanism by which to assign persistent identifiers to datasets. If these PIDs are to be globally distinguishable, then it will likely be necessary to use an external PID handling service. If the PIDs need only be distinguishable within the ICOS context, then it may be sufficient to host a PID service in the Carbon Portal, or it may be expedient to host PID services at each thematic centre.

Bindings:   collect L0   [atmospheric/ecosystem/ocean] data, derive [L1/L2/L3] [atmospheric/ecosystem/ocean] data.
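
For the case in which identifiers only need to be distinguishable within ICOS, a locally hosted service could be as simple as the sketch below. The identifier format `icos-local/<site>/<level>/<number>` and the class name are invented for this example; a globally resolvable PID would instead have to come from an external PID handling service, which is not modelled here.

```python
import itertools


class LocalPIDService:
    """Hypothetical PID service whose identifiers are unique only within ICOS."""

    def __init__(self, site):
        self.site = site                     # e.g. "ATC", "ETC", "OTC" or "CP"
        self._counter = itertools.count(1)   # running number per service

    def assign(self, level):
        # The site prefix keeps identifiers from different services apart,
        # so each thematic centre can host its own service without clashes.
        return f"icos-local/{self.site}/{level}/{next(self._counter):06d}"


atc = LocalPIDService("ATC")
otc = LocalPIDService("OTC")
print(atc.assign("L0"))  # icos-local/ATC/L0/000001
print(otc.assign("L0"))  # icos-local/OTC/L0/000001
print(atc.assign("L1"))  # icos-local/ATC/L1/000002
```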

2.3.3 Measurement Station Networks

Each theme has an associated network of measurement stations that provides the respective thematic centre with L0 data.

Instrument controller   :   An instrument is considered   computationally   to be any source of observation data deployed 'in the field'. An instrument controller object encapsulates the computational functions required to interact with an instrument and acquire data from it. In ICOS, instruments may correspond to measurement stations rather than individual sensors installed within stations if computationally the stations act as a single interactive entity. The chosen fidelity for instruments need not be the same for all themes however. Regardless, each theme will have a number of instrument controllers that interact with services in the respective thematic centre.

Bindings: calibrate [atmospheric/ecosystem/ocean]   instrument, configure [atmospheric/ecosystem/ocean]   controller, collect L0   [atmospheric/ecosystem/ocean] data.

2.3.4 Carbon Portal

The Carbon Portal provides a gateway into the ICOS research infrastructure and provides access to L2 and L3 data along with visualisation services.

2.3.4.1 Curation

Annotation service   : The role of an annotation service is to provide a mechanism to add or edit the metadata associated with a dataset, as well as add generic user annotations to data if supported. The Carbon Portal may provide annotation services for L2 and L3 data for example.

Bindings: annotate [L2/L3] data, annotate metadata, update [L2/L3] catalogues, update [L2/L3] records, update metadata records.

Catalogue service   : The Carbon Portal will require a catalogue service to catalogue all L2 and L3 data and their associated metadata. Such a service will likely have access to catalogues of L0 and L1 data stored at all three thematic centres as well. Usage statistics and data analyses preserved on-site may also require cataloguing, depending on their complexity and whether or not they are stored permanently.

Bindings: archive   [L0/L1/L2/L3] [atmospheric/ecosystem/ocean]   data, export   [L2/L3] data, import L3 data, query CP data, query CP resource.

Data store controller   : The Carbon Portal   stores L2 and L3 data as well as associated metadata and usage statistics. Data store controllers will be necessary for all types of data stored, with different interaction models for data entry and access for different types of data store.

Bindings: archive [L0/L1/L2] [atmospheric/ecosystem/ocean]   data, export   [L2/L3] data, import L3 data, query resource.

Data transfer service: The Carbon Portal must be able to provide L2 and L3 datasets on request, as well as be able to import certain external datasets into the infrastructure. Data is also transferred between thematic centres and the Carbon Portal (though whether the data transfer service at the Carbon Portal or the transfer service at the thematic centres handles this task is a matter of architectural convenience).

Bindings: prepare   [L0/L1/L2]   [atmospheric/ecosystem/ocean] data archival, prepare   [L2/L3] data export, prepare L3 data import.

2.3.4.2 Access

Data broker: One of the Carbon Portal's primary roles is to provide access to a range of scientific datasets. The role of the data broker is to validate data requests and identify where data is stored, as well as to authorise any resulting data transfers. The Carbon Portal may host a number of data brokers for different kinds of data request or query, or may integrate them all into one service.

Bindings: annotate [L2/L3] data, perform data query, prepare [L0/L1/L2] [atmospheric/ecosystem/ocean] data archival,   prepare   [L0/L1] [atmospheric/ecosystem/ocean] data export, query [atmospheric/ecosystem/ocean] data, request [L0/L1] data export, request [L2/L3] data export, request L3 data import,   query CP data.
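
The "identify where data is stored" part of the broker's role can be illustrated with a small routing function. It relies on the split described in this section, with L0 and L1 data served by the thematic centres and L2 and L3 data served from the Carbon Portal; the function name, its parameters and the error handling are assumptions made for the sketch.

```python
THEMATIC_CENTRES = {"atmospheric": "ATC", "ecosystem": "ETC", "ocean": "OTC"}


def locate_data(level, theme=None, authorised=False):
    """Return the site expected to serve a request (hypothetical helper)."""
    if not authorised:
        # The broker rejects requests that have not been authorised.
        raise PermissionError("request has not been authorised")
    if level in ("L2", "L3"):
        return "Carbon Portal"
    if level in ("L0", "L1"):
        if theme not in THEMATIC_CENTRES:
            raise ValueError(f"unknown theme: {theme!r}")
        return THEMATIC_CENTRES[theme]
    raise ValueError(f"unknown data level: {level!r}")


print(locate_data("L1", "ocean", authorised=True))  # OTC
print(locate_data("L3", authorised=True))           # Carbon Portal
```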

2.3.4.3 Community

PID service   : ICOS will require a mechanism by which to assign persistent identifiers to datasets. If these PIDs are to be globally distinguishable, then it will likely be necessary to use an external PID handling service. If the PIDs need only be distinguishable within the ICOS context, then it may be sufficient to host a PID service in the Carbon Portal, or it may be expedient to host PID services at each thematic centre.

Binding:   import L3 data.

Science gateway   : A science gateway is a service offering access to the rest of the infrastructure. The Carbon Portal exists to provide such a gateway.

Science gateways are capable of creating new virtual laboratories for users.

Security service   : A security service is required to validate user requests and verify identity. The level of security (or identity management, if preferred) required depends on the scope and complexity of services offered by the Carbon Portal in practice.

Binding: authorise action.

Virtual laboratory   : Another primary role of the Carbon Portal is to provide a virtual research environment for investigating ICOS data. This environment may be simply a means to download, upload and visualise datasets, or may provide more elaborate services (such as user accounts, processing privileges for virtual organisations,   etc .). A virtual laboratory object encapsulates the services provided to a given user or set of users in a given context (such as a browser session).

Bindings: authorise action, perform data query, request [L0/L1] data export, request [L2/L3] data export, request L3 data import.

2.3.5 Core bindings

The following bindings require interaction models to be defined within the ICOS infrastructure for it to be compliant with the ENVRI Reference Model. Many of the binding descriptions below describe multiple similar but distinct bindings (for example, archive [L0/L1/L2] [atmospheric/ecosystem/ocean] data describes 9 different bindings in total); in principle, an interaction model is required for each individual binding (though many or all of those models may be nearly identical). Most bindings are primitive bindings between two instances of the computational objects described above, but some compound bindings are defined as well; these compound bindings have links to dedicated subsections below. Compound bindings generally combine multiple primitive bindings together to create a single interaction; a single unified interaction model can be defined for each compound binding, or individual models can be defined for primitive sub-bindings and composed as deemed most appropriate.

In ICOS, a significant number of bindings are to different thematic centres or involve different levels of dataset (or both). Because different thematic centres may organise themselves differently, and different data policies may apply to different levels of data, different interaction models may apply to different cases of what is otherwise, to the Reference Model, the same abstract interaction; hence the proliferation of nearly-identical types of binding.
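
To make the counting concrete, the bracket notation used for binding names can be expanded mechanically. The helper below is only an illustration of the notation (the function name is made up); it treats every `[a/b/c]` group as a set of alternatives and forms all combinations.

```python
import re
from itertools import product


def expand_binding(template):
    """Expand every [a/b/c] group in a binding template into concrete names."""
    # re.split with a capturing group alternates literal text and group bodies.
    parts = re.split(r"\[([^\]]+)\]", template)
    choices = [
        part.split("/") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combination) for combination in product(*choices)]


names = expand_binding("archive [L0/L1/L2] [atmospheric/ecosystem/ocean] data")
print(len(names))  # 9
print(names[0])    # archive L0 atmospheric data
print(names[-1])   # archive L2 ocean data
```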

archive   [L0/L1/L2]   [atmospheric/ecosystem/ocean] data   : Used to export   [L0/L1/L2] datasets from the   [ATC/ETC/OTC] to the Carbon Portal's own archives. See   archive   [L0/L1/L2] data .

authorise action   : Used to retrieve authentication tokens required to authorise a variety of actions across the infrastructure. A user should invoke authorise action before almost any other action.

  • client   : Any virtual laboratory acting on behalf of an agent that wishes to interact with the Carbon Portal.
  • interface   : authorise action
  • server   : The Carbon Portal security service.
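
A very reduced picture of the authorise action binding is given below: a virtual laboratory asks the security service for a token, and other services later check that token before acting. The class, its methods and the token scheme are all hypothetical, and no real identity verification is performed; the sketch only shows the order of calls.

```python
import secrets


class SecurityService:
    """Hypothetical stand-in for the Carbon Portal security service."""

    def __init__(self):
        self._granted = {}   # token -> (agent, action)

    def authorise_action(self, agent, action):
        # A real service would verify the agent's identity and rights here.
        token = secrets.token_hex(16)
        self._granted[token] = (agent, action)
        return token

    def is_valid(self, token, agent, action):
        return self._granted.get(token) == (agent, action)


security = SecurityService()
token = security.authorise_action("alice", "request L2 data export")
print(security.is_valid(token, "alice", "request L2 data export"))  # True
print(security.is_valid(token, "alice", "request L3 data import"))  # False
```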

annotate [L2/L3] data   : Used to edit or annotate   [L2/L3] data held by the Carbon Portal.

  • client   : Any Carbon Portal data broker acting on behalf of an authorised agent.
  • interface   : annotate data
  • server   : The annotation service provided by the Carbon Portal for the agent that wishes to perform the edit/annotation.

annotate metadata   : Used to edit metadata held within the Carbon Portal metadata repository.

  • client   : Any   Carbon Portal data broker acting on behalf of an authorised agent.
  • interface   : annotate data
  • server   : The annotation service provided by the Carbon Portal for the agent that wishes to perform the edit/annotation.

calibrate [atmospheric/ecosystem/ocean] instrument   :   Used to monitor and calibrate instruments in the [atmospheric/ecosystem/ocean] theme's measurement station network.

  • client   : Any field laboratory in the [ATC/ETC/OTC].
  • interface   : calibrate instrument
  • server   : Any instrument controller in the [atmospheric/ecosystem/ocean] theme measurement station network.

collect L0 [atmospheric/ecosystem/ocean]   data   : Used to retrieve L0 data from instruments and store them in the   [ATC/ETC/OTC]. See   collect L0 data .

configure [atmospheric/ecosystem/ocean] controller   : Used to control how and when an instrument sends data to the [ATC/ETC/OTC].

  • client   : The acquisition service in the [ATC/ETC/OTC]   associated with the instrument(s) to be configured.
  • interface   : configure controller
  • server   : The instrument controller associated with the instrument(s) to be configured.

coordinate   [atmospheric/ecosystem/ocean] process   : Used to configure, run and monitor a processing task on a processing resource.

  • client   : The coordination service in the   [ATC/ETC/OTC] responsible for managing the process in question.
  • interface   : coordinate process
  • server   : The process controller associated with the process being executed.

derive [L1/L2] [atmospheric/ecosystem/ocean]   data   : Used to derive higher-level data from datasets held by the   [ATC/ETC/OTC]. See   derive [L1/L2] data .

export [L0/L1] [atmospheric/ecosystem/ocean] data   : Used to export [L0/L1] datasets from the   [ATC/ETC/OTC] to an external resource. See   export [L0/L1] data .

export   [L2/L3] data   : Used to export   [L2/L3] datasets from the Carbon Portal to an external resource. See   export   [L2/L3] data .

import L3 data   : Used to upload L3 datasets from an external resource into the Carbon Portal. See   import L3 data .

perform data query   : Used to request that a data broker perform a query over the aggregate data held by the ICOS infrastructure.

  • client   : The virtual laboratory used by the agent making the request.
  • interface   : data request
  • server   : The Carbon Portal data broker responsible for brokering queries over Carbon Portal data.

prepare [L0/L1/L2] [atmospheric/ecosystem/ocean] data archival   : Used to initiate the transfer of   [L0/L1/L2] data stored in the   [ATC/ETC/OTC] to the archives of the Carbon Portal.

  • client   : The Carbon Portal data broker responsible for replicating [ATC/ETC/OTC] [L0/L1/L2] data.
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the transfer of [L0/L1/L2] datasets from the [ATC/ETC/OTC] to the Carbon Portal.

prepare L0 [atmospheric/ecosystem/ocean] data collection   : Used to initiate the transfer of L0 data from instruments in the [atmospheric/ecosystem/ocean] measurement station network to the [ATC/ETC/OTC].

  • client   : The acquisition service in the [ATC/ETC/OTC]   associated with the instrument(s) providing data.
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the collection of L0 data for the [ATC/ETC/OTC].

prepare   [L0/L1] [atmospheric/ecosystem/ocean] data export   : Used to initiate the export of   [L0/L1] data stored in the   [ATC/ETC/OTC] to an external resource.

  • client   : The Carbon Portal data broker responsible for requesting   [L0/L1] data stored in the   [ATC/ETC/OTC].
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the export of   [L0/L1] data from the   [ATC/ETC/OTC].

prepare [L0/L1/L2] [atmospheric/ecosystem/ocean] data staging   : Used to initiate the staging of   [L0/L1/L2] data stored in the   [ATC/ETC/OTC] into a suitable context for processing.

  • client   : The coordination service responsible for deriving higher-level data from [L0/L1/L2] datasets held in the [ATC/ETC/OTC].
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the staging of   [L0/L1/L2] datasets held in the   [ATC/ETC/OTC].

prepare   [L1/L2] result transfer   : Used to initiate the retrieval of new   [L1/L2] processed data from processing facilities to be stored in the   [ATC/ETC/OTC].

  • client   : The coordination service responsible for deriving [L1/L2] data from lower-level datasets held in the [ATC/ETC/OTC].
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the retrieval of   [L1/L2] datasets to be held in the   [ATC/ETC/OTC].

prepare   [L2/L3] data export   : Used to initiate the export of   [L2/L3] data stored in the Carbon Portal's archives to an external resource.

  • client   : The Carbon Portal data broker responsible for fielding requests for [L2/L3] data.
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the export of   [L2/L3] data from the Carbon Portal.

prepare L3 data import   : Used to initiate the upload of L3 data from an external resource to the Carbon Portal's archives.

  • client   : The Carbon Portal data broker responsible for fielding requests to upload L3 data.
  • interface   : prepare data transfer
  • server   : The data transfer service responsible for managing the import of L3 data into the Carbon Portal.

process [L0/L1] [atmospheric/ecosystem/ocean]   data   : Used to stage scientific data held by the   [ATC/ETC/OTC]   for analysis and processing. See   process [L0/L1/L2] data .

query [atmospheric/ecosystem/ocean] data   : Used by the Carbon Portal to query the aggregate data held by the   [ATC/ETC/OTC].

  • client   : Any Carbon Portal data broker.
  • interface   : query data
  • server   : The [ATC/ETC/OTC]   catalogue service.

query [atmospheric/ecosystem/ocean] resource   : Used by the catalogue service within the [ATC/ETC/OTC] to query   the data held within a specific data store within the   [ATC/ETC/OTC].

  • client   : The   [ATC/ETC/OTC]   catalogue service.
  • interface   : query resource
  • server   : The data store controller within the [ATC/ETC/OTC]   that holds the desired data to be queried.

query CP data   : Used by the Carbon Portal to query the aggregate data held within its own archives.

  • client   : Any Carbon Portal data broker.
  • interface   : query data
  • server   : The Carbon Portal catalogue service.

request [atmospheric/ecosystem/ocean] process   : Used to request that a particular data processing task be executed within the   [ATC/ETC/OTC].

  • client   : The experiment laboratory providing the environment for data processing within the   [ATC/ETC/OTC].
  • interface   : process request
  • server   : Any coordination service present within the   [ATC/ETC/OTC] capable of executing the given process.

request   [L0/L1] data export   : Used to request the export of an [L0/L1] dataset from the relevant thematic centre (performed via the Carbon Portal; may be implemented as a manual request).

  • client   : The virtual laboratory used by the agent making the request.
  • interface   : data request
  • server   : The Carbon Portal data broker responsible for brokering access to   [L0/L1] data.

request   [L2/L3] data export   :   Used to request the export of an [L2/L3] dataset from the Carbon Portal's archives.

  • client   : The virtual laboratory used by the agent making the request.
  • interface   : data request
  • server   : The Carbon Portal data broker responsible for brokering access to   [L2/L3] data.

request L3 data import   : Used to request permission to upload L3 synthesis data to the Carbon Portal's archives.

  • client   : The virtual laboratory used by the agent making the request.
  • interface   : data request
  • server   : The Carbon Portal data broker responsible for brokering uploads of syntheses to the Carbon Portal.

update   [L2/L3] catalogues   : Used to perform annotation updates on catalogues managed by the Carbon Portal catalogue service (systematic updates are handled as part of data transfers such as for   archival ).

  • client   : The Carbon Portal annotation service.
  • interface   : update catalogues
  • server   :   The catalogue service used by the Carbon Portal to manage   [L2/L3] data catalogues.

update [L2/L3] records   : Used to perform annotation updates to data recorded in an   [L2/L3] data store within the Carbon Portal's archives (systematic updates are handled as part of data transfers such as for   archival ).

  • client   : The Carbon Portal annotation service.
  • interface   : update records
  • server   : The [L2/L3] data store controller used by the Carbon Portal to control the data store containing the data to be annotated.

update   metadata records   : Used to perform updates to metadata recorded in the Carbon Portal's metadata repository.

  • client   : The Carbon Portal annotation service.
  • interface   : update records
  • server   : The data store controller used by the Carbon Portal to control its metadata repository.

update [atmospheric/ecosystem/ocean] registry   :   Used to register and unregister instruments deployed in the [atmospheric/ecosystem/ocean] theme's measurement station network.

  • client   : Any field laboratory in the [ATC/ETC/OTC].
  • interface   : update registry
  • server   : The acquisition service in the [ATC/ETC/OTC] with which the instrument(s) involved are registered / to be registered.
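
Because every primitive binding above is described by the same three roles, the catalogue lends itself to a uniform record. The sketch below encodes a few of the listed bindings and groups them by interface, which shows at a glance which bindings might share an interaction model. The `PrimitiveBinding` type and the grouping helper are illustrative assumptions; only the role values are taken from the descriptions above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveBinding:
    name: str
    client: str
    interface: str
    server: str


BINDINGS = [
    PrimitiveBinding("perform data query", "virtual laboratory",
                     "data request", "Carbon Portal data broker"),
    PrimitiveBinding("request L3 data import", "virtual laboratory",
                     "data request", "Carbon Portal data broker"),
    PrimitiveBinding("query CP data", "Carbon Portal data broker",
                     "query data", "Carbon Portal catalogue service"),
    PrimitiveBinding("update metadata records", "Carbon Portal annotation service",
                     "update records", "Carbon Portal data store controller"),
]


def group_by_interface(bindings):
    """Map each interface name to the bindings that use it."""
    groups = defaultdict(list)
    for binding in bindings:
        groups[binding.interface].append(binding.name)
    return dict(groups)


for interface, names in group_by_interface(BINDINGS).items():
    print(f"{interface}: {names}")
```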

2.3.6 Compound Bindings

Compound bindings are used to bind three or more computational objects by means of an intermediary binding object. The binding object is responsible for coordinating interaction between the bound computational objects. Those core bindings described above that have been deemed to be compound bindings are described in more detail here. Note that these descriptions merely explicate the set of objects being bound in each case and their purpose; just as for the primitive bindings, a compliant infrastructure must define interaction models for each compound binding.

The compound binding descriptions below are deliberately pedantic in how they distribute oversight, control and data-flow between objects. However in practice, many of the separate objects may be collapsed together to simplify the interaction supported; for example the staging of data from a permanent data store to a processing platform, with the results then moved back into another data store, might simplify to just performing data processing in-situ within one data store without any data movement.

2.3.6.1 archive   [L0/L1/L2] data

The archival by the Carbon Portal of L0, L1 and L2 data produced by any of the thematic centres binds a thematic data store to a designated Carbon Portal data store. The binding occurs at two levels: at the operational level, data is requested via the thematic data store controller's retrieve data interface and the CP data store's internal records are updated via its controller's update records interface; at the data streaming level, a data channel is set up to transfer curated data. Additional metadata is retrieved by the relevant thematic catalogue service, whilst the CP's catalogue service updates its catalogue of L0, L1 or L2 datasets.

Either the relevant thematic centre's data transfer service or the CP's data transfer service creates the   data transporter   binding object necessary to carry out the interaction. All datasets of all levels should be archived for data preservation purposes (via a   prepare data transfer   binding), even though only L2 and L3 data is directly served by the Carbon Portal.

2.3.6.2 collect L0 data

The collection of L0 data from any of the thematic measurement station networks binds an instrument controller to a data store within the corresponding thematic centre. The binding occurs at two levels: at the operational level, data is requested via the instrument controller's retrieve data interface and the data store's internal records are updated via its controller's update records interface; at the data streaming level, a data channel is set up to deliver raw data for curation. A persistent identifier is acquired from the thematic PID service and the thematic catalogue service updates the corresponding thematic centre's catalogue of L0 datasets.

The relevant thematic centre's data transfer service creates the   data transporter   binding object necessary to carry out the interaction. The data transfer service will only set up a data channel between an instrument and a data store if the relevant acquisition service requests it (via a   prepare data transfer binding).
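
The roles in this compound binding can be sketched with a binding object that coordinates the four participants. The interface names `retrieve_data` and `update_records` follow the text; every class name, the in-memory storage and the identifier format are assumptions, and the data channel is reduced to an ordinary function call.

```python
import uuid


class InstrumentController:
    def __init__(self, readings):
        self._buffer = list(readings)

    def retrieve_data(self):
        # Operational interface: hand over and clear buffered observations.
        data, self._buffer = self._buffer, []
        return data


class DataStoreController:
    def __init__(self):
        self.records = {}

    def update_records(self, pid, data):
        self.records[pid] = data


class PIDService:
    def acquire_identifier(self):
        return f"local:{uuid.uuid4()}"   # hypothetical identifier format


class CatalogueService:
    def __init__(self):
        self.l0_catalogue = []

    def update_catalogue(self, pid):
        self.l0_catalogue.append(pid)


class DataTransporter:
    """Binding object, created by the data transfer service on request."""

    def __init__(self, instrument, store, pid_service, catalogue):
        self.instrument = instrument
        self.store = store
        self.pid_service = pid_service
        self.catalogue = catalogue

    def collect_l0(self):
        data = self.instrument.retrieve_data()
        pid = self.pid_service.acquire_identifier()
        self.store.update_records(pid, data)
        self.catalogue.update_catalogue(pid)
        return pid


store, catalogue = DataStoreController(), CatalogueService()
transporter = DataTransporter(InstrumentController([401.2, 401.5]),
                              store, PIDService(), catalogue)
pid = transporter.collect_l0()
print(store.records[pid])  # [401.2, 401.5]
```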

2.3.6.3 derive [L1/L2] data

The derivation of L1 or L2 data within any of the thematic centres binds a process controller to an L1 or L2 data store respectively.   The binding occurs at two levels: at the operational level, derived data is requested via the process controller's   retrieve data   interface and the data store's internal records are updated via its controller's   update records   interface; at the data streaming level, a data channel is set up to deliver the derived data for curation.   A persistent identifier is acquired from the thematic PID service and the thematic catalogue service updates the corresponding thematic centre's catalogue of L1 or L2 datasets.

The relevant thematic centre's data transfer service creates the data transporter binding object necessary to carry out the interaction. The data transfer service will only set up a data channel between a process controller and a data store if the relevant coordination service requests it (via a prepare data transfer binding). The derivation of any level of data is preceded by the processing of data of the previous level. Note that if data has been processed within the data store in which it resides, then this binding can be implemented trivially.

2.3.6.4 export [L0/L1] data

The export of L0 or L1 data from any of the thematic centres binds a thematic data store to a designated data store outside of the ICOS infrastructure. The binding occurs at two levels: at the operational level, data is requested via the data store controller's   retrieve data   interface; at the data streaming level, a data channel is set up to export curated data. Additional metadata is retrieved by the thematic catalogue service.

The relevant thematic centre's data transfer service creates the data transporter binding object necessary to carry out the interaction. The data transfer service will only export L0 or L1 data on request, either via the Carbon Portal (through a prepare data transfer binding) or via direct request to the thematic centre (although the latter is auxiliary to ICOS).

2.3.6.5 export [L2/L3] data

The export of L2 or L3 data from the Carbon Portal binds a data store in the Carbon Portal's archives to a designated data store outside of the ICOS infrastructure. The binding occurs at two levels: at the operational level, data is requested via the archive data store controller's   retrieve data   interface; at the data streaming level, a data channel is set up to export curated data. Additional metadata is retrieved by the Carbon Portal catalogue service .

The Carbon Portal's data transfer service creates the   data transporter   binding object necessary to carry out the interaction. The data transfer service will only export L2 or L3 data upon a valid request being made (via a   prepare data transfer   binding).

2.3.6.6 import L3 data

The upload of externally-produced L3 syntheses into the ICOS infrastructure binds an external resource to a data store within the Carbon Portal's archives. The binding occurs at two levels: at the operational level, the Carbon Portal data store's internal records are updated via its controller's   update records   interface; at the data streaming level, a data channel is set up to import data for curation.   A persistent identifier is acquired from the Carbon Portal's PID service and the Carbon Portal's catalogue service updates the Carbon Portal's catalogue of L3 syntheses .

The Carbon Portal's data transfer service creates the   data transporter   binding object necessary to carry out the interaction. The data transfer service will only set up a data channel between an external resource and a data store if a valid upload request is made (via a   prepare data transfer   binding).

2.3.6.7 process [L0/L1] data

The processing of L0 or L1 data within any of the thematic centres binds an L0 or L1 data store respectively to a process controller. The binding occurs at two levels: at the operational level, data is requested via the data store controller's retrieve data interface and the process controller's internal records are updated via its controller's update records interface; at the data streaming level, a data channel is set up to stage the L0 or L1 data.

The relevant thematic centre's data transfer service creates the data stager binding object necessary to carry out the interaction. The data transfer service will only set up a data channel between a data store and a processing context if the relevant coordination service requests it (via a prepare data transfer binding). The processing of any level of data precedes the derivation of higher level data. Note that if data can be processed within the data store in which it resides, then this binding can be implemented trivially.
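
The ordering between processing and derivation (data of one level is processed before data of the next level is derived) can be expressed as a simple check. The following Python sketch is an illustration only; the step representation and the function name check_order are assumptions, not part of the Model.

```python
from typing import List, Set, Tuple

# A step is (action, level), e.g. ("process", 0) or ("derive", 1).
Step = Tuple[str, int]


def check_order(steps: List[Step]) -> bool:
    """Return True if every 'derive' of level n follows a 'process' of level n-1."""
    processed: Set[int] = set()
    for action, level in steps:
        if action == "process":
            processed.add(level)
        elif action == "derive":
            if level - 1 not in processed:
                return False
        else:
            raise ValueError(f"unknown action: {action}")
    return True
```

For example, processing L0, deriving L1, processing L1 and then deriving L2 passes the check, whereas deriving L1 before L0 has been processed does not.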

3        Conclusion

3.1    <...>


Appendix

A How to read the Model (Computational Viewpoint)

The computational viewpoint prescribed by Open Distributed Processing (ODP) is concerned with the modelling of computational objects and the interactions between their interfaces. The Reference Model uses a lightweight subset of the full ODP specification to model the abstract computational requirements of an archetypical environmental science research infrastructure.

 

        The encapsulation of computational objects (and interfaces) occurs at a conceptual level rather than the implementation level – it is perfectly admissible for the functions of a given object to be distributed across multiple computational resources in an implemented infrastructure, should that be supported by its architecture, if that distribution does not interfere with the ability to implement all of that object's interfaces (and thus behaviours). Likewise the functionalities of multiple objects can be gathered within a single implemented service, should that be desired.

 

The first-class entity of the computational viewpoint is the   computational object :

A computational object   encapsulates a set of functions that need to be collectively implemented by a service or resource within an infrastructure. To access these functions, a computational object also provides a number of   operational   interfaces by which that functionality can be invoked; the object also provides a number of operational interfaces by which it can itself invoke functions on other objects.   Each computational object may also have   stream   interfaces for ferrying large volumes of data within the infrastructure. In summary:

 

  • Operational interfaces are used to pass messages between objects in order to coordinate general infrastructure operations such as querying a data resource or configuring a service. A given operational interface must be either a server interface (providing access to functions that can be invoked by other objects) or a client interface (providing a means by which the object can invoke operations on other objects).

 

        In diagrams, client and server interfaces are linked using 'ball and   socket' notation: clients expose sockets (half-circles) whilst servers expose balls (closed-circles).

 

  • Stream interfaces   are used to deliver datasets from one part of the   infrastructure to another. A   producer   interface streams data to one or   more bound   consumer   interfaces as long as there is data to transfer   and all required consumers are available to receive that data (whether one, all or some of the consumers must be available depends on the circumstances of the data transfer). Data channels are   typically established by operations invoked via operational interfaces (which typically negotiate the terms of the transfer),   but can persist independently of them (which is useful for long-term continuous transfers such as from sensor networks to data stores).

 

        In diagrams, producer and consumer stream interfaces are linked using a double-arrow notation: the arrow-head points away from producers, towards consumers.
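
The behaviour of a stream binding described above (a producer keeps streaming while there is data and the required consumers are available) can be sketched in Python as follows. The function name stream and the availability callback are hypothetical, and the sketch assumes the simplest policy, in which a single availability test covers all consumers.

```python
from typing import Callable, Iterable, List


def stream(producer: Iterable[int],
           consumers: List[Callable[[int], None]],
           available: Callable[[], bool]) -> int:
    """Push items to all consumers while data remains and consumers are available.

    Returns the number of items delivered.
    """
    sent = 0
    for item in producer:
        if not available():
            break  # required consumers are no longer able to receive data
        for consume in consumers:
            consume(item)
        sent += 1
    return sent
```

Whether one, some or all consumers must be available would, as the text notes, depend on the circumstances of the transfer; a different policy could be substituted for the callback.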

 

As well as having interfaces by which to interact with other objects, some computational objects possess the right to create other computational objects; this is done typically to deploy transitory services or to demonstrate how an infrastructure might extend its functionality.

Some objects extend the functionality of other objects; these objects possess all the interfaces of the parent (usually in addition to some of their own) and can be created by the same source object if the capability exists.

 

        In diagrams, the ability to create objects is noted by a single filled arrow extending from the creating object to the object being created, with the annotation 'new <object>'. If one object extends another, then this can be illustrated using an unfilled arrow from the sub-object to the parent, with the annotation 'is a'.

 

Each interface on a computational object supports a certain type of interaction between objects, which determines the bindings that can be made between interfaces. A binding is simply an established connection between two or more interfaces in order to support a specific interaction between two or more computational objects. A client operational interface can be bound to any server operational interface that provides access to the functions that the client requires. Likewise a producer stream interface can be bound to any consumer stream interface that can consume the data produced by the former.

 

        For simplicity, interfaces designed to work together in the Model share the same name; thus a client interface x can bind to any server interface x and a producer interface y can bind to any consumer interface y. When a binding is explicitly shown in a diagram, the binding itself is identified by that shared name.

Once bound via their corresponding interfaces, two objects can invoke functions on one another to achieve some task (such as configuration of an instrument or establishment of a persistent data movement channel).
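
The naming convention for primitive bindings lends itself to a mechanical compatibility test. The following Python sketch shows one way to express it; the types Role and Interface and the function can_bind are illustrative assumptions and are not defined by the Model.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CLIENT = "client"
    SERVER = "server"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass(frozen=True)
class Interface:
    name: str
    role: Role


# Role pairs that may be bound to one another (order-insensitive).
_COMPATIBLE = {
    frozenset({Role.CLIENT, Role.SERVER}),      # operational binding
    frozenset({Role.PRODUCER, Role.CONSUMER}),  # stream binding
}


def can_bind(a: Interface, b: Interface) -> bool:
    """Two interfaces can bind if they share a name and have complementary roles."""
    return a.name == b.name and frozenset({a.role, b.role}) in _COMPATIBLE
```

Under these assumptions a client interface named "configure instrument" can bind to a server interface of the same name, but not to another client or to a stream interface.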

Primitive bindings can be established between any client / server pair or producer / consumer pair as appropriate. Compound bindings between three or more interfaces can be realised via the creation of   binding objects , a special class of transitory computational object that can be used to coordinate complex interactions by providing primitive bindings to all required interfaces.

The use of binding objects removes the imperative to decompose complex interactions into sets of pairwise bindings between objects; this suits the level of abstraction at which the Model is targeted, given that the specific distribution of control between interacting objects is often idiosyncratic to different infrastructure architectures.

 

        The names of binding objects are typically italicised in diagrams to better distinguish them from 'normal' computational objects.
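
As a rough illustration of how a binding object coordinates a compound interaction through primitive bindings, consider the following Python sketch. The class BindingObject, its methods and the handler signature are hypothetical; the Model deliberately leaves the distribution of control open.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BindingObject:
    """Hypothetical transitory object coordinating a compound binding."""
    name: str
    # Primitive bindings: interface name -> callable exposed by a participant.
    bindings: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def bind(self, interface: str, handler: Callable[[], str]) -> None:
        # Establish one primitive binding between this object and a participant.
        self.bindings[interface] = handler

    def run(self, order: List[str]) -> List[str]:
        # Refuse to start unless every required interface has been bound.
        missing = [i for i in order if i not in self.bindings]
        if missing:
            raise LookupError(f"unbound interfaces: {missing}")
        # Invoke each bound interface in the order chosen by the coordinator.
        return [self.bindings[i]() for i in order]
```

A binding object named "data transporter" could, for instance, bind "retrieve data" and "update records" handlers and invoke them in turn, without the participating objects needing pairwise bindings to one another.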

A.1 A note about implementation

In principle, all computational objects and their interfaces can be implemented as services or agents within a service-oriented architecture; this is not required, however. Certain objects may be implemented by working groups or even individuals within the infrastructure organisation, with bindings between their interfaces implemented by physical interactions or other human-oriented processes (such as sending data via email).

For example, in the Model, a field laboratory has the ability to calibrate instruments (represented by instrument controllers) via a binding of their common calibrate instrument interfaces. Potentially, the field laboratory could be implemented by a virtual research environment within which authorised users can interact online with instruments deployed in the field, modifying how they acquire data. In practice, the 'field laboratory' may simply abstractly represent the activities of field agents (scientists and technicians) who actually travel to sites where instruments are deployed and manually make adjustments.

The possibility of this kind of 'human-driven' implementation of interactions between computational objects should be accounted for when considering the 'computational' viewpoint of a research infrastructure.

B How to use the Model (Computational Viewpoint)

The computational viewpoint of the Model identifies a standard set of components and interfaces from which can be derived a standard set of interactions that a research infrastructure design should address. The Model does   not   specify how those interactions should be implemented –   indeed, over the course of the lifetime of a research infrastructure, implementations may change. Nevertheless, the set of the most important interactions should remain constant regardless of implementation changes.

Someone trying to apply the Computational Viewpoint of the Model to their existing or planned research infrastructure should conduct two primary activities: mapping agents and services to computational objects, and defining the interactions that should occur when two or more interfaces are bound together.

For each computational object in the Model, there should be at least one component or service (or group thereof) provided by the infrastructure that can provide the functions described; depending on the architecture of the infrastructure, there may be multiple candidates, particularly for federated infrastructures. Every such candidate could provide an instantiation of the given object. If no candidates exist, then either (a) the infrastructure does not provide the service embodied by the computational object (and it should be clearly understood that this is indeed the case) or (b) the infrastructure is missing functionality that should be implemented to bring it into compliance with the Model.
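
The mapping activity described above amounts to a coverage check, which might be sketched in Python as follows. The function name unmapped_objects and the shape of the mapping are assumptions made for illustration only.

```python
from typing import Dict, List


def unmapped_objects(model_objects: List[str],
                     mapping: Dict[str, List[str]]) -> List[str]:
    """Return Model objects that have no candidate component in the infrastructure.

    `mapping` associates each Model object with the (possibly empty) list of
    infrastructure components or services proposed as its instantiation.
    """
    return [obj for obj in model_objects if not mapping.get(obj)]
```

Each object returned by such a check would then need to be classified as either deliberately out of scope (case a) or as missing functionality (case b); that decision remains a matter for the infrastructure's designers.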

For each compatible pair of interfaces (operational or stream), there exists an interaction that should occur given a binding between those two interfaces. The Model does   not   prescribe these interactions, instead simply providing the means to identify them. A compliant research infrastructure should in principle have a well-defined description for   every   possible binding between interfaces on objects that it provides an implementation for.

In the above diagram, an (operational) primitive binding has been established between the   configure instrument   interfaces of an   acquisition service object and an   instrument controller   object, as well as a (stream) primitive binding between the   deliver raw data   interfaces of the   acquisition service   and a   data store controller   (see   how to read the Model   to understand the above notation and terms). Thus, assuming a Model-compliant research infrastructure that provides at least one acquisition service and instrument controller, there should be a specification of what happens when a 'configure instrument' binding occurs between an acquisition service and instrument controller. Likewise, there should be a specification of how raw data is delivered from an instrument (represented by the instrument controller) to a data store (represented by its own controller).  
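
The set of bindings for which an interaction specification is needed can be enumerated from the interfaces of the objects an infrastructure provides. The Python sketch below does this for the example just described; the function required_specs is hypothetical, and the assignment of client, server, producer and consumer roles to each interface is an assumption made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Each object maps to a set of (interface name, role) pairs.
Interfaces = Dict[str, Set[Tuple[str, str]]]

# Role pairs that form a primitive binding (operational or stream).
_PAIRS = {frozenset({"client", "server"}), frozenset({"producer", "consumer"})}


def required_specs(objects: Interfaces) -> List[Tuple[str, str, str]]:
    """List (interface name, object A, object B) for every possible binding."""
    specs = []
    for obj_a, obj_b in combinations(sorted(objects), 2):
        for name_a, role_a in objects[obj_a]:
            for name_b, role_b in objects[obj_b]:
                if name_a == name_b and frozenset({role_a, role_b}) in _PAIRS:
                    specs.append((name_a, obj_a, obj_b))
    return sorted(specs)


# Roles below are assumed for the sake of the example.
example: Interfaces = {
    "acquisition service": {("configure instrument", "client"),
                            ("deliver raw data", "producer")},
    "instrument controller": {("configure instrument", "server")},
    "data store controller": {("deliver raw data", "consumer")},
}
```

Calling required_specs(example) yields one entry for the 'configure instrument' binding and one for the 'deliver raw data' binding, each of which would call for an interaction description.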

Many primitive (two-interface) bindings are linked in that the establishment of one binding will necessarily lead to the establishment of other bindings, implying a unified interaction description. This is particularly true for compound bindings where a particular binding object is created to establish pairwise primitive bindings with multiple computational objects that must all contribute to the given interaction. A compliant research infrastructure must therefore identify all such compound bindings and should define how any binding objects created to coordinate interactions are instantiated (generally either as an oversight service or, more abstractly, as a distributed process involving the agents and services participating in the resulting interaction).

In the above diagram, there exist multiple primitive bindings to a central binding object (the raw data collector) that nonetheless all relate to a single compound interaction (describing how the transfer of data from an instrument to a data store is configured and managed). If constructions like the one in the diagram above are to be properly understood, it is very important to describe the relationship between the individual bindings and how the compound interaction between the various computational objects involved is produced. In the reference material for the Model, a number of 'core' reference interactions have been described informally to provide a starting point for Model implementors.

Interaction specifications (whether for primitive or compound interface bindings) can take any form deemed suitable by the developers of the infrastructure   – for example, UML diagrams such as activity or sequence diagrams may be appropriate, as might be a formal logic model or BPEL workflow, or even natural language if the interaction is simple enough.

 
