
 

 

 

ENVRI

Services for the Environmental Community

 

 

 

Using the Reference Model in

ICOS Research Infrastructure Design Study

 

 

Document identifier: Using the Reference Model in ICOS RI Design

Date: 17/09/2014

Activity: WP3

Lead Partner: CU

Document Status: [DRAFT]

Dissemination Level: PUBLIC

Document Link:

 

 

ABSTRACT

 

The Integrated Carbon Observation System, ICOS, is a pan-European research infrastructure for quantifying and understanding the greenhouse gas balance of the European continent and adjacent regions. The design challenge of ICOS is to integrate atmospheric, ecosystem and oceanic observations from highly distributed and heterogeneous national measurement networks, and to provide a comprehensive Carbon Portal that supports seamless data access and processing. Since January 2014, the ENVRI Reference Model team has been supporting ICOS in examining the requirements and optimising the design using the Reference Model concepts and framework. This report updates the understanding of the requirements of the ICOS research infrastructure resulting from the Amsterdam workshop discussions on 10 Sep 2014. Using the concepts of the Reference Model Science Viewpoint, this report provides an analysis of the ICOS community roles, behaviours, and processing workflow.

 


  1. Copyright notice

Copyright © Members of the ENVRI Collaboration, 2011. See www.ENVRI.eu for details of the ENVRI project and the collaboration. ENVRI (“Common Operations of Environmental Research Infrastructures”) is a project co-funded by the European Commission as a Coordination and Support Action within the 7th Framework Programme. ENVRI began in October 2011 and will run for 3 years. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. The work must be attributed by attaching the following reference to the copied elements: “Copyright © Members of the ENVRI Collaboration, 2011. See www.ENVRI.eu for details of the ENVRI project and the collaboration”. Using this document in a way and/or for purposes not foreseen in the license requires the prior written permission of the copyright holders. The information contained in this document represents the views of the copyright holders as of the date such views are published.

  1. Document Log

Issue | Date | Comment | Author/Partner
1.0 | 02 Jun. 14 | Analysis of ICOS RI from Science, Information, and Computational Viewpoints, first draft | Yin Chen (CU), Barbara Magagna (EAA), Paul Martine (UEDIN), ICOS RI
2.0 | 17 Sep. 14 | Updates of Science Viewpoint | Yin Chen (CU) & ICOS RI
3.0 | | |
4.0 | | |

  1. Application area

This document is a formal deliverable for the European Commission, applicable to all members of the ENVRI project, beneficiaries and Joint Research Unit members, as well as its collaborating projects.

  1. Document amendment procedure

Amendments, comments and suggestions should be sent to the authors.

  1. Terminology

A complete project glossary is provided at the following page: http://www.ENVRI.eu/glossary .

ENVRI Reference Model terminology is provided at the wiki site: http://www.envri.eu/rm

 


  1. ENVRI PROJECT SUMMARY

Frontier environmental research increasingly depends on a wide range of data and advanced capabilities to process and analyse them. The ENVRI project, “Common Operations of Environmental Research Infrastructures”, is a collaboration in the ESFRI Environment Cluster, with support from ICT experts, to develop common e-science components and services for their facilities. The results will speed up the construction of these infrastructures and will allow scientists to use the data and software from each facility to enable multi-disciplinary science.

 

The project aims to develop common capabilities, including software and services, for the environmental e-infrastructure communities. While the ENVRI infrastructures are very diverse, they face common challenges including data capture from distributed sensors, metadata standardisation, management of high-volume data, workflow execution and data visualisation. The common standards, deployable services and tools developed will be adopted by each infrastructure as it progresses through its construction phase.

 

Two use cases, led by the most mature infrastructures, will focus the development work on separate requirements and solutions for data pre-processing of primary data and post-processing toward publishing.

 

The project will be based on a common reference model created by capturing the semantic resources of each ESFRI-ENV infrastructure. This model and the development driven by the test-bed deployments result in ready-to-use systems which can be integrated into the environmental research infrastructures.

 

The project puts emphasis on synergy between advanced developments, not only among the infrastructure facilities, but also with ICT providers and related e-science initiatives. These links will facilitate system deployment and the training of future researchers, and ensure that the inter-disciplinary capabilities established here remain sustainable beyond the lifetime of the project.

 

 


TABLE OF CONTENTS

1 Introduction
2 ENVRI REFERENCE MODEL
2.1 ODP approach
2.2 The Reference Model
2.2.1 Science Viewpoint
2.2.2 Information Viewpoint
2.2.3 Computational Viewpoint
3 Analysis of ICOS Research Infrastructure
3.1 ICOS RI Roles
3.2 ICOS RI Communities Behaviours
3.3 ICOS RI Workflows
4 Conclusion
5 References

1        Introduction

The Integrated Carbon Observation System, ICOS, is built to enable research into greenhouse gas budgets and perturbations. A new Carbon Portal is being designed and is envisioned as a virtual data centre, providing a single access point for environmental scientists to discover, obtain, visualise and track observation measurements produced by the observation stations as quickly as possible.

 

The design of the ICOS Carbon Portal is challenged by the complicated requirements of the dataflow, from the acquisition of the measurements to the processing and publication of the data products. The ICOS national observation stations are highly distributed; data are semantically diverse; processes differ from nation to nation and from organisation to organisation; and measurements vary from experiment to experiment.

 

Since January 2014, the ENVRI Reference Model team has been supporting ICOS in examining the requirements and optimising the design using the Reference Model concepts and framework. The Reference Model was first introduced to the ICOS architects. On 27-28 January 2014, an ICOS-Reference Model workshop was held in Cardiff, providing training on the Reference Model and assisting the ICOS architects in analysing the design of the Carbon Portal. The Reference Model contributes to the ICOS system design by simplifying the design problem: it breaks the problem down by subsystem and provides a uniform framework with well-defined subsystems of components specified from different complementary viewpoints (Science, Information and Computational), which promotes structural thinking in the construction of system architectures. This use of the Reference Model enables designers to deliver a practical architecture that leads to concrete implementations. The initial benefit/cost analysis shows that, using the Reference Model, the design cost of the ICOS Carbon Portal could be reduced, and future additions to ICOS can be more easily implemented.

 

The positive feedback from the ICOS architects led to communication with the ICOS Head Office. On 13 March, at a meeting held in Helsinki, the Reference Model was presented to the ICOS director and the Head Office, who decided to adopt and promote the Reference Model within the community. Shortly after, the Reference Model team organised training events for the ICOS community, and delivered a comprehensive analysis and design specification for the ICOS system, including explicit definitions of the ICOS community, the responsibilities of each role, a step-by-step specification of the data lifecycles and the actions on the data, the computational model, service interfaces and interactions.

 

On 4 June 2014, a workshop held at London Heathrow welcomed all key organisations of the ICOS Research Infrastructure, representing the ICOS Head Office, Thematic Centres (TC), Central Analytical Labs (CAL), and Carbon Portal (CP). The results of the analysis of the ICOS system using the Reference Model were presented to the community and received encouraging feedback. As the director of ICOS, Werner, said, the Reference Model helped ICOS clarify its own thinking, identified many important issues, and increased internal cooperation.

 

As a follow-up action, a second workshop was organised on 10 September 2014 in Amsterdam, to which the Reference Model team was invited for further discussion and collaboration. The workshop reviewed the operational workflows in the different thematic centres and the Central Analytical Lab. Using the Reference Model concepts and principles, the ICOS stakeholders (ETC, ATC and CAL) were able to provide detailed workflows that explicitly describe the process steps from data collection, quality checking and data archiving to the publication of the ICOS data products. The discussions also identified important design issues that need to be resolved, e.g. which PID mechanism to use; how many PIDs ICOS needs; what (data or metadata) a PID should point to; what L0, L1 and L2 data are; and how states change as the result of operational actions.

 

This report updates the understanding of the requirements of the ICOS research infrastructure resulting from the Amsterdam workshop discussions. Using the concepts of the Reference Model Science Viewpoint, this report provides an analysis of the ICOS community roles, behaviours, and processing workflow. Once the analysis of the Science Viewpoint has been approved, the updates of the Information Viewpoint (describing the data objects and lifecycles in the infrastructure) can be provided; these will be included in a later version of the report.

 

The rest of the report is arranged as follows. After a brief review of the ENVRI Reference Model in section 2, the analysis of the ICOS Research Infrastructure (from the perspective of the Science Viewpoint) is provided in section 3. Section 4 concludes the report.

2         ENVRI REFERENCE MODEL

The ENVRI Reference Model (ENVRI-RM) is a common ontological framework and standard for the description and characterisation of computational and storage infrastructures in order to achieve seamless interoperability between the heterogeneous resources of different infrastructures. Fundamentally, the model serves to provide a universal reference framework for discussing many common technical challenges facing all of the ESFRI-environmental research infrastructures. By drawing analogies between the reference components of the model and the actual elements of the infrastructures (or their proposed designs) as they exist now, various gaps and points of overlap can be identified.

 

The ENVRI Reference Model is based on the design experiences of state-of-the-art environmental research infrastructures, with a view to informing future implementation. It tackles multiple challenging issues encountered by many existing initiatives, such as: data streaming and storage management; data discovery and access to distributed data archives; linked computational, network and storage infrastructure; data curation, data integration, harmonisation and publication; data mining and visualisation; and scientific workflow management and execution. It uses the Open Distributed Processing (ODP) approach (ISO/IEC 10746-1, 1998), which is an international standard for distributed system specification.

 

The Reference Model provides a predefined professional framework that clearly defines roles and processes in operations of ESFRI environmental research infrastructures. The immediate benefits of adopting the Reference Model (1-5 year timescale) are:

  • The Reference Model makes it easier to design a Research Infrastructure (RI) in the construction phase, and helps to evaluate an existing RI with respect to the division of tasks and to find missing or duplicated actions within the RI's work.
  • Easier definition of requirements for ICT components, enabling a more modular approach for the research infrastructure’s ICT solutions, and making it possible to use external suppliers (e.g. international ICT co-operation projects) for the component development.

On an intermediate timescale (5-10 years) the benefits of using the Reference Model are:

  • Development of a common language (taxonomy of terms, concepts and definitions) and a common understanding, which also facilitates communication with external communities. This enables RIs to understand each other’s operations, makes the roles of individuals working in the different RIs clear, and facilitates better reutilization of the RI products.
  • Avoiding duplication of work by identifying common missing components or components needing improvement. Common interfaces between different aspects of RI work can make some of the solutions, especially in the ICT parts of the RI, interchangeable.
  • Enabling re-use of components, solutions and policies. This enables easier planning for emerging RIs, providing them with a “cookbook” of working solutions, and facilitates direct connection to other RIs.
  • The use of a standard, modular approach provides RIs with scalable design solutions, where the parts of the Reference Model needed can be further detailed for the needs of each RI, while still keeping the overall benefits of common approach.
  • Better risk management of RI development, due to the possibility of changing individual modules and operations of the RIs, without needing to completely redesign the systems.
  • Improving the trustworthiness of the RI products through clearly-defined and standardized ways to present workflows.

The adoption of a systematized common framework, from the design to the implementation, shared among a number of ESFRIs and other relevant RI projects will in the long term provide further additional benefits including but not limited to:

  • A greater level of interoperability between RIs through the use of common standards, which makes data usage and communication between RIs commonplace.
  • Support for cross-disciplinary perspectives and products, and enablement of a systems science approach.
  • A larger potential user base due to the easier use of the RI products, which increases the impact and return on investment for each particular RI.

2.1    ODP approach

The ENVRI-RM is built using the Open Distributed Processing (ODP) framework, an international standard for distributed system specification published by ISO/IEC (ISO/IEC 10746-1, 1998). ODP provides an overall conceptual framework for specifying large or complex computing systems. It adopts the object modelling approach, and defines five specific viewpoints – abstractions that yield specifications of the whole system related to particular sets of concerns. The five viewpoints are:

  • The Enterprise Viewpoint, which concerns the organisational situation in which business (research activity in the current case) is to take place. For better communication with the environmental science community, we refer to this in the ENVRI-RM as the Science Viewpoint.
  • The Information Viewpoint, which concerns modelling of the shared information manipulated within the system of interest.
  • The Computational Viewpoint, which concerns the design of the analytical, modelling and simulation processes and applications provided by the system.
  • The Engineering Viewpoint, which tackles the problems of diversity in infrastructure provision; it gives the prescriptions for supporting the necessary abstract computational interactions in a range of different concrete situations.
  • The Technology Viewpoint, which concerns real-world constraints (such as restrictions on the facilities and technologies available to implement the system) applied to the existing computing platforms on which the computational processes must execute.

The reasons for adopting ODP in ENVRI include:

  • It provides a descriptive framework for specifying and building large or complex systems, consisting of a set of guiding concepts and terminology. This provides a way of thinking about architectural issues in terms of fundamental patterns or organising principles;
  • It enables large collaborative design activities. ODP breaks down a complex design specification into separated but interlinked viewpoints, which allows designers in different teams from different organisations to work in parallel and to deliver uniform specifications;
  • Being an international standard, ODP offers authority and stability.

2.2    The Reference Model

The development of the reference model is based on a preliminary study of a collection of representative environmental research infrastructures (Chen 2013). By examining their computational characteristics, five common subsystems [1] have been identified: Data Acquisition, Data Curation, Data Access, Data Processing and Community Support. The division into five subsystems is based on the observation that all applications, services and software tools are designed and implemented around five major physical resources: the sensor network, the storage, the (internet) communication network, application servers and client devices. The definitions of the five subsystems are given below:

  • Data acquisition: collects raw data from sensor arrays, various instruments, or human observers, and brings the measurements (data streams) into the system.
  • Data curation: facilitates quality control and preservation of scientific data. It is typically operated at a data centre.
  • Data access: enables discovery and retrieval of data housed in data resources managed by a data curation subsystem.
  • Data processing: aggregates the data from various resources and provides computational capabilities and capacities for conducting data analysis and scientific experiments.
  • Community support: manages, controls and tracks users' activities and supports users in conducting their roles in communities.

The relationships between subsystems are depicted in Figure 2.1.

Figure 2.1: Illustration of the major points-of-reference between different subsystems

Seven major points-of-reference can be identified amongst the five subsystems, at which interfaces between subsystems can be implemented. These points-of-reference are as follows:

1)      Acquisition/Curation, by which the collection of raw data is managed.

2)      Curation/Access, by which the retrieval of curated data products is arranged.

3)      Acquisition/Access, by which the collection of raw data and the status of the observation network can be accessed and monitored externally.

4)      Curation/Processing, by which the analysis of curated data is coordinated.

5)      Acquisition/Processing, by which acquisition events are listened for and responded to.

6)      Processing/Access, by which data processes are scheduled and reported.

7)      Community/All, by which the outside world interacts with the infrastructure in many different roles.

Depending on the distribution of resources in an implemented infrastructure, some of these reference points may not be present in the infrastructure. They take on particular importance, however, when considering scenarios where a research infrastructure delegates subsystems to other client infrastructures. For example, EPOS and LifeWatch both delegate data acquisition and some data curation activities to client national or domain-specific infrastructures, but provide data processing services over the data held by those client infrastructures. Thus reference points 4 and 5 become of significant importance to the construction of those projects.
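
The points-of-reference listed above can be treated as a small lookup table, which makes it easy to check which interfaces matter when subsystems are delegated. The following is a minimal Python sketch under stated assumptions: the names `Subsystem`, `REFERENCE_POINTS` and `points_between` are illustrative and are not part of the ENVRI-RM.

```python
from enum import Enum


class Subsystem(Enum):
    ACQUISITION = "Data Acquisition"
    CURATION = "Data Curation"
    ACCESS = "Data Access"
    PROCESSING = "Data Processing"
    COMMUNITY = "Community Support"


# Points-of-reference 1-6 as numbered in the text.
# Point 7 (Community/All) is handled separately in points_between().
REFERENCE_POINTS = {
    1: frozenset({Subsystem.ACQUISITION, Subsystem.CURATION}),
    2: frozenset({Subsystem.CURATION, Subsystem.ACCESS}),
    3: frozenset({Subsystem.ACQUISITION, Subsystem.ACCESS}),
    4: frozenset({Subsystem.CURATION, Subsystem.PROCESSING}),
    5: frozenset({Subsystem.ACQUISITION, Subsystem.PROCESSING}),
    6: frozenset({Subsystem.PROCESSING, Subsystem.ACCESS}),
}
COMMUNITY_POINT = 7


def points_between(delegated, retained):
    """Return the reference points linking a delegated subsystem to a
    retained one (hypothetical helper, for illustration only)."""
    crossing = set()
    for number, pair in REFERENCE_POINTS.items():
        if pair & delegated and pair & retained:
            crossing.add(number)
    # Community/All crosses the boundary whenever community support sits
    # on one side and any other subsystem sits on the other side.
    if (Subsystem.COMMUNITY in delegated and retained) or (
        Subsystem.COMMUNITY in retained and delegated
    ):
        crossing.add(COMMUNITY_POINT)
    return crossing


# Scenario from the text: acquisition and curation are delegated,
# while processing is provided by the infrastructure itself.
delegated = {Subsystem.ACQUISITION, Subsystem.CURATION}
retained = {Subsystem.PROCESSING}
print(sorted(points_between(delegated, retained)))
```

For the delegation scenario described in the text, the sketch selects Curation/Processing and Acquisition/Processing, i.e. reference points 4 and 5.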

Analysis of the common requirements of the six ESFRI environmental infrastructures has resulted in the identification of a number of common functionalities. As shown in Figure 2.2, these functionalities can be partitioned amongst the five subsystems. They encompass a range of concerns, from the fundamental (e.g. data collection and storage, data discovery and access, and data security) to more specific challenges (e.g. data versioning, instrument monitoring and interactive visualisation).

In order to better manage the range of requirements, and in order to ensure rapid publication of incremental refinements to the ENVRI-RM, a minimal model has been identified which describes the fundamental functionality necessary to describe a functional environmental research infrastructure. The minimal model focuses on the major interaction links from raw data acquisition to the access and export of specific curated datasets, passing through stages of curation, brokering and authorisation. This core interaction chain represents the most fundamental contract between the archetypical research infrastructure and its community: access to scientific observations/measurements. The core interactions between data curation and data processing, as well as the uploading of contributions from outside the infrastructure, are also present in the minimal model, providing the skeleton to which additional extensions to the reference model can be attached, including alternative mechanisms for data retrieval and presentation. By initially focusing on this minimal model, it becomes practical to produce a partial specification of the ENVRI-RM which nonetheless reflects the final shape of the ENVRI-RM without the need for significant refactoring. Further development of the ENVRI-RM will focus on designated priority areas based on feedback from the contributing ESFRI representatives.

The ENVRI-RM subsystems are specified using the ODP standard framework. The ENVRI-RM defines an ‘archetypical’ environmental research infrastructure rather than a specific (implemented) infrastructure. Three viewpoints take particular priority: the Science, Information and Computational Viewpoints, which give better focus on the core objective of ENVRI: to develop an understanding of the common requirements and to provide design solutions for common data and operation services.

 

Figure 2.2: Radial depiction of ENVRI-RM requirements with the minimal model highlighted [2]

 

2.2.1 Science Viewpoint

The Science Viewpoint of the ENVRI-RM intends to capture the requirements for an environmental research infrastructure from the perspective of the people who perform their tasks and achieve their goals as mediated by the infrastructure. The key concepts defined in this viewpoint include communities and their roles and behaviours. Five common communities are specified, corresponding to the five subsystems: data acquisition, data curation, data publication, data service provision, and data usage. The definitions of the communities are based on community objectives:

  • Data Acquisition, who collect raw data and bring (streams of) measurements into an infrastructure;
  • Data Curation, who curate the scientific data, maintain and archive them, and produce various data products with metadata;
  • Data Publication, who assist data publication, discovery and access;
  • Data Service Provision, who provide various services, applications and software/tools to link and recombine data and information in order to derive knowledge;
  • Data Usage, who make use of data and service products, and transfer knowledge into understanding.

By analysing common requirements, use scenarios for each community are derived, and community roles and behaviours are identified [3].
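
The relationship between communities, roles and behaviours can be sketched as a simple data structure. The following Python fragment is only an illustration: the class names, the `all_behaviours` helper and the two example roles are hypothetical placeholders, not role names defined by the Reference Model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    """A role played within a community, with the behaviours it performs."""
    name: str
    behaviours: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A community defined by its objective and the roles it contains."""
    name: str
    objective: str
    roles: List[Role] = field(default_factory=list)

    def all_behaviours(self) -> List[str]:
        # Flatten the behaviours of every role, preserving role order.
        return [b for role in self.roles for b in role.behaviours]


# The objective is taken from the community definition in the text;
# the roles and behaviour strings are hypothetical examples.
acquisition = Community(
    name="Data Acquisition",
    objective="collect raw data and bring measurements into an infrastructure",
    roles=[
        Role("Example Observer (hypothetical)", ["collect raw data"]),
        Role("Example Station Operator (hypothetical)", ["deliver measurement streams"]),
    ],
)
print(acquisition.all_behaviours())
```

Keeping behaviours attached to roles, rather than directly to communities, mirrors the way the Science Viewpoint describes what each role does within a community.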

2.2.2 Information Viewpoint

The Information Viewpoint provides a common abstract model for the shared information handled by the infrastructure. The focus lies on the information itself, without considering any platform-specific or implementation details. It is independent of the computational interfaces and functions that manipulate the information and of the nature of the technology used to store it. It specifies the types of the information objects, the relationships between those types, and how the states of these objects change as a result of computational operations.

Modelling in this viewpoint in the ENVRI context employs a data-oriented approach which follows the lifecycle of scientific data (from raw to published and processed data) as information objects in each subsystem, identifying how their behaviour changes when events or actions occur. The model captures common issues challenging many environmental research infrastructures, such as data enrichment including attribution of unique identifiers necessary for unambiguous identification and tracking of data provenance, association of metadata, semantic annotation, quality assessment, semantic mapping, and data discovery. The model has been continuously refined by examining the feasibility of implementations and applying community feedback.

The resulting model consists of a set of information objects managed and processed by the common subsystems, a set of action types (events that cause state changes of the information objects), and a set of constraints on these objects. The model also defines the dynamic schemata and the static schemata. The dynamic schemata capture how the information objects evolve as the system operates, specifying the allowable state changes as the effects of the actions. The static schemata, on the other hand, define instantaneous views of the information objects at a certain stage of the data lifecycle, defining a minimum set of constraints for data sharing.
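
The idea of a dynamic schema, i.e. a set of allowable state changes triggered by actions, can be sketched as a small state machine. In the Python fragment below, the state names, action names and the `InformationObject` class are hypothetical placeholders chosen for illustration; they are not the information objects or action types defined by the ENVRI-RM.

```python
# Allowed state changes, keyed by (current state, action).
# States and actions are hypothetical placeholders.
TRANSITIONS = {
    ("raw", "check_quality"): "quality_checked",
    ("quality_checked", "archive"): "archived",
    ("archived", "publish"): "published",
}


class InformationObject:
    """An information object whose state may only change through
    actions permitted by the transition table."""

    def __init__(self, identifier, state="raw"):
        self.identifier = identifier
        self.state = state
        self.history = [state]  # record of every state the object has had

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(
                f"action {action!r} is not allowed in state {self.state!r}"
            )
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


dataset = InformationObject("example-dataset-1")
for action in ("check_quality", "archive", "publish"):
    dataset.apply(action)
print(dataset.history)
```

In this sketch a static schema would correspond to a check on the object's attributes at one fixed state, for example requiring certain metadata before an object counts as published.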

2.2.3 Computational Viewpoint

The Computational Viewpoint of the ENVRI-RM accounts for the major computational objects expected within an environmental research infrastructure and the interfaces by which they interact. Each object encapsulates functionality implemented by a service or resource within the infrastructure (this encapsulation occurs at the conceptual level rather than the implementation level; it is admissible for the functions of a given object to be distributed across multiple computational resources in an implemented infrastructure, should that suit the infrastructure’s physical architecture). Each object provides a number of interfaces by which functions can be invoked on the object, or by which the object can invoke the functions of other objects. By linking client and server interfaces, a network of interactions between objects can be built that demonstrates the computational dependencies of different parts of an infrastructure. These bindings can then be further specified in order to determine the particular operations and information streams supported by the interaction between interfaces, as well as the information objects that must be present.

Figure 2.3: An example of interaction between interfaces of computational objects

For example, a (simplified) brokered upload interaction might take the form illustrated in Figure 2.3. Four computational objects are identified: the science gateway encapsulating user-afforded functionality; the data import service handling import requests into the infrastructure; an internal data store controller managing access to a particular data store in the infrastructure; and an external data source controller from which data is to be extracted. In this instance, the data import service manages access via its import data and coordinate data import operational interfaces, responding to a request from the science gateway and invoking a selected data store respectively; this exchange of requests between objects can be further specified using, for example, a suitable UML sequence diagram. Once the data transfer has been validated and configured, the data can be pulled from the data source to the data store via compatible stream interfaces.
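
The brokered upload interaction described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: the class and method names are illustrative renderings of the four computational objects and their interfaces, not an interface specification from the ENVRI-RM, and the stream interface is modelled as a plain Python generator.

```python
class DataSourceController:
    """External data source from which data is to be extracted."""

    def __init__(self, records):
        self._records = list(records)

    def export_stream(self):
        # Stream interface: yields records one at a time.
        yield from self._records


class DataStoreController:
    """Manages access to one data store inside the infrastructure."""

    def __init__(self):
        self.stored = []
        self._prepared = False

    def coordinate_data_import(self):
        # Operational interface: validate and configure the transfer.
        self._prepared = True
        return True

    def import_stream(self, stream):
        if not self._prepared:
            raise RuntimeError("import was not coordinated with the data store")
        self.stored.extend(stream)
        self._prepared = False


class DataImportService:
    """Handles import requests and invokes the selected data store."""

    def __init__(self, store):
        self._store = store

    def import_data(self, source):
        if not self._store.coordinate_data_import():
            return False
        # Once configured, data is pulled from the source into the store.
        self._store.import_stream(source.export_stream())
        return True


class ScienceGateway:
    """User-facing object that issues the upload request."""

    def __init__(self, import_service):
        self._import_service = import_service

    def request_upload(self, source):
        return self._import_service.import_data(source)


store = DataStoreController()
gateway = ScienceGateway(DataImportService(store))
ok = gateway.request_upload(DataSourceController(["obs-1", "obs-2"]))
print(ok, store.stored)
```

The gateway never touches the data store directly; every request passes through the import service, which is the point of the brokered arrangement.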

Each of the five essential subsystems of the ENVRI-RM must provide a number of computational objects of the kind illustrated above to be distributed across an infrastructure's technical architecture. For each of those objects, suitable interfaces must be identified and the most important interactions between those interfaces described. In the ENVRI-RM, the interfaces between subsystems are given particular attention, as many critical functions intercede in the movement of data between subsystems.

Figure 2.4 illustrates the computational objects involved in basic data acquisition, curation and access, positioned with respect to four of the five research infrastructure subsystems. Client/server interface labels have been merged for clarity. Multi-party interactions are coordinated via binding objects [4] (such as raw data collection and brokered data export) that serve to simplify such interactions by abstracting away implementation-specific details of the coordination, such as how information and control are passed between objects when three or more parties are involved.

 

Figure 2.4: A subset of the core interactions involved in the acquisition and access of data

Thus the archetypical research infrastructure is considered here as having a brokered, service-oriented architecture. Core functionality is encapsulated in a number of service objects that control various resources present in the infrastructure. Access to most of these services by external entities is overseen by various brokers that validate requests and provide an interoperability layer between heterogeneous components; this is particularly important for federated infrastructures, which may not be able to enforce a core set of standards on all data and services being integrated.

 

3        Analysis of ICOS Research Infrastructure

The analysis below covers only the Science Viewpoint. It updates the understanding of the requirements of the ICOS RI resulting from the Amsterdam workshop discussion. (The updates of the Information Viewpoint analysis will be included in a later version.) The purpose of the analysis is to identify the ICOS community objectives, roles and behaviours, describing the tasks and operational processes of each role and the interactions between roles.

The ICOS Research Infrastructure (ICOS RI) is built to provide the long-term observations required to understand the present state and predict the future behaviour of the climate, the global carbon cycle and greenhouse gas emissions.

The ICOS RI objectives are:

  • To track carbon fluxes in Europe and adjacent regions by monitoring the ecosystems, the atmosphere and the oceans through integrated networks.
  • To provide the long-term observations required to understand the present state and predict future behaviour of the global carbon cycle and greenhouse gas emissions.
  • To monitor and assess the effectiveness of carbon sequestration and/or greenhouse gas emission reduction activities on global atmospheric composition levels, including attribution of sources and sinks by region and sector.

Figure 3.1 annotates the ICOS RI organisational structure using Reference Model (RM) terminology. From this analysis, the community roles and behaviours can be identified and the workflow can be understood; these are described in the following subsections.

 

 

 

Figure 3.1 : Annotation of ICOS Organisational Structure Using Terminology of the Reference Model Science Viewpoint

3.1    ICOS RI Roles

Table 3.1 lists the roles identified in the ICOS Research Infrastructure, their descriptions, and the corresponding role names defined by the Reference Model.

Table 3.1 : Roles in ICOS RI and Role Names in the Reference Model

Role Instances in ICOS RI

Descriptions

RM SV_Roles Names

ICOS General Assembly

 

  • Policy or Decision Maker

Scientific Advisory Board

(SAB)

 

  • Policy or Decision Maker

ICOS RI Committee

It is an advisory body to the Director of ICOS ERIC, and decides about strategies concerning the Carbon Portal [5] .

  • Policy or Decision Maker

Director General

 

  • Policy or Decision Maker

Head Office/Headquarter

(HO)

The ICOS RI Head Office will have three main task groups, which are 4 :

  1. Managing the ICOS ERIC legal entity
  2. Strategic scientific and technical planning, coordination and integration.
  3. Community building, outreach, promotion and training
  • Policy or Decision Maker

Carbon Portal (CP)

The Carbon Portal shall provide a "one-stop shop" for ICOS data products. It is envisioned as a place where all data produced within the ICOS station network can be discovered and accessed, and where the scientific community can post elaborated data products that are obtained from ICOS data [6] .

  • Data Curation Subsystem
  • Storage Administrator
  • Storage
  • Data Access Subsystem
  • Service Provider

Potentially, CP may also be:

  • Service Registry
  • PID Generator
  • PID Registry
  • Semantic Mediator

Connect projects and International network

Provide data to ICOS RI

  • Data Originator
  • Global networks GEOSS
  • Greenhouse gas flux assessment International programs

Consume the data provided by ICOS RI

  • Data Consumer

The Central Analytical Laboratory (CAL)

CAL ensures the accuracy of observational data, thorough quality control and routine testing of air sampling material. It provides reference gases for calibration of in-situ measurements performed at the continuous monitoring stations. It also analyses air samples collected at the monitoring stations. CAL is hosted by Germany 4 .

  • Environmental Scientist
  • (Measurement Model) Designer
  • Data collector
  • Data Acquisition Subsystem
  • Data Curator
  • Data Curation Subsystem
  • Storage Administrator
  • Storage
  • Data Originator

The Atmospheric Thematic Centre (ATC)

ATC is responsible for continuous and discontinuous air sampling, instrument development/servicing, data processing and storage. A central place is needed to ensure that all data are treated with the same algorithms and properly archived for the long term, that the ICOS atmospheric stations can receive permanent support for optimal operation during their lifetime, and that new sensors can be smoothly implemented in the network in the future. ATC is coordinated and hosted by France, with Nordic Hub and Mobile Lab hosted by Finland [7] .

  • Data Curator
  • Data Curation Subsystem
  • Storage Administrator
  • Storage
  • Data Originator

The Ecosystem Thematic Centre   (ETC)

ETC coordinates the ICOS Ecosystem Network providing assistance with instruments and methods, testing and developing new measurement techniques and associated processing algorithms. It also ensures a high level of data standardization, uncertainty analysis and database services in coordination with the ICOS Carbon Portal. ETC is coordinated and hosted by Italy, together with Belgium and France 4 .

The Ocean Thematic Centre (OTC)

OTC will coordinate the measurement of the carbon cycle in the oceans within ICOS. It will support the ICOS marine network with information and technical backup on state-of-the-art instrumentation and analytical methods. It will provide data storage and processing techniques, quality control, and network-wide integration of data into useful products, such as maps of CO2 fluxes, carbon transport, and the assessment of ocean acidification 4 .

Monitoring Station Assemblies (MSA)

MSAs discuss technical and scientific matters and services concerning their component in order to further develop and improve ICOS and its networks. MSAs work together with ATC, ETC and OTC, but also have an independent role [8] .

 

MSA members are scientific and technical experts from the monitoring stations of the member countries that constitute the basis of ICOS ERIC. All atmospheric station PIs, ecosystem station PIs and ocean station PIs are members of the respective MSAs.

  • Environmental Scientist
  • (Measurement Model) Designer

Station Principal Investigators (SPI)

 

  • Data Curator

Atmospheric Stations

They are established to continuously measure the variability in greenhouse gas (CO2, CH4, N2O) concentrations due to regional and global fluxes [9] .

  • Sensor
  • Sensor network
  • Technician
  • Measurer
  • Data collector
  • Data Acquisition Subsystem

Ecosystem Stations

They are built for monitoring the functioning of land ecosystems and the exchange of energy and greenhouse gases between the ecosystems and the atmosphere 3 .

Ocean Ships and Stations

Marine ICOS will provide the long-term oceanic observations required to understand the present state and predict future behaviour of the global carbon cycle and climate-relevant gas emissions 3 .

Users of ICOS data products:

Researchers;

International and national Operational Centres assimilating atmospheric composition data;

Policymakers and stakeholders involved in negotiating carbon reduction policies;

Carbon trading communities;

Regional authorities and carbon inventory agencies;

Private land owners and industrial contributors of greenhouse gas emissions;

The general public interested in greenhouse gas emissions and global climate change.

Commercial users

Others

 

  • Scientist or Researcher
  • Policy or Decision Maker
  • Private Sector (Industry investor or consultant)
  • General Public, Media or Citizen (Scientist)

3.2    ICOS RI Community Behaviours

Table 3.2 maps the ICOS roles to the five ENVRI common communities. Analysing the key responsibilities of each role yields the mapping to the community behaviours defined in the Reference Model.

Table 3.2 : Mapping of ICOS RI roles into the ENVRI Common Communities and Identifying the Community Behaviours

 

Role Instances in ICOS RI

Key Responsibilities

RM SV_Community Behaviours

Data Acquisition Community

  • National Measurement Networks
    • Atmospheric Stations
    • Ecosystem Stations
    • Oceanic Ships and Stations
  • Perform measurements according to ICOS standards
  • Collect data and send to Thematic Centres
  • Can have non-ICOS functionality & responsibilities, e.g., they may also
    • Collect other types of data
    • Perform their own data analysis (*not* official ICOS!)
    • Operate their own web sites
  • Instrument Configuration
  • Instrument Calibration
  • Data Collection
  • Monitoring Station Assemblies (MSAs)

(See role descriptions)

  • Design of Measurement Model
  • Ecosystem Thematic Centre (ETC)

Performs measurements of soil samples

  • Design of Measurement Model
  • Data Collection
  • The Central Analytical Laboratory (CAL)

(See role descriptions)

  • Data Collection

Data Curation Community

  • Station Principal Investigators (SPIs)
  • Perform quality checks
    • In near real time (for some systems)
    • After (pre-) processing at Thematic Centres
    • Before “final” datasets are “published”
  • Data Quality Checking
  • Central Facilities
    • Ecosystem Thematic Centre
    • Atmospheric Thematic Centre
    • Ocean Thematic Centre
  • Compose and maintain procedures and protocols for measurements
  • Create “publishable” data sets
  • Keep own competence up to date
  • Maintain their own websites
    • Info on measurements
    • Near Real-Time data visualization
  • Data processing info (for SPIs, mainly)
  • Serve as experts
    • For stations within ICOS RI network
    • For external partners (if resources allow)
  • Data Preservation
  • Data Product Generation
  • Data Replication
  • The Central Analytical Laboratory

(See role descriptions)

  • (Instrument) Calibration
  • Data Quality Checking
  • Connect projects and International network

(See role descriptions)

 

  • ICOS Carbon Portal

Organize and ensure back-up storage and long-term archiving of published ICOS data sets

  • Data Replication
  • Data Preservation

Data Publication Community

  • ICOS Carbon Portal
  • Generate and provide effective tools to publish, discover, access and retrieve ICOS observation data according to user needs
  • Offer user-friendly, web-based access to products elaborated from ICOS data
  • Establish interfaces with other relevant data portals
  • Ensure basic semantic interoperability by maintaining a full copy of the standard metadata and data description documents (ontologies) held at the ICOS TCs, including the compilation of the vocabularies in use within ICOS
  • Coordinate regular publication of the ensemble of the ICOS data, with the TCs and the ICOS community of PIs
  • Organize the traceability of downloaded ICOS data, including the application of persistent unique identifiers for citation purposes
  • Record relevant bibliometric information and establish indicators about the use of ICOS data
  • Data Publication
  • Data Discovery & Access
  • Semantic Harmonisation
  • Data Citation

Data Service Provision Community

  • Central Facilities
    • Ecosystem Thematic Centre
    • Atmospheric Thematic Centre
    • Ocean Thematic Centre
    • Analytical Laboratory
  • Process data (and analyze some samples)

 

  • ICOS Carbon Portal
  • Define and implement advanced web services and procedures for web-based data visualization, retrieval and processing
  • Encourage, coordinate, facilitate and ensure the operational provision of elaborated products and synthesis efforts based on ICOS data
  • Service Description
  • Service Coordination
  • Service Composition

Data Usage Community

  • ERIC Head Office

Organise general ICOS outreach actions on the basis of the scientific material (advanced data plots and visuals) provided by the Carbon Portal [10] .

 

  • Director General
  • ICOS RI Committee
  • ICOS Council
  • Scientific Advisory Board (SAB)
  • General Assembly

(See role descriptions)

 

  • Global networks GEOSS
  • Greenhouse gas flux assessment International programs

(See role descriptions)

 

  • Users of ICOS data products

(See role descriptions)

 

  • ICOS Carbon Portal

Implement a common user registration authentication system for ICOS that allows usage tracking

  • User Profile Management
  • User Behaviour Tracking

 

Note

By ODP/RM definition, a computational system can play a passive role in a community. For example, the ICOS Carbon Portal is regarded as a role in the Data Curation, Data Publication, Data Service Provision and Data Usage communities.
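As an illustration only, the role-to-community mapping of Table 3.2 can be held in a simple lookup structure. The sketch below encodes a subset of the roles listed in the table; the variable and function names are assumptions made for this example and are not part of the Reference Model.

```python
from typing import Dict, List, Set

# Subset of Table 3.2: community -> ICOS roles mapped to that community.
COMMUNITY_ROLES: Dict[str, Set[str]] = {
    "Data Acquisition": {
        "Atmospheric Stations",
        "Ecosystem Stations",
        "Oceanic Ships and Stations",
        "Monitoring Station Assemblies",
        "Ecosystem Thematic Centre",
        "Central Analytical Laboratory",
    },
    "Data Curation": {
        "Station Principal Investigators",
        "Ecosystem Thematic Centre",
        "Atmospheric Thematic Centre",
        "Ocean Thematic Centre",
        "Central Analytical Laboratory",
        "ICOS Carbon Portal",
    },
    "Data Publication": {"ICOS Carbon Portal"},
    "Data Service Provision": {
        "Ecosystem Thematic Centre",
        "Atmospheric Thematic Centre",
        "Ocean Thematic Centre",
        "Central Analytical Laboratory",
        "ICOS Carbon Portal",
    },
    "Data Usage": {
        "ERIC Head Office",
        "Director General",
        "Users of ICOS data products",
        "ICOS Carbon Portal",
    },
}


def communities_of(role: str) -> List[str]:
    """Return the communities in which a role participates, in table order."""
    return [name for name, roles in COMMUNITY_ROLES.items() if role in roles]


portal_communities = communities_of("ICOS Carbon Portal")
station_communities = communities_of("Atmospheric Stations")
```

With this subset, `portal_communities` contains the four communities named in the note above, while a station appears only in the Data Acquisition community.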

 

Note

The 5-common-subsystem and their objectives as defined by the Reference Model are depicted in Figure 3.2.

 

Figure 3.2: ENVRI 5-common-subsystem and their objectives

        Definitions

  • Data Acquisition Community, who collect raw data and bring (streams of) measurements into a system;
  • Data Curation Community, who curate the scientific data, maintain and archive them, and produce various data products with metadata;
  • Data Publication Community, who assist data publication, discovery and access;
  • Data Service Provision Community, who provide various services, applications and software/tools to link and recombine data and information in order to derive knowledge;
  • Data Usage Community, who make use of data and service products, and transfer knowledge into understanding.

3.3    ICOS RI Workflows

We have now defined the ICOS community roles and described their behaviours. In this subsection, we examine how those roles interact with each other (through the ICOS RI) and collaboratively fulfil the community objectives. The result of the analysis is a workflow diagram that depicts the activities and processes conducted by each role, and the direction of control and object flows from one role to another.

 

Figure 3.3 shows the existing design diagrams obtained from the ICOS team: (a) gives an overview of computation and data flow in the ICOS RI, (b) provides the details of the ICOS data lifecycle, and (c) describes the DOI assignment process. From this information, we derive the key community processes and workflow shown in Figure 3.4.

(a)   An overview of the proposed data-flow in ICOS (April 2014)

 

     

Lifecycle of the NRT Data                                         Lifecycle of the L2 Data

Lifecycle of the L3 Data

(b)   ICOS Data Lifecycle

(c) ICOS PID (DOI) Assignment Process. By the Reference Model definition, a DOI is one type of Persistent Identifier (PID). Panel (c) shows that a DOI system will be established within the Carbon Portal to assign DOIs to the L0 data generated at the Stations, the L1 and L2 data produced at the Thematic Centres, and the L3 data processed at the Carbon Portal.

Figure 3.3: Analysis of ICOS Requirements

Figure 3.4: ICOS RI Community Process. Each column corresponds to one ICOS RI community role. A black dot represents the starting point of the workflow, and a black dot with a circle represents an ending point of the workflow. Each box in a role column represents a process performed by that role. An arrow indicates the direction of the (control/object) flow between processes. 

Figure 3.4 describes the workflow and the key community processes from data collection to data access. The workflow starts with each Station, which 1) “collects the L0 data” and 2) “stores the L0 data”. At this point, each station may request the Carbon Portal to 3) “generate PIDs (DOIs) for L0 data”. With PIDs available, each station will 4) “add PIDs (DOIs) to L0 data” and 5) “add and store metadata for L0 data”. Then, the station Principal Investigators (PIs) will 6) “check quality of L0 data”. Thereafter, the L0 data will be delivered to the Thematic Centres. Each Thematic Centre will 6) “store L0 data and metadata” and 7) “archive L0 data and L0 metadata”. Each Thematic Centre also 8) “enables the visualisation of (the L0) data”, to allow 9) end users to “view (the L0) data from Thematic Centres websites”. After 6), the Thematic Centres also 10) “pre-process L0 data to generate L1, L2 data”, 11) “store L1, L2 data”, 12) request the Carbon Portal to “generate PIDs (DOIs) for L1, L2 data”, 13) “add PIDs (DOIs) to L1, L2 data”, and 14) “add and store metadata for L1, L2 data”. At this point, station PIs may need to 15) “check quality of L1, L2 data”. After that, the Thematic Centres will 9) “enable the visualisation of (the L1) data” and 10) allow end users to “view (the L1) data from Thematic Centres websites”. Meanwhile, a copy of the dataset will be sent to the Carbon Portal. The Carbon Portal will 16) “archive L1, L2 data and L1, L2 metadata”, 17) “store L2 data”, 18) “enable search & discovery of L2 metadata”, and 19) “enable download and visualisation of L2 data”. This will enable end users to 20) “view the L2 data from www.icos-carbon-portal.eu”. The Carbon Portal will also 21) “track statistics” of any usage of the data.
With the L2 data stored, after 17), the Carbon Portal also 22) “processes L2 data to generate L3 data”, 23) “stores L3 data”, 24) “generates PIDs (DOIs) for L3 data”, 25) “adds PIDs (DOIs) to L3 data”, 26) “adds and stores metadata to L3 data”, and 27) “archives L3 data and L3 metadata”. Meanwhile, the Carbon Portal also 18) “enables search & discovery of (the L3) metadata” and 19) “enables download and visualisation of (the L3) data”, which will enable end users to 20) “view (the L3) data from www.icos-carbon-portal.eu”. Again, any usage of the ICOS data will be 21) “tracked” by the Carbon Portal.
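The core of this workflow (a role produces a dataset, the Carbon Portal issues a PID, metadata are attached, the dataset is archived, and downloads are counted) can be sketched as follows. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, the identifiers are placeholders rather than real DOIs, and the "processing" steps are trivial stand-ins for the actual Thematic Centre and Carbon Portal algorithms.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

# Which role produces each data level, following the workflow description.
PRODUCER = {
    "L0": "Station",
    "L1": "Thematic Centre",
    "L2": "Thematic Centre",
    "L3": "Carbon Portal",
}


@dataclass
class Dataset:
    level: str
    values: List[float]
    pid: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


class CarbonPortal:
    """Hypothetical stand-in for the PID, archive and usage-tracking functions."""

    def __init__(self) -> None:
        self._counter = count(1)
        self.archive: Dict[str, Dataset] = {}
        self.downloads: Dict[str, int] = {}

    def generate_pid(self, dataset: Dataset) -> str:
        # Placeholder identifier scheme; a real system would mint DOIs.
        return f"pid:example/{dataset.level}-{next(self._counter)}"

    def archive_dataset(self, dataset: Dataset) -> None:
        if dataset.pid is None:
            raise ValueError("a dataset needs a PID before it is archived")
        self.archive[dataset.pid] = dataset

    def download(self, pid: str) -> Dataset:
        # Every download is counted, mirroring the "track statistics" step.
        self.downloads[pid] = self.downloads.get(pid, 0) + 1
        return self.archive[pid]


def register(portal: CarbonPortal, dataset: Dataset) -> Dataset:
    """Request a PID, add it to the dataset, then add basic metadata."""
    dataset.pid = portal.generate_pid(dataset)
    dataset.metadata["level"] = dataset.level
    dataset.metadata["producer"] = PRODUCER[dataset.level]
    return dataset


portal = CarbonPortal()

# Station: collect L0 data and register it.
l0 = register(portal, Dataset("L0", [400.1, 400.3, 399.8]))

# Thematic Centre: derive L2 data (placeholder processing: rounding).
l2 = register(portal, Dataset("L2", [float(round(v)) for v in l0.values]))
portal.archive_dataset(l2)

# Carbon Portal: derive L3 data (placeholder processing: mean value).
l3 = register(portal, Dataset("L3", [sum(l2.values) / len(l2.values)]))
portal.archive_dataset(l3)

fetched = portal.download(l2.pid)
```

The sketch keeps PID generation inside the Carbon Portal for every level, which matches panel (c) of Figure 3.3; quality checking by station PIs and visualisation are omitted for brevity.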

 

The description above illustrates the overall operational workflow within the ICOS RI. However, the detailed operations of each Thematic Centre and of the Central Analytical Laboratory vary. Figures 3.5 to 3.7 show the workflows of ETC, ATC and CAL.

 

ETC Workflow

Figure 3.5: ETC workflow

Figure 3.5 illustrates the ETC workflow. Ecosystem sites acquire continuous measurements at high temporal resolution with a large number of sensors. These data are processed by the central facility to create high-quality datasets that are then distributed to the user community. The data processing includes a number of steps that transform the original measurement from a single sensor into ecologically or micro-meteorologically meaningful quantities, often at a different time resolution, with different units and quality (filtering criteria and gap-filling), derived in part from process-based modelling activities.

 

The figure uses the following definitions:

 

Data: a set of magnitudes or values of a given physically or abstractly meaningful quantity, measured by a sensor or derived by processing, e.g., air temperature or quality level

 

Variable: a label assigned to a quantity and used to identify data within a data representation structure such as a file, e.g., TA, TA_GF, TA_QC

 

The main aspects that characterize a variable, and that should therefore be easy to identify, are:

1)      Spatial representativeness: what the data associated with a variable represent in terms of the area around the sensor;

2)      Directly measured, filtered or derived: whether the data are directly measured, have been filtered according to quality flags, or have been gap-filled or derived from measured data by running models;

3)      Single sensor or combination of sensors: whether the data are obtained from a single sensor or from a combination of sensors of the same type or of different types;

4)      Type of processing and corrections applied to data: whether the processing applied to the data includes some ecosystem process understanding that could violate independence when the measurements are used for model validation and parameterization.

 

To help users understand the characteristics of variables and data, three elements are used: the variable code prefix and suffix, the metadata associated with the variable, and the level definition. In particular, the metadata includes information on all four aspects, the variable codes are linked to aspects 2 and 3, and the level definitions provide information on aspects 1, 2 and 4. This document describes the levels.

 

Different data levels are defined across two scales: spatial representativeness (using numeric levels), and processing and quality information (using alphabetic levels).

The spatial representativeness is defined on the basis of the eddy covariance flux measurements. Fluxes are representative of an area (footprint) around the measurement point with a radius of the order of tens to hundreds of meters. The levels define whether a specific variable can be considered representative of the footprint or not.

1 – Data representative of individual sensor surroundings; the dataset should not be considered as representative of the footprint or ecosystem. Data at this level can be aggregated at different temporal resolutions

2 – Data representative of the tower footprint, measured by one or more sensors. Data at this level can be aggregated at different temporal resolutions

3 – Spatially gridded data (not discussed in this document)

 

The processing and quality indicator helps to identify whether the data have been quality filtered and whether gap-filling and other processing have been applied. In addition, it helps to identify whether a specific product is based on models that embed some process-level knowledge.

A – Actual measurements, plus quality flags when available (but data not filtered)

B – Quality flags applied (data filtered) and gapfilled with empirical methods (without process level knowledge)

C – Data that includes results from models based on process-level knowledge (including gapfilling)

 

Level 0A – Original measured raw data and associated quality flags

Level 0B – Quality filtered raw data (derived from Level 0A)

Level 0C – [UNUSED]

 

Level 1A – Sensor-level data, derived from Level 0B data and aggregated at different relevant time resolutions. Quality flags associated when available.

Level 1B – Sensor-level data, quality filtered using the flag values available in the Level 1A data and gap-filled using only fully empirical approaches

Level 1C – Sensor-level data, quality filtered using the flag values available in the Level 1A data and gap-filled using process-based approaches; additional sensor-level data products derived from models

 

Level 2A – Footprint-level data, derived from Level 0B and/or Level 1 data and aggregated at different relevant time resolutions. Quality flags associated when available

Level 2B – Footprint-level data, quality filtered using the flag values available in the Level 2A data and gap-filled using only fully empirical approaches

Level 2C – Footprint-level data, quality filtered using the flag values available in the Level 2A data and gap-filled using process-based approaches; additional footprint-level data products derived from models
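The two-scale level scheme above lends itself to a small parser that splits a level code into its numeric (spatial) and alphabetic (processing) parts. The following is a minimal sketch, assuming level codes are written as a digit followed by a letter, optionally preceded by the word "Level"; the function and class names are hypothetical, and Level 3 is left out because it is not discussed in this document.

```python
import re
from dataclasses import dataclass

SPATIAL_SCALE = {
    "0": "raw data",
    "1": "sensor level",
    "2": "footprint level",
}
PROCESSING = {
    "A": "measurements with quality flags, not filtered",
    "B": "quality filtered, empirical methods only",
    "C": "includes results of process-based models",
}
# Level 0C is explicitly marked as unused in the level list.
UNUSED = {"0C"}


@dataclass(frozen=True)
class DataLevel:
    spatial: str
    processing: str

    @property
    def uses_process_knowledge(self) -> bool:
        # Relevant to aspect 4: such data may not be independent of models
        # when used for model validation or parameterization.
        return self.processing == "C"

    def describe(self) -> str:
        return f"{SPATIAL_SCALE[self.spatial]}; {PROCESSING[self.processing]}"


def parse_level(code: str) -> DataLevel:
    """Parse codes such as '2B' or 'Level 1C' into a DataLevel."""
    match = re.fullmatch(r"(?:Level\s*)?([0-2])([ABC])", code.strip())
    if match is None or "".join(match.groups()) in UNUSED:
        raise ValueError(f"not a defined data level: {code!r}")
    return DataLevel(*match.groups())


footprint_filtered = parse_level("Level 2B")
sensor_modelled = parse_level("1C")
```

A user selecting data for model validation could, for example, exclude every dataset whose level has `uses_process_knowledge` set.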

 

ATC Workflow

Figure 3.6: ATC workflow

CAL Workflow

Figure 3.7: CAL workflow

4        Conclusion

This report updates the analysis of the ICOS RI using the concepts and principles of the ENVRI Reference Model (Science Viewpoint). The ICOS community roles, behaviours and workflow processes are identified and mapped onto the Reference Model common framework.

 

Much work remains:

  • The workflows of ETC, ATC and CAL shall be described and analysed in detail. Some concepts, such as data levels, NRT data and data publication policies, shall be clarified and synchronised among the different TCs and CAL. Common operations shall be identified to avoid duplication of effort. Solutions that use existing advanced (e-Science) technology, e.g., EUDAT, as the underlying e-Infrastructure on which to deploy the Carbon Portal services can be provided.

 

  • When the requirements analysis becomes satisfactory to the ICOS RI community, an analysis of the Information Viewpoint shall be provided, in which data objects and actions are specified, the data lifecycle is explained, and the state changes of the data objects resulting from the operations are defined.

 

  • In addition to the Science and Information Viewpoints, a specification of the Computational Viewpoint will also be useful towards the final design of the ICOS RI; it will provide the detailed design of the service components and their interfaces and interactions.

 

As a final remark, the Reference Model will continue to assist the design and construction of the ICOS Research Infrastructure and its Carbon Portal. In turn, the ICOS use case provides the Reference Model with a useful instance for evaluation and helps to identify new requirements. Advanced design and implementation experience from ICOS will be captured in the future development of the Reference Model and shared with other environmental science research infrastructures.

 

5        References

Chen, Y. et al. (2013): Analysis of Common Requirements for Environmental Science Research Infrastructures, ISGC 2013.

Linington, P. et al. (2011): Building Enterprise Systems with ODP: An Introduction to Open Distributed Processing, Chapman & Hall/CRC Press.

ISO/IEC 10746-1 (1998): Information technology – Open Distributed Processing – Reference Model: Overview, ISO/IEC standard.

[1] Here, we define a subsystem as a set of capabilities that are collectively defined by a set of interfaces with corresponding operations that can be invoked by other subsystems. An interface in ODP is an abstraction of the behaviour of an object that consists of a subset of the interactions of that object together with a set of constraints on when they may occur (Linington et al. 2011).

[2] The definitions of the functionalities are given at the reference model wiki site: http://miniurl.com/92M z .

[3] Due to space limitations, the definitions of these concepts are given at www.envri.eu/rm .

[4] A binding object is an ODP computational object that supports a binding between a set of other computational objects (Linington et al. 2011).