1. Background

1.1 Short description

Data Acquisition services, and in particular the preparation of data transfer (ENVRI RM: prepare data transfer) prior to data transmission, are not yet sufficiently standardized. This hinders efficient, multi-RI (Research Infrastructure) data processing routines such as data quality checking. This use case intends to promote standardization and to move the standardization level closer to the sensor; its objectives are detailed in Section 2.

1.2 Contact

Background | Contact Person | Organization | Contact email
RI-ICT | Robert Huber | UniHB, PANGAEA, EMSO | rhuber@uni-bremen.de
RI-ICT | Andree Behnken | UniHB, PANGAEA, FixO3 | abehnken@marum.de
RI-ICT | Markus Stocker | UniHB, PANGAEA | mstocker@marum.de
RI-ICT | Thierry Carval | Ifremer, Euro-Argo | thierry.Carval@ifremer.fr
RI-ICT | Olivier Gilbert | EPOS | olivier.gilbert@univ-grenoble-alpes.fr
RI-ICT | Mauro Mazzola | SIOS | m.mazzola@isac.cnr.it
RI-ICT | Federico Carotenuto | ANAEE | carotenuto@ibimet.cnr.it
RI-ICT | Alessandro Zaldei | ANAEE | a.zaldei@ibimet.cnr.it
RI-ICT | Alessandro Matese | ANAEE | a.matese@ibimet.cnr.it
RI-ICT | Simona Scollo | EPOS | simone.scollo@ingv.it
RI-ICT | Jean-Francois Rolin | EMSO | jean.francois.rolin@ifremer.fr
RI-ICT | Mickael Langlais | EPOS | mickael.langlais@univ-grenoble-alpes.fr
RI-ICT | Fadi Obeid | Lab STICC | fadi.obeid@ensta-bretagne.org
RI-ICT | Angelo Viola | SIOS | angelo.viola@artov.isac.cnr.it
RI-ICT | Thomas Loubrieu | SeaDataNet | Thomas.Loubrieu@ifremer.fr
RI-ICT | Jean-Francois Rolin | IFREMER | Jean.Francois.Rolin@ifremer.fr

1.3 Use case type

The use case will be an implementation case.

1.4 Scientific domain and communities

Scientific domain

All

Communities

Data Acquisition, Data Service Provision

Behaviours

Relevant community behaviours: Instrument Configuration, Data Collection, Data Quality Checking, Semantic Harmonization

Relevant community roles: Sensor, Sensor Network, Measurement Model Designer, Data Acquisition Subsystem, Data Curator, Semantic Curator

2. Detailed description

Objective and Impact

The use case will move the standardization level closer to the sensors of RIs, thus allowing the implementation of common, generic data processing routines such as near real-time quality control (NRT QC).

‘Data Transmission’ at the sensor as well as the platform level (Fig. 1) largely depends on community-specific needs and habits, or simply on manufacturer specifications. Both result in proprietary or niche formats and protocols that require data to be subsequently processed by data transformation services before they can be delivered in a standardized format (Fig. 1). Since the ENVRIplus objective in WP1 is to promote sensor web enablement strategies for the various contexts of RIs, it is of high importance to connect the choices of sensor interface with the quality control procedures.

The objective of this use case is to contribute to the harmonization of data transmission formats and protocols.

The use case will test two data transmission formats and protocols, namely the OGC Sensor Web Enablement (SWE) Sensor Observation Service (SOS) and the Semantic Sensor Network (SSN) ontology in combination with RDF Streams [2]. Appropriate data compression (Efficient XML Interchange, EXI) will be tested to ensure resource-friendly data transmission.

Furthermore, the use case will implement generic quality control procedures such as those defined within WP 3.3 and will make use of the standardized data transmission formats to perform generic, cross-RI NRT QC routines and to tag the controlled data with appropriate data quality flags, again using the standard formats mentioned above.
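As an illustration of what such a generic routine could look like, the following minimal Python sketch applies a gross range test to a stream of observations and tags each value with a quality flag. The flag values and thresholds are placeholder assumptions for illustration, not the definitions from WP 3.3.

```python
# Minimal sketch of a generic NRT QC range test (illustrative only).
# Flag values and thresholds are placeholder assumptions, not WP 3.3 definitions.

GOOD, PROBABLY_BAD, BAD, MISSING = 1, 3, 4, 9  # example flag scheme

def range_test(value, valid_min, valid_max, suspect_margin=0.0):
    """Return a quality flag for a single observation value."""
    if value is None:
        return MISSING
    if value < valid_min or value > valid_max:
        return BAD
    if value < valid_min + suspect_margin or value > valid_max - suspect_margin:
        return PROBABLY_BAD
    return GOOD

def qc_stream(observations, valid_min, valid_max):
    """Attach a quality flag to each (timestamp, value) pair in a data stream."""
    for timestamp, value in observations:
        yield timestamp, value, range_test(value, valid_min, valid_max)

if __name__ == "__main__":
    sample = [("2016-05-01T00:00:00Z", 12.3),
              ("2016-05-01T00:10:00Z", 99.9),
              ("2016-05-01T00:20:00Z", None)]
    for row in qc_stream(sample, valid_min=-2.0, valid_max=35.0):
        print(row)
```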

Challenges

Detailed scenarios

The use case will test two scenarios. One will be based on the Sensor Web Enablement (SWE) suite of standards (e.g. SOS) while the other will use the Semantic Sensor Network (SSN) ontology.

The Sensor Web Enablement (SWE) approach

The idea is to use transactional SOS requests to transmit data: the sensor or platform will transmit observations via SOS InsertObservation requests. An (optional) message broker will forward these requests to the service endpoint, e.g. at EGI, and at the same time send the data to the RI data processing center. NRT QC routines will then take place at EGI and/or at the RI data staging endpoint.
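A simplified sketch of how a sensor or platform could issue such a transactional request follows. The endpoint URL and the offering, procedure, observed-property and feature identifiers are hypothetical placeholders, and the XML body is reduced to the essential elements of an SOS 2.0 InsertObservation.

```python
# Sketch of a transactional SOS 2.0 InsertObservation request (simplified).
# Endpoint, offering, procedure and property URIs are hypothetical placeholders.
import requests

SOS_ENDPOINT = "https://sos.example.org/service"  # assumed transactional SOS endpoint

INSERT_OBSERVATION = """<?xml version="1.0" encoding="UTF-8"?>
<sos:InsertObservation service="SOS" version="2.0.0"
    xmlns:sos="http://www.opengis.net/sos/2.0"
    xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sos:offering>http://example.org/offering/ctd-1</sos:offering>
  <sos:observation>
    <om:OM_Observation gml:id="obs1">
      <om:phenomenonTime>
        <gml:TimeInstant gml:id="t1">
          <gml:timePosition>2016-05-01T00:00:00Z</gml:timePosition>
        </gml:TimeInstant>
      </om:phenomenonTime>
      <om:resultTime xlink:href="#t1"/>
      <om:procedure xlink:href="http://example.org/procedure/ctd-1"/>
      <om:observedProperty xlink:href="http://example.org/property/sea_water_temperature"/>
      <om:featureOfInterest xlink:href="http://example.org/feature/station-1"/>
      <om:result xsi:type="gml:MeasureType" uom="Cel">12.3</om:result>
    </om:OM_Observation>
  </sos:observation>
</sos:InsertObservation>"""

# POST the observation to the (assumed) transactional SOS endpoint.
response = requests.post(
    SOS_ENDPOINT,
    data=INSERT_OBSERVATION.encode("utf-8"),
    headers={"Content-Type": "application/xml"},
)
response.raise_for_status()
print(response.status_code)
```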

Tasks:

The Semantic Sensor Network (SSN) ontology approach

Required services:

We have described an ENVRIplus Implementation Case that aims at embedding standards for the encoding and format of observation data into sensing devices. Of specific focus are standards by the OGC - in particular SensorML, Observations & Measurements, and the Sensor Observation Service - and recommendations by the W3C - specifically the Semantic Sensor Network ontology. Embedding such standards into sensing devices enables the acquisition of observation data that is natively encoded and formatted following these standards. This will reduce the number of translations required during data acquisition. Given standardized streams of observation data, the Implementation Case investigates the execution of generic data processing routines on data streams. Of interest are routines for near real-time quality control (NRT QC).
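To give a flavour of the SSN-based encoding, the following sketch constructs a single observation as RDF using rdflib and the SOSA core of the W3C SSN vocabulary; the sensor, observed-property and feature-of-interest URIs are hypothetical placeholders.

```python
# Sketch of a single observation encoded with the W3C SOSA/SSN vocabulary.
# Sensor, property and feature-of-interest URIs are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/")  # assumed namespace for this sketch

g = Graph()
g.bind("sosa", SOSA)
g.bind("ex", EX)

obs = URIRef("http://example.org/observation/1")
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["sensor/ctd-1"]))
g.add((obs, SOSA.observedProperty, EX["property/sea_water_temperature"]))
g.add((obs, SOSA.hasFeatureOfInterest, EX["feature/station-1"]))
g.add((obs, SOSA.hasSimpleResult, Literal(12.3, datatype=XSD.double)))
g.add((obs, SOSA.resultTime, Literal("2016-05-01T00:00:00Z", datatype=XSD.dateTime)))

# Serialize as Turtle; in the use case such triples would be published as an RDF stream.
print(g.serialize(format="turtle"))
```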


We think that Apache Storm [1] could support the implementation of the proposed case. Apache Storm is a distributed real-time computation system. It specializes in the reliable processing of data streams and is designed to support real-time analytics and continuous computation, among other use cases. Central to Apache Storm is the notion of a Storm topology. A topology consumes streams of data and processes them in arbitrarily complex ways; it thus models the logic of a real-time application. A topology is a directed acyclic graph whose nodes are either spouts or bolts and whose edges are streams. A stream is an unbounded sequence of tuples, and tuples are data packages. A spout is a source of streams in a topology, while bolts perform computations (processing) on tuples.

We intend to investigate the application of Apache Storm to the described Implementation Case. Specifically, the idea is to model the data acquisition and NRT QC computations as a Storm topology. Sensing devices may be modelled as Storm spouts, i.e. as sources of streams. Streams model the transmission of sensor data in the topology. The data - here encoded following the OGC and/or W3C standards - are modelled as tuples. Finally, any computational node is modelled as a bolt of the topology. Of particular interest are computational nodes that execute a routine for NRT QC, such as outlier detection. Being modelled as bolts of a Storm topology, such outlier detection operates as a continuous computation task on the tuples of streams, as specified by the topology.
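The following plain-Python sketch illustrates this intended dataflow conceptually; it does not use the Storm API. A source function plays the role of a spout emitting observation tuples, and an outlier-detection function plays the role of a bolt that flags values deviating strongly from a running window. The window size, threshold and flag labels are illustrative assumptions.

```python
# Conceptual sketch of the spout -> bolt dataflow (plain Python, not the Storm API).
# Window size, threshold and flag labels are illustrative assumptions.
import random
from statistics import mean, stdev

def sensor_spout(n=20):
    """Source of a stream: emits (sequence_number, value) tuples, like a spout."""
    for i in range(n):
        value = random.gauss(10.0, 1.0)
        if i == 7:
            value += 15.0  # inject an artificial outlier for demonstration
        yield i, value

def outlier_bolt(stream, window=10, k=3.0):
    """Continuous computation on tuples, like a bolt: z-score based outlier flagging."""
    history = []
    for seq, value in stream:
        flagged = False
        if len(history) >= 3 and stdev(history) > 0:
            flagged = abs(value - mean(history)) > k * stdev(history)
        history = (history + [value])[-window:]
        yield seq, value, "suspect" if flagged else "good"

if __name__ == "__main__":
    # Wire the spout to the bolt, forming a minimal two-node dataflow.
    for tup in outlier_bolt(sensor_spout()):
        print(tup)
```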

We also intend to partner with EGI, which could serve as a platform for the deployment of Storm topologies on a distributed computer network.

 

[1] http://storm.apache.org/

Tasks:

Near real-time quality control

Tasks:

Technical status and requirements

The use case involves EGI and requires, e.g., the use of a scalable data processing environment via virtual machines.

Implementation plan and timetable

1. Month 5: Implementation of SOS based data transmission
2. Month 10: Implementation of SSN based data transmission
3. Month 12: Generic data object model (e.g. based on SSN)
4. Month 15: VM for data brokering and data quality control routines
5. Month 18: Deployment of NRT quality control routines

Expected output and evaluation of output

A generic NRT QC service capable of accepting standardized SOS data or SSN RDF data streams will be ready.

[2] https://www.w3.org/community/rsp/