Ambition
Ocean experts are now converging in their estimates of integrated indicators such as global ocean warming. However, these indicators, based on interpolation of unevenly distributed observations, do not describe climate change consistently. To better understand the ocean circulation and the climate machinery, data scientists need direct access to the original observations, which are otherwise diluted in spatial syntheses.
Original observations are published by Research Infrastructures (Argo, EMSO, ICOS…) and by data aggregators (SeaDataNet, Copernicus Marine, …).
The long-term ambition of the Marine Competence Centre is to push ocean observations onto the EOSC infrastructure for data analytics.
User stories
> **Information**
>
> Requirements are based on user stories. A user story is an informal, natural-language description of one or more features of a software system, often written from the perspective of an end user of the system. Depending on the community, user stories may be written by various stakeholders, including clients, users, managers or development team members. They facilitate sense-making and communication; that is, they help software teams organise their understanding of the system and its context. Do not confuse a user story with a system requirement: a user story is an informal description of a feature, whereas a requirement is a formal description of a need (see the Requirements section below). User stories may follow one of several formats or templates, the most common being:
>
> - "As a \<role\>, I want \<capability\> so that \<receive benefit\>"
> - "In order to \<receive benefit\>, as a \<role\>, I want \<goal/desire\>"
> - "As \<persona\>, I want \<what?\> so that \<why?\>"
>
> where a persona is a fictional stakeholder (e.g. a user). A persona may include a name, a picture, characteristics, behaviours, attitudes, and a goal which the product should help them achieve.
>
> Example: “As provider of the Climate gateway I want to empower researchers from academia to interact with datasets stored in the Climate Catalogue, and bring their own applications to analyse this data on remote cloud servers offered via EGI.”
The Marine community produces diverse types of data, typically time-series data. They wish to store these data in files and make the files easily browsable and accessible by researchers. To maximise ease of use, the files should be made available via a Dropbox-like system that makes relevant data files visible for each user in his/her 'personal folder'. Users should be able to define patterns describing the kind of data they are interested in (location, time period, provider network, etc.), and the system should perform pattern matching to decide whether or not to make a particular incoming file (or set of files) visible for a given user. Such pattern matching can be CPU-intensive when scaled up to many users and many files with complex data records. Depending on the community, the source of data can be a single instrument (site) or multiple collection/production sites. In the latter case, the data originating from multiple locations should be brought into common formats and must be described with metadata in a coherent fashion.
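The subscription matching described above can be sketched as follows. This is a minimal illustration only: the `FileRecord` and `Subscription` classes and their attribute names are assumptions for the example, not part of the actual Marine CC implementation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical description of an incoming data file (illustrative only).
@dataclass
class FileRecord:
    network: str      # provider network, e.g. "Argo", "EMSO"
    lat: float
    lon: float
    obs_date: date

# Hypothetical user subscription: which incoming files should become
# visible in the user's personal folder.
@dataclass
class Subscription:
    networks: set                 # accepted provider networks
    bbox: tuple                   # (lat_min, lat_max, lon_min, lon_max)
    period: tuple                 # (start_date, end_date), inclusive

    def matches(self, f: FileRecord) -> bool:
        lat_min, lat_max, lon_min, lon_max = self.bbox
        start, end = self.period
        return (f.network in self.networks
                and lat_min <= f.lat <= lat_max
                and lon_min <= f.lon <= lon_max
                and start <= f.obs_date <= end)

sub = Subscription(networks={"Argo"},
                   bbox=(40.0, 60.0, -30.0, 0.0),
                   period=(date(2019, 1, 1), date(2019, 12, 31)))
incoming = FileRecord("Argo", 47.5, -8.2, date(2019, 6, 1))
print(sub.matches(incoming))  # True: the file becomes visible in the folder
```

In a production setting the same predicate would be evaluated server-side against the metadata store rather than in a per-user loop, which is where the CPU cost at scale comes from.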
The Marine CC is testing (see figure below):
- a combination of B2FIND, B2SAFE and B2STAGE for the data management part (storage and transfer);
- a combination of Jupyter, B2ACCESS and EGI Cloud for user exposure (data subscription and access).
| No. | User stories |
|---|---|
| US1 | A data provider should be able to link its data production instruments into the 'back-end' of the Marine CC setup and become a data provider for the CC users. |
| US2 | A scientist should be able to browse the connected data source networks (e.g. Argo, EMSO, SeaDataNet) and define preferences for the data records he/she is interested in. The system should make matching records visible in his/her personal access folder. |
| US3 | A user should be able to access his/her personal data access folder via a Jupyter system and perform data analytics on the data. |
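US3 can be illustrated with a minimal, notebook-style sketch. The folder below is a stand-in for the user's personal B2DROP folder, and the CSV layout (a `temperature` column) is an assumption for the example, not the actual Marine CC data format:

```python
import csv
import statistics
import tempfile
from pathlib import Path

def mean_temperature(folder: Path) -> float:
    """Average the 'temperature' column over all CSV files in the folder."""
    values = []
    for csv_file in sorted(folder.glob("*.csv")):
        with csv_file.open(newline="") as fh:
            for row in csv.DictReader(fh):
                values.append(float(row["temperature"]))
    return statistics.mean(values)

# Stand-in for the synchronised personal folder, populated here with one
# illustrative time-series file so the sketch is self-contained.
folder = Path(tempfile.mkdtemp())
(folder / "argo_2019.csv").write_text(
    "time,temperature\n"
    "2019-06-01T00:00Z,14.2\n"
    "2019-06-01T12:00Z,14.6\n"
)
print(round(mean_temperature(folder), 2))  # prints 14.4
```

In the tested setup the same code would run in a JupyterHub session, with `folder` pointing at the user's mounted personal access folder instead of a temporary directory.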
Use cases
> **Information**
>
> A use case is a list of actions or event steps, typically defining the interactions between a role (known in the Unified Modeling Language as an actor) and a system, to achieve a goal. Include in this section any diagrams that could facilitate the understanding of the use cases and their relationships.
| Step | Description of action | Dependency on 3rd party services (EOSC-hub or other) |
|---|---|---|
| UC1 | | |
| UC2 | | |
| UC3 | | |
Architecture & EOSC-hub technologies considered/assessed
- B2SAFE: synchronise Argo data daily from Ifremer to B2SAFE
- B2DROP: input for data scientists' individual datasets
- B2ACCESS: the user (data scientist) identification service
- JupyterHub: the data analytics platform for working on datasets (example: DIVA analysis in a Jupyter Notebook reading Argo data)
- Data subscription web GUI and query API, backed by:
  - Cassandra: the NoSQL database for high-performance queries on data
  - Elasticsearch: the search engine for high-performance queries on metadata
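On the metadata side, the subscription API could for instance issue an Elasticsearch `bool`/`filter` query combining a provider network, a time range and a geographic bounding box, mirroring the user preferences from US2. The request body below is a sketch sent to the `_search` endpoint of a hypothetical index; the index name and field names (`network`, `obs_time`, `position`) are illustrative assumptions, not the actual Marine CC mapping:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "network": "Argo" } },
        { "range": { "obs_time": { "gte": "2019-01-01", "lte": "2019-12-31" } } },
        { "geo_bounding_box": {
            "position": {
              "top_left":     { "lat": 60.0, "lon": -30.0 },
              "bottom_right": { "lat": 40.0, "lon": 0.0 }
            }
        } }
      ]
    }
  }
}
```

Using `filter` rather than `must` lets Elasticsearch cache the clauses and skip scoring, which suits the yes/no subscription-matching use case.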
Requirements for EOSC-hub
Technical Requirements
| Requirement ID | EOSC-hub service | GAP (Yes/No) + description | Requirement description | Source Use Case |
|---|---|---|---|---|
| Example | EOSC-hub AAI | Yes: EOSC-hub AAI doesn’t support the Marine IdP | EOSC-hub AAI should accept Marine IDs | UC1 |
| RQ1 | \<Gap service\> | Yes: … | | |
| RQ2 | Cloud Compute | No | Create VMs via a gateway | UC2 |
Capacity Requirements
| EOSC-hub services | Amount of requested resources | Time period |
|---|---|---|
| B2SAFE | 100 GB for Argo data | |
| B2DROP | Default EOSC-hub data scientist user account on B2DROP | |
| B2ACCESS | 100 users should be able to access the services | from 2020 |
| JupyterHub | Default EOSC-hub data scientist account on JupyterHub | |
| | Host the data subscription web GUI with its Cassandra and Elasticsearch databases | |
Validation plan
Not yet defined.

