Introduction

Environmental science now relies on the acquisition of great quantities of data from a range of sources. That data might be consolidated into a few very large datasets, or dispersed across many smaller datasets; the data may be ingested in batch or accumulated over a prolonged period. To use this wealth of data effectively, it is important that the data is both optimally distributed across a research infrastructure's data stores, and carefully characterised to permit easy retrieval based on a range of parameters. It is also important that experiments conducted on the data can be easily compartmentalised so that individual processing tasks can be parallelised and executed close to the data itself, so as to optimise use of resources and provide swift results for investigators.

We are concerned here with the gathering and scrutiny of requirements for optimisation. More pragmatically, we are concerned with how we might develop generically applicable methods by which to optimise the research output of environmental science research infrastructures, based on the needs and ambitions of the infrastructures surveyed in the early part of the ENVRI+ project.

Perhaps more so than for the other topics, optimisation requirements are driven by the requirements of those other topics, particularly processing, since the intention is to address specific technical challenges in need of refined solutions, albeit implemented in a way that can be generalised to more than one infrastructure. For each part of an infrastructure in need of improvement, we must consider:

  • What does it mean for this part to be optimal?
  • How is optimality measured—do relevant metrics already exist as standard?
  • How is optimality achieved—is it simply a matter of more resources, better machines, or is there need for a fundamental rethink of approach?
  • What can and cannot be sacrificed for the sake of 'optimality'? For example, it may be undesirable to sacrifice ease-of-use for a modest increase in the speed at which experiments can be executed.

More specifically, we want to focus on certain practical and broadly universal technical concerns:

  • What bottlenecks exist in the functionality of (for example) storage, access and delivery of data, data processing, and workflow management?
  • What are the current peak volumes for data access, storage and delivery for parts of the infrastructure? (A minimal measurement sketch follows this list.)
  • What is the (computational) complexity of different data processing workflows?
  • What are the specific quality (of service, of experience) requirements for data handling, especially for real time data handling?
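
As an illustration of the second question, the short Python sketch below estimates the peak hourly data-delivery volume from a simple access log. It is only a sketch under assumed conditions: the log format, the field order, the hourly binning and the peak_hourly_volume function are hypothetical, not part of any RI's existing tooling; a real infrastructure would substitute its own logs and metrics.

    # Minimal sketch (Python): estimating the peak hourly data-delivery volume
    # from a simple access log. The log format, field order and hourly binning
    # are assumptions made for illustration only.

    from collections import defaultdict
    from datetime import datetime

    def peak_hourly_volume(log_lines):
        """Return (hour, total_bytes) for the busiest hour in the log.

        Each line is assumed to look like:
            2016-03-01T13:45:10 GET /data/granule_42.nc 52428800
        i.e. ISO timestamp, method, path, bytes delivered.
        """
        volume_per_hour = defaultdict(int)
        for line in log_lines:
            timestamp, _method, _path, nbytes = line.split()
            hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0)
            volume_per_hour[hour] += int(nbytes)
        # The busiest one-hour bin gives a first estimate of peak delivery load.
        return max(volume_per_hour.items(), key=lambda item: item[1])

    if __name__ == "__main__":
        sample = [
            "2016-03-01T13:45:10 GET /data/granule_42.nc 52428800",
            "2016-03-01T13:50:02 GET /data/granule_43.nc 73400320",
            "2016-03-01T14:05:30 GET /data/granule_44.nc 10485760",
        ]
        hour, total = peak_hourly_volume(sample)
        print("Peak hour: %s, %.1f MiB delivered"
              % (hour.strftime("%Y-%m-%d %H:00"), total / 2**20))

The same binning approach could equally be applied to request counts or processing times to help expose access bottlenecks.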

Gathering of optimisation requirements is coordinated by <topic leader(s)>, with help from the go-betweens.

Overview and summary of optimisation requirements

<The overview and summary should be written (integrated and distilled) by the topic leader(s), highlighting commonalities and reporting significant variations. It should be refined and agreed by the go-betweens who contributed to this topic. In particular, they should check that critical points have not been missed and that a balance has been attained.>

Research Infrastructures

The following RIs contributed to developing optimisation requirements:

<Delete from the following list any that were not able to contribute on this topic>

<Add an interest inducing sentence or two, to persuade readers to look at the contribution by a particular RI. e.g., What aspect of the summary of requirements, or the special cases, came from this RI. Check with RIs that they feel they are correctly presented.>

ACTRIS: <e.g., This RI ... and therefore has XYZ <Topic> requirements, with a particular emphasis on ...>

AnaEE:

EISCAT-3D:

ELIXIR:

EMBRC:

EMSO:

EPOS:

Euro-ARGO:

EUROFLEETS2:

ESONET:

EUROGOOS:

FIXO3:

IAGOS:

ICOS:

INTERACT:

IS-ENES2:

JERICO:

LTER:

SEADATANET:

SIOS:
