Introduction defining context and scope

System-level environmental science involves large quantities of data that are often diverse and dispersed: many different kinds of environmental data are commonly held in small datasets, and the velocity of data gathered from detectors and other instruments can be very high. Data-driven experiments require not only access to distributed data sources, but also parallelisation of computing tasks for processing the data. The performance of these applications determines the productivity of scientific research, and optimisation of system-level performance will be urgently needed by the RI projects in ENVRIplus as they enter production.

This topic focuses on how to augment the common services needed for optimising the performance of experiments conducted on research infrastructure, particularly how data is delivered and processed by the underlying e-infrastructure. The Service Level Agreements (SLAs) offered by e-infrastructure must be considered, along with the available mechanisms for controlling system-level quality of service (QoS). This topic should therefore focus on the high-level optimisation mechanisms available for making decisions on resources, services, data sources and potential execution platforms, and on scheduling the execution of tasks. The semantic linking framework developed in Task 5.3, which links data, infrastructure and in particular the underlying network, will be used to guide these decision procedures.
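As an illustration of the kind of decision procedure described above, the following minimal sketch selects an execution platform by estimated completion time, subject to a deadline. It is not a mechanism from the ENVRI+ project: the `Platform` fields, the function names and the cost model (transfer time plus perfectly parallel compute time) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of a candidate execution platform."""
    name: str
    bandwidth_mb_s: float  # assumed bandwidth guaranteed by the SLA, in MB/s
    cores: int             # assumed number of cores available to the task


def estimated_time(platform, data_mb, cpu_seconds):
    """Estimate completion time in seconds.

    Assumption: data is transferred first, then processed, and the
    processing parallelises perfectly across the available cores.
    """
    transfer = data_mb / platform.bandwidth_mb_s
    compute = cpu_seconds / platform.cores
    return transfer + compute


def choose_platform(platforms, data_mb, cpu_seconds, deadline_s):
    """Return the fastest platform that meets the deadline, or None."""
    timed = [(estimated_time(p, data_mb, cpu_seconds), p) for p in platforms]
    feasible = [(t, p) for t, p in timed if t <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda pair: pair[0])[1]
```

A real decision procedure would draw these figures from SLA descriptions and monitoring data rather than from hard-coded values, and would need a less idealised cost model.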

Ultimately, based on the relevant task (7.2) of the ENVRI+ project, we will need to:

  1. Provide an effective mapping from research-level quality attributes (ease-of-use, responsiveness, workflow support) to infrastructure-level quality attributes of the computing, storage and network services provided by underlying e-infrastructures.
  2. Define test-bed requirements for software and services, and identify conditions for operating final software and services inside each domain, and between multiple domains.
  3. Extend and customise existing optimisation mechanisms for computing and storage resources, and provide an effective control model between processes of data analysis and the underlying e-infrastructure resources, making the application performance as easy as possible to control at runtime.
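The mapping named in the first item can be pictured as a lookup from research-level attributes to per-service infrastructure attributes. The sketch below is only illustrative: the research-level names come from the list above, but every infrastructure-level attribute name (for example `queue_wait_time`) is a hypothetical placeholder, not a term defined by the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfraRequirement:
    service: str    # "computing", "storage" or "network"
    attribute: str  # hypothetical infrastructure-level attribute name


# Hypothetical mapping; a real one would come from the requirements analysis.
QUALITY_MAP = {
    "responsiveness": [
        InfraRequirement("computing", "queue_wait_time"),
        InfraRequirement("network", "latency"),
        InfraRequirement("storage", "read_throughput"),
    ],
    "workflow support": [
        InfraRequirement("computing", "parallel_task_slots"),
        InfraRequirement("storage", "staging_capacity"),
    ],
    "ease-of-use": [
        InfraRequirement("computing", "provisioning_time"),
    ],
}


def infrastructure_attributes(research_attributes):
    """Collect, per service, the attributes implied by the given research-level attributes."""
    result = {}
    for name in research_attributes:
        for req in QUALITY_MAP[name]:
            result.setdefault(req.service, set()).add(req.attribute)
    return result


# Example: which infrastructure attributes matter for a responsive, easy-to-use service?
needed = infrastructure_attributes(["responsiveness", "ease-of-use"])
```

Keeping the mapping as data rather than code would allow it to be revised as requirements are refined, without changing the procedures that consume it.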

Thus the purpose of the technology review in ENVRI+ from the optimisation perspective is to determine two things:

  1. What the RI projects already have at their disposal for effective data access, delivery and processing.
  2. What facilities are offered by current e-infrastructures that can meet their processing and optimisation requirements.

The optimisation section of the ENVRI+ technology review focuses on the second point above; the first point should be addressed in other sections, particularly data processing.

This section reviews the e-infrastructure developments and technologies that can potentially address the data access, delivery and processing requirements of research infrastructures in a more effective (optimal) manner. It should be considered in conjunction with the processing technologies already implemented or in development by RI projects.

Change history and amendment procedure

The review of this topic will be organised by  in consultation with the following volunteers: Type @ followed by first letters of person's name. They will partition the exploration and gathering of information and collaborate on the analysis and formulation of the initial report. Record details of the major steps in the change history table below. For further details of the complete procedure see item 4 on the Getting Started page.

Note: Do not record editorial / typographical changes. Only record significant changes of content.

Date | Name | Institution | Nature of the information added / changed
3/1/2016 | | UvA | Provided introduction, context and scope for optimisation topic.

Sources of information used

Places you have gone to for information e.g., RDA; standards bodies; GEOSS; existing RIs and projects and the technologies they use; technologies available from service providers (0.5 pages).

Two-to-five year analysis

Of state of the art and trends based on sources and experience. A distillation of surveyed information leading to your conclusions and recommendations about what should be done in ENVRIplus (0.5 - 1.5 pages) structured internally as appropriate for the topic but with at least the following headings.

State of the art

One paragraph describing the current state of the art.

Trends

Subsequent headings for each trend (if appropriate in this HL3 style)

Paragraph(s) describing a trend(s).

Problems to be overcome

Exploiting virtual (cloud) resources effectively

Conscripting elastic virtualised infrastructure services permits more ambitious data analysis and processing workflows, especially with regard to 'campaigns' where resources are enlisted only for a specific time period. Resources can be acquired, components installed, and processes executed with relatively little configuration time, provided that the necessary tools and specifications are in place. These resources can then be released upon completion of the immediate task. However, in the research context it is necessary to minimise the oversight and 'hands-on' effort required of researchers, and to automate as much as possible. This requires specialised software and intelligent support systems; such software either does not currently exist, or still operates at too low a level to significantly reduce the technical burden imposed on researchers, who would presumably prefer to concentrate on research than on programming.
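The acquire, install, execute and release cycle described above can be sketched as a context manager that guarantees release even when a task fails. This is a minimal illustration only: `InMemoryProvider` is a hypothetical stand-in for a real cloud provider API, and the names `campaign`, `acquire` and `release` are assumptions made for the example.

```python
from contextlib import contextmanager


class InMemoryProvider:
    """Hypothetical stand-in for a cloud provider; it only tracks active resources."""

    def __init__(self):
        self.active = set()
        self._next_id = 0

    def acquire(self, count):
        """Register `count` new virtual machines and return their identifiers."""
        ids = []
        for _ in range(count):
            self._next_id += 1
            vm = f"vm-{self._next_id}"
            self.active.add(vm)
            ids.append(vm)
        return ids

    def release(self, ids):
        """Remove the given virtual machines from the active set."""
        self.active.difference_update(ids)


@contextmanager
def campaign(provider, count, components):
    """Acquire resources, record the installed components, and always release."""
    vms = provider.acquire(count)
    try:
        # "Installation" is simulated by recording the component list per machine.
        yield {vm: list(components) for vm in vms}
    finally:
        provider.release(vms)


provider = InMemoryProvider()
with campaign(provider, 2, ["analysis-tool"]) as nodes:
    # Processes would be executed here; this example only describes them.
    results = [f"{vm} ran {tools[0]}" for vm, tools in nodes.items()]
```

Wrapping the lifecycle in this way keeps release out of the researcher's hands, which is the kind of reduced oversight the paragraph above argues for.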

Sub-headings as appropriate in HL3 style (one per problem)

A short paragraph describing a problem to be overcome or barrier in the way of progress. 

Details underpinning above analysis

Please supply here any additional information that can help to justify the previous section e.g., references to material that someone can look up for themselves.  

Sketch of a longer-term horizon

e.g., 5-10 years ahead. Your best judgement about the future direction of technology and research trends (0.5 - 1 page).

Relationships with requirements and use cases

Link your analysis of the topic with particular identified requirements and use cases, as this will increase the relevance and help others understand your insights. Consider using tables to do this (0.5 - 1 page).

Summary of analysis highlighting implications and issues

 

  1. It is possible to automate large portions of research activity; however, this is contingent on good formal descriptions of data and processes, and on good tool support for initiating and informing the automated procedures with regard to specific experiments and applications.

Bibliography and references to sources

Insert numbered list of sources / references.