An introductory paragraph to explain the topic and to put it into a high-level context i.e., how it relates to the RIs, how it relates to other aspects, the current practice, etc. (0.5-1 pages).
(Scientific Data) Processing or Analytics is a vast domain that includes any activity or process that performs a series of actions on dataset(s) to distil information (Bordawekar et al., 2014). It is particularly important in scientific domains, especially with the advent of the 4th Paradigm and the availability of “big data” (Hey et al., 2009). Almost every Research Infrastructure is called upon to deal with some form of scientific data processing task. Data analytics methods draw on multiple disciplines, including statistics, quantitative analysis, data mining, and machine learning. Very often these methods require compute-intensive infrastructures to produce their results in a suitable time, because of the data to be processed (e.g. huge in volume or heterogeneity) and/or because of the complexity of the algorithm/model to be elaborated/projected. Moreover, because these methods are devised to analyse dataset(s) and produce other “data”/information (that can itself be considered a dataset), they are strongly characterised by the “typologies” of their input and output.
This technology review focuses on the following aspects:
The review of this topic will be organised by in consultation with the following volunteers:
They will partition the exploration and gathering of information and collaborate on the analysis and formulation of the initial report. Record details of the major steps in the change history table below. For further details of the complete procedure see item 4 on the Getting Started page.
Note: Do not record editorial / typographical changes. Only record significant changes of content.
| Date | Name | Institution | Nature of the information added / changed |
|---|---|---|---|
| date | Type @ followed by first letters of person's name | Acronym of institution | A remark on the information added or changed |
Places you have gone to for information e.g., RDA; standards bodies; GEOSS; existing RIs and projects and the technologies they use; technologies available from service providers (0.5 pages).
Of state of the art and trends based on sources and experience. A distillation of surveyed information leading to your conclusions and recommendations about what should be done in ENVRIplus (0.5 - 1.5 pages) structured internally as appropriate for the topic but with at least the following headings.
One paragraph describing the current state of the art.
Paragraph(s) describing a trend(s).
A short paragraph describing a problem to be overcome or barrier in the way of progress.
Please supply here any additional information that can help to justify the previous section e.g., references to material that someone can look up for themselves.
e.g., 5-10 years ahead. Your best judgement about the future direction of technology and research trends (0.5 - 1 page).
Link your analysis of the topic with particular identified requirements and use cases, as this will increase the relevance and help others understand your insights. Consider using tables to do this (0.5 - 1 page).
This section should be suitable for the deliverable and also understandable on its own, without the need to read the rest of the material. A discussion of areas where ENVRIplus should change its plans as a result of your conclusions, and of open questions would be very useful here (0.25 - 0.5 pages)
R. Bordawekar, B. Blainey, C. Apte (2014). Analyzing analytics. SIGMOD Record 42(4), 17-28. http://dx.doi.org/10.1145/2590989.2590993
T. Hey, S. Tansley, K. Tolle (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. http://research.microsoft.com/en-us/collaboration/fourthparadigm/