...
Optimisation requirements gathering is coordinated with help from go-betweens.
...
Many optimisation problems, whether explicitly identified as such by RIs or implicit in the requirements of other topics, can be reduced to problems of data placement. Is the data needed by researchers available in a location from which it can be easily identified, retrieved and analysed, in whole or in part? Is it feasible to perform analysis on that data without substantial additional preparation, and if not, what is the overhead in time and effort required to prepare the data for processing?

This latter question relates to the notion of data staging, whereby data is placed and prepared for processing on some computational service (whether provided on a researcher's desktop, an HPC cluster or a web server). This in turn raises the further question of whether data should be brought to where they can best be computed, or computing tasks brought to where the data currently reside. Given the large size of many RIs' primary datasets, bringing computation to the data is appealing, but the complexity of various analyses also often demands supercomputing-level resources, which require the data to be staged at a computing facility such as those brokered in Europe by PRACE. Data placement relies, however, on data accessibility, which depends not simply on the existence of the data in an accessible location, but also on the metadata associated with the core data that allows it to be correctly interpreted, and on the availability of services that understand that metadata and can thus interact with (and transport) the data with a minimum of manual configuration or direction.
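To make the staging trade-off concrete, the following is a minimal sketch of the "move data or move compute" decision. The `Site` description, the cost figures and the decision rule are all illustrative assumptions for this page, not part of any ENVRI or PRACE service interface:

```python
# Illustrative cost model for the data-staging decision discussed above.
# All site names, fields and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    bandwidth_gbps: float      # network bandwidth to/from this site
    has_hpc: bool              # can the analysis run here at all?
    staging_overhead_s: float  # time to prepare/stage data locally

def staging_cost_s(dataset_gb: float, site: Site) -> float:
    """Estimated seconds to transfer and stage the dataset at `site`."""
    transfer_s = (dataset_gb * 8) / site.bandwidth_gbps  # GB -> Gbit / Gbps
    return transfer_s + site.staging_overhead_s

def choose_site(data_site: Site, hpc_site: Site) -> Site:
    """Prefer computing where the data already lives, unless the
    analysis needs HPC resources the data site cannot provide."""
    if data_site.has_hpc:
        return data_site   # bring compute to the data: no transfer cost
    return hpc_site        # must stage the data at the HPC facility

archive = Site("ri-archive", bandwidth_gbps=10, has_hpc=False, staging_overhead_s=600)
cluster = Site("hpc-cluster", bandwidth_gbps=40, has_hpc=True, staging_overhead_s=1800)

target = choose_site(archive, cluster)
print(target.name, f"staging ~ {staging_cost_s(5000, target):.0f}s")
```

A real placement service would weigh many more factors (queue times, cost, data rights), but even this toy rule captures the asymmetry described above: staging cost only matters when the computation cannot come to the data.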
Reductionism aside, however, the key performance indicator used by most RIs is researcher productivity. Can researchers use the RI to efficiently locate the data they need? Do they have access to all the support available for processing the data and conducting their experiments? Can they replicate the cited results of their peers using the facilities provided? This raises yet another question: how does the service provided to researchers translate into requirements on data placement and infrastructure availability?
This is key to intelligent placement of data: the existence of constraints that guide (semi-)autonomous services by conferring an understanding of the fundamental underlying context in which data placement occurs. Programming the infrastructure to support particular task workflows is part of this.
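One way to read "constraints that guide autonomous services" is as predicates that a placement service evaluates over candidate sites. The sketch below assumes hypothetical constraint names and site attributes purely for illustration:

```python
# Hypothetical constraint-guided placement: each constraint is a
# predicate over a candidate site; a (semi-)autonomous service keeps
# only the sites that satisfy all of them.
from typing import Callable, Dict, List

Site = Dict[str, object]
Constraint = Callable[[Site], bool]

def in_region(region: str) -> Constraint:
    return lambda s: s["region"] == region        # e.g. data-residency rules

def min_bandwidth(gbps: float) -> Constraint:
    return lambda s: s["bandwidth_gbps"] >= gbps  # workflow throughput needs

def supports(fmt: str) -> Constraint:
    return lambda s: fmt in s["formats"]          # services that understand the metadata

def eligible(sites: List[Site], constraints: List[Constraint]) -> List[Site]:
    return [s for s in sites if all(c(s) for c in constraints)]

sites = [
    {"name": "a", "region": "EU", "bandwidth_gbps": 40, "formats": {"netcdf"}},
    {"name": "b", "region": "US", "bandwidth_gbps": 100, "formats": {"netcdf"}},
]
print(eligible(sites, [in_region("EU"), min_bandwidth(10), supports("netcdf")]))
```

Expressing the context declaratively in this way is what lets the same placement machinery serve different workflows without manual reconfiguration.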
Good provenance is fundamental to optimisation: in order to anticipate how data will be used by the community, and what infrastructure can be conscripted to provide access to those data, it is necessary to understand as much about the data as possible. Provenance is required to answer who, what, where, when, why and how regarding the origins of data, and the role of an optimised RI is to know the answers to who, what, where, when, why and how regarding the future use of data. Ensuring that these questions can be asked and answered becomes more challenging the greater the heterogeneity of the data being handled by the RI.
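As a minimal sketch of what answering those six questions might look like in practice, the record below captures one field per question. The field names and example values are illustrative assumptions; a real RI would more likely adopt a standard model such as W3C PROV than this ad-hoc structure:

```python
# Hypothetical provenance record answering who/what/where/when/why/how
# for one dataset; all values are invented for illustration.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ProvenanceRecord:
    who: str        # agent responsible (person, instrument, service)
    what: str       # the activity or transformation performed
    where: str      # site or facility at which it occurred
    when: datetime  # timestamp of the activity
    why: str        # purpose (e.g. calibration, quality control)
    how: str        # method, instrument settings, or software version

record = ProvenanceRecord(
    who="sensor-station-42",
    what="raw temperature acquisition",
    where="North Atlantic field site",
    when=datetime(2024, 6, 1, 12, 0),
    why="long-term climate observation",
    how="thermistor logger, firmware 3.1",
)
```

The harder problem named above, heterogeneity, shows up as soon as different data sources fill these fields with incompatible vocabularies, which is why shared metadata standards matter as much as the record structure itself.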
...