...
- A URL where treatment of this topic can be found (when it is available), i.e. a link to another page in this wiki
- Malcolm Atkinson
- 18 January 2016
- Data-Intensive Federations (DIF) are formed to give practitioners easier access to dynamic and evolving data that is owned and provided by multiple independent organisations, some of which may be partners in the DIF. A DIF needs to be long-lived so that its many users can depend on its services. During its lifetime the provisions, priorities, data organisation and services of the data providers will evolve, as will the requirements and activities of its user community. Two goals differentiate a DIF from Digital Asset Management (DAM), which helps practitioners develop static collections of data under their own control: benefiting from the improvements in data acquisition, data preparation and data curation happening contemporaneously in the provider organisations, and handling dynamic data so that users can work with response horizons ranging from almost immediate to very long term. An effective DIF delivers a holistic and comprehensible view of the relevant data to its users, facilitates the specification and application of dynamic data integration strategies, permits effective working with a wide variety of data analysis systems, problem-solving and development tools, and provides all of the functional and non-functional aspects of a DAM. In particular, it supports a Virtual Research Environment (VRE), which is underpinned by a Virtual Organisation (VO) that administers identity, group membership and role allocation, and hence authority to use data and resources. The implementation must enforce these rules with proper security and accounting across the provider and partner organisations. The DIF will offer computer-supported collaborative working (CSCW) with a specified scope, e.g. sharing workspaces and collaborating on developing scientific methods, data handling processes, agreed data organisations, vocabularies and so on.
However, these will extend to dynamic access, handling and integration processes in the DIF context that are designed to be reused repeatedly on demand. To achieve sustainability, most of this work should be constructed at an abstract level, not bound to the underpinning technologies, platforms and computational resource provisions. The science gateway should dynamically map user actions (often performed using tools that call the science gateway's API) onto executable data-intensive workflows or distributed queries that are deployed across the distributed infrastructure to deliver the required result. With sufficiently high-quality descriptions of the platforms, components, services and data, these mappings can be largely automated. It is therefore necessary to develop a good architecture for DIF that is reusable across many DIFs. Otherwise DIFs (and even DAMs) will become unsustainable as their context evolves and as the number of pairwise interactions between N components grows at a rate of order N².
- Related topics: Digital Asset Management (DAM), Data Integration, Virtual Research Environment (VRE), Computer-Supported Collaborative Working (CSCW), Virtual Organisations (VO), Science Gateway, Dynamic Mappings, Data-Intensive Platforms, Data-Intensive Workflows, Distributed Queries.
- The current status: (a) requires investigation, (b) will be investigated, (c) is being investigated, or (d) has been reported.
- For b, c or d above, the person(s), group, ENVRIplus task or WP that is undertaking the work: Malcolm Atkinson, Alex Hardisty and Keith Jeffery
- Its category, from the list above, or a new category name, which should be gathered under other topics: Architecture.
- Its depth, from area, technology, implementation or application.
- Its form, e.g. Protocol, Distributed framework, SW system, SW library, HCI framework, Tool, metadata framework, ontology, standard, authoritative report, ...: Iterative development of an architectural style and at least one candidate implementation
- Where relevant, its maturity, e.g. TRL for software: Novel proposal; needs attention
- Its potential source: being investigated
- Why this topic is critical and needs attention: Without it, work on e-Infrastructure to support RIs and on tools to aid all of their practitioner roles will run into a complexity barrier and become unsustainable. This is particularly the case where some of the data used is collected and used for other social, political or commercial purposes, and the RI needs to sustain effective working with provider organisations that have other priorities.
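The description above says the VO administers identity, group membership and role allocation, and that authority to use data and resources follows from these. The sketch below illustrates that chain (identity, then groups, then roles, then permitted actions on datasets) under a deliberately simple, assumed model. The class `VirtualOrganisation`, its fields, and the example groups, roles and datasets are all hypothetical; the sketch ignores the cross-organisation security and accounting the page also requires.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical VO registry (assumed model, not a real DIF API):
    identities belong to groups, groups hold roles, and roles grant
    (action, dataset) permissions."""
    group_members: dict[str, set[str]] = field(default_factory=dict)  # group -> identities
    group_roles: dict[str, set[str]] = field(default_factory=dict)  # group -> roles
    role_grants: dict[str, set[tuple[str, str]]] = field(default_factory=dict)  # role -> {(action, dataset)}

    def roles_of(self, identity: str) -> set[str]:
        """Collect every role the identity holds through its group memberships."""
        roles: set[str] = set()
        for group, members in self.group_members.items():
            if identity in members:
                roles |= self.group_roles.get(group, set())
        return roles

    def is_authorised(self, identity: str, action: str, dataset: str) -> bool:
        """Authority derives only from roles: deny unless some role grants it."""
        return any(
            (action, dataset) in self.role_grants.get(role, set())
            for role in self.roles_of(identity)
        )


# Illustrative (made-up) membership and grants.
vo = VirtualOrganisation(
    group_members={"seismology": {"alice"}, "curators": {"bob"}},
    group_roles={"seismology": {"analyst"}, "curators": {"curator"}},
    role_grants={
        "analyst": {("read", "waveforms")},
        "curator": {("read", "waveforms"), ("write", "waveforms")},
    },
)
print(vo.is_authorised("alice", "read", "waveforms"))   # True
print(vo.is_authorised("alice", "write", "waveforms"))  # False
```

Keeping roles separate from groups means that changing what a role may do updates the authority of every member of every group holding it, which is one way the "hence authority" relationship could be maintained as the VO evolves.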
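The sustainability argument in the description rests on the number of pairwise interactions growing as order N². With N components that each must be integrated directly with every other, there are N(N-1)/2 pairwise links. The sketch contrasts this with an assumed alternative in which each component is mapped once onto a shared, reusable architecture, giving N mappings. The "one mapping per component" figure is a simplifying assumption for illustration, not a measurement.

```python
def pairwise_interactions(n: int) -> int:
    """Distinct point-to-point integrations among n components when every
    component must be adapted to every other one: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def via_common_architecture(n: int) -> int:
    """Adaptations needed if each component is instead mapped once onto a
    shared abstract architecture (assumed: one mapping per component)."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_interactions(n), via_common_architecture(n))
# 5 10 5
# 20 190 20
# 100 4950 100
```

Going from 20 to 100 components multiplies the pairwise count by about 26 but the shared-architecture count only by 5, which is the gap the page refers to as a complexity barrier.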
...