QUESTIONS
Version 2: 18 August 2015
Cristina-Adriana Alexandru, Rosa Filgueira Vincente,
Alex Vermeulen, Keith Jeffery, Thomas Loubrieu, Leonardo Candela, Paul Martin, Barbara Magagna, Yin Chen and Malcolm Atkinson
Starting remark:
Our RI IS-ENES runs a distributed, federated data infrastructure based on a few (3-4) main data centres and various associated smaller ones.
In the answers below we refer to the climate modelling community, to two data dissemination systems (ESGF for project runtime; LTA for long-term archiving), to CMIP5 as the climate modelling data project for 2010-2015, and to CMIP6 for 2016-2021.
ENVRIplus topics (ENVRIplus development areas address the following topics):
1. Identification and citation
2. Curation
3. Cataloguing
4. Processing
5. Provenance
6. Optimization
7. Community support
A. Generic questions (for all topics)
1. What is the basic purpose of your RI, technically speaking?
a. Could you describe one or several basic use-cases involving interaction with the RI that cover topics 1-7?
Use case 1: Data producers submit model data results to IS-ENES data nodes. Data is quality-checked and published in IS-ENES/ESGF data infrastructure. All data items are uniquely identified. Data is long term archived. Data aggregates (experiment level) are assigned DOIs. DOIs are used by end users in scientific publications. DOI-assigned data aggregates are published in various Metadata Catalogues e.g. in world data centers for climate.
Use case 2: End user of IS-ENES data infrastructure encounters problems (technical or scientific). End user contacts IS-ENES/ESGF user support (organized in first/second level support, second level support internationally distributed). General problems are documented in FAQ site.
Use case 3: End user wants to process large amounts of data. Three possibilities to do this:
A) Download and process at home institute. This is supported via a bulk data download and synchronization tool for IS-ENES/ESGF sites.
B) Contact a large IS-ENES/ESGF site who already has the required data available (replicated from other sites) and process there (personal interaction necessary to get account and permission at the site). This is supported by the user support service.
C) Contact a web processing service or a portal providing (parts of) the requested analysis functionalities. This is supported by the IS-ENES climate4impact portal (https://climate4impact.eu/) as well as by IS-ENES/ESGF web processing services (not yet fully in production).
Some more detailed IS-ENES use cases were submitted to the RDA (Research Data Alliance) Data Fabric interest group as well as Data Repository interest group and are available at:
https://rd-alliance.org/enes-data-federation-use-case.html and https://rd-alliance.org/climate-data-analytics-use-case.html.
b. Could you describe how data is acquired, curated and made available to users?
Data is generated by climate modeling groups (as well as by some climate observational studies relevant for climate model intercomparison projects). Data is post-processed according to the standards and agreements of the intercomparison project (e.g. CMIP, CORDEX). Data is ingested at IS-ENES/ESGF data nodes and quality-controlled (checked against intercomparison project conventions and standards). As a next step, data is published to the IS-ENES/ESGF data infrastructure. Publication makes metadata available and searchable and data accessible via IS-ENES portals (as well as via APIs). Important data products are replicated to dedicated long-term archival centers. There, additional quality checks are run as a prerequisite for DOI assignment and availability for DOI-based data citation.
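As an illustrative sketch (not the actual IS-ENES checker), the convention check on ingested files can be pictured as validating already-extracted header metadata against a list of required attributes; the attribute names below are assumptions chosen for illustration:

```python
# Minimal sketch of a CF/CMIP-style header check. The required attributes
# below are illustrative assumptions, not the real project checklist.
REQUIRED_GLOBAL_ATTRS = {"Conventions", "project_id", "experiment_id", "model_id"}
REQUIRED_VAR_ATTRS = {"units", "standard_name"}

def check_header(global_attrs, variables):
    """Return a list of problems found in already-extracted header metadata.

    global_attrs: dict of global attribute name -> value
    variables:    dict of variable name -> dict of its attributes
    """
    problems = []
    for attr in REQUIRED_GLOBAL_ATTRS - set(global_attrs):
        problems.append(f"missing global attribute: {attr}")
    if not str(global_attrs.get("Conventions", "")).startswith("CF-"):
        problems.append("Conventions attribute does not declare a CF version")
    for name, attrs in variables.items():
        for attr in REQUIRED_VAR_ATTRS - set(attrs):
            problems.append(f"variable {name}: missing attribute {attr}")
    return problems
```

A file passing such a check (an empty problem list) would then be eligible for publication; a real checker additionally validates controlled-vocabulary values and data contents.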
c. Could you describe the software and computational environments involved?
The post-processing of the data according to the standards and conventions of intercomparison projects is supported by a community tool (CMOR). The infrastructure is based on a large international open source community (the Earth System Grid Federation, ESGF) developing the individual components (security, catalogues, data access services, portal parts, etc.). The computational environments are more heterogeneous and organized locally at sites according to site-specific constraints. Some computational facilities are integrated as part of the ESGF nodes and portals (simple sub-setting and visualization) or IS-ENES portals interfacing with the IS-ENES data infrastructure (e.g. the climate4impact portal). Larger computational services are exposed via Web Processing Services; this part is not yet in production and needs technical developments as well as future organizational/policy agreements.
d. What are the responsibilities of the users who are involved in this use case?
Data producers:
Deliver data (and metadata) according to the rules and regulations of the corresponding Model Intercomparison Project (defining a kind of data management plan).
Inform data publishers about new versions and versioning related information.
Data publishers:
Publish data according to defined “best practices” agreed upon in the data federation.
Provide contact information in case of operational problems at the site.
Inform federation about operational issues (down times etc.).
Data users:
Provide citation information in published work based on the data.
e. Do you have any use case involving interactions with other RIs (in the same or different domains)?
Data from IS-ENES is replicated to EUDAT for data curation purposes (long-term archival). IS-ENES data is harvested by the EUDAT metadata catalogue (B2Find). Integration of other EUDAT services (e.g. B2Drop) is foreseen to support cross-community data usage.
f. What datasets are available for sharing with other RIs? Under what conditions are they available?
Mostly model data generated to enable Model Intercomparison Projects: e.g. CMIP5, CORDEX. Also some observational data used for intercomparison analysis activities: e.g. obs4Mips. The diversity will grow during the next phase of intercomparison projects currently starting (CMIP6).
2. Apart from datasets, does your RI also bring to ENVRIplus and/or other RIs:
a. Software? In this case, is it open source?
All components of the IS-ENES/ESGF data infrastructure are based on an international open source effort, the Earth System Grid Federation (ESGF). All the software is open source (https://github.com/esgf). The activities to provide future data-near processing functionalities are also organized as open source projects (see e.g. https://github.com/bird-house, with documentation at http://birdhouse.readthedocs.org/en/latest/, as well as the climate4impact WPS activities).
b. Computing resources (for running datasets through your software or software on your datasets)?
No, except for testing & prototyping.
c. Access to instrumentation/detectors or lab equipment? If so, what are the open-access conditions? Are there any bilateral agreements?
No (as n/a).
d. Users/expertise to provide advice on various topics?
On request we are happy to support and provide advice on the basis of our running environment (ESGF).
e. Access to related scholarly publications?
No.
f. Access to related grey literature (e.g. technical reports)?
We support a website with information on our RI: https://is.enes.org
3. What objectives would you like to achieve through participation to ENVRIplus?
Better understanding of interdisciplinary use cases and end user requirements.
A look at practices beyond the horizon of our community.
Example: sharing of data management best practices.
4. What services do you expect ENVRIplus technology to provide?
Service and Data catalogues for comparison of our model data to other data (e.g., observations).
5. What plans does your RI already have for data, its management and exploitation?
a. Are you using any particular standard(s)?
Community specific standards for data formatting and access are used:
netCDF-CF (climate and forecast conventions), the OPeNDAP data access protocol, Thredds. Metadata is also (partially) exposed as ISO 19139-conforming documents.
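For illustration, subset requests against such OPeNDAP endpoints follow the DAP2 constraint-expression form `var[start:stride:stop]` per dimension; the server URL and variable names below are placeholders, not real IS-ENES endpoints:

```python
# Sketch: building an OPeNDAP (DAP2) subset request for one variable.
# The server URL is a placeholder; the constraint-expression syntax
# (var[start:stride:stop] per dimension) follows the DAP2 convention.
def opendap_subset_url(base_url, variable, slices):
    """slices: list of (start, stride, stop) index tuples, one per dimension."""
    constraint = variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{base_url}?{constraint}"

url = opendap_subset_url(
    "https://example-thredds/dodsC/cmip5/tas_example.nc",  # placeholder URL
    "tas",
    [(0, 1, 11), (0, 1, 94), (0, 1, 191)],  # time, lat, lon index ranges
)
```

The point of the protocol is that only the requested index ranges are transferred, which matters for the data volumes discussed elsewhere in this document.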
b. Are you using any particular software(s)?
Federated Solr/Lucene indices to provide consistent data search across IS-ENES portals.
Thredds servers for data access (developed by Unidata). Globus GridFTP for large data transfers.
c. Are you considering changing the current standards or software as part of a future plan?
No. The software is in continuous evolution, especially because of security incidents in the past. Work in progress concerns, among other things, better automated installation. The software is composed of a galaxy of components; depending on requirements (scientific or operational) we have opportunities to evolve some components.
Because of the difficulty of providing stable operational procedures across an internationally distributed data federation (supported via different (local) funding streams), an operations team was formed to support CMIP6 data management in the data federation. This team will define best practices and supervise the operational data management activities at the sites.
Please provide documentation/links for all the above which apply.
Operations Team terms of reference document: https://docs.google.com/document/d/1oRWqxtWWEfsucTVhk0G3bMqHC0BL4dJwADrOG8Ukj-g/edit
6. What part of your RI needs to be improved in order:
a. For the RI to achieve its operational goals?
Be able to share best practices as fast as new nodes integrate the RI federation.
b. For you to be able to do your work?
Data-near processing functionality has to be provided to A) reduce the download volumes from sites and B) give end users a means to work with a worldwide distributed climate data archive in the Petabyte range.
7. Do topics [1-6] cross-link with your data management plan?
a. If so please provide the documentation/links
See e.g. the CORDEX data management plan and the CMIP6 data management preparation documents.
8. Does your RI have non-functional constraints for data handling and exploitation? For example:
a. Capital costs
b. Maintenance costs
c. Operational costs
d. Security
e. Privacy
f. Computational environment in which your software runs
g. Access for scrutiny and public review
If so please provide the documentation/links
The total annual operating cost of the infrastructure is estimated at 1560 k€.
9. Do you have an overall approach to security and access?
Yes – the data infrastructure supports single sign-on across multiple portals as well as authorization based on membership of various “projects”.
10. Are your data, software and computational environment subject to an open-access policy?
CORDEX data are in general available for both commercial and research purposes. Some modelling centres decided to restrict the use of their data to “non-commercial research and educational purposes.”
https://github.com/IS-ENES-Data/cordex/blob/9fa582a72c38ad13738885c1aeadc764bc3700fa/CORDEX_register.xlsx
The access to CMIP5 data is unrestricted except for the data from Japanese modeling centres, which are subject to similar restrictions as above:
http://cmip-pcmdi.llnl.gov/cmip5/availability.html
11. What are the big open problems for your RI pertinent to handling and exploiting your data?
Handling the volume and distribution of data (multi-Petabyte range): replication, versioning.
Providing related information for data products (provenance, user comments, usage, detailed scientific descriptions needed for usage).
12. Are you interested in any particular topic [1-6] to discuss in more detail?
a. If so, would you like us to arrange a follow-up interview with more detailed questions about any particular topic to be discussed?
Topic 4 (processing). – see below.
Optional: If you are not the right person to reply to some questions from the above, please suggest the right person to contact from your RI for those questions.
B. Specific questions per topic
i. What granularity do your RI’s data products have:
We store a time series of each variable in a simulation run at a given sampling frequency (yearly, monthly, daily, sub-daily). Spatially we cover 1) the globe by gridpoints, 2) regions like Europe, Africa…
ii. How are the data products of your RI stored - as separate “static” files, in a database system, or a combination?
Metadata catalogue; data on disk by variable (ESGF), some data on tape (LTA).
iii. How does your RI treat the “versioning” of data - are older datasets simply replaced by updates, or are several versions kept accessible in parallel? How do you identify different version of the same dataset?
ESGF: several versions are kept in parallel on some reference nodes. Versions are applied at the dataset level and contain several files pertaining to a given variable or set of variables. New versions are stored in new directories. LTA: version information is part of the metadata.
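The version-per-directory scheme can be sketched as follows; the DRS-like path layout is an illustrative assumption, with the dataset version appearing as a vYYYYMMDD path component:

```python
import re

# Sketch: extracting the dataset version from a DRS-style directory path.
# The example path layout is an illustrative assumption; in CMIP5-like
# layouts the version appears as a "vYYYYMMDD" path component.
VERSION_RE = re.compile(r"/(v\d{8})/")

def dataset_version(path):
    """Return the version component of a dataset path, or None."""
    match = VERSION_RE.search(path)
    return match.group(1) if match else None

path = ("/cmip5/output1/MPI-M/MPI-ESM-LR/historical/mon/atmos/"
        "Amon/r1i1p1/v20120315/tas/tas_Amon_MPI-ESM-LR_historical_r1i1p1.nc")
version = dataset_version(path)  # "v20120315"
```

Keeping each version in its own directory means older versions remain addressable in parallel, matching the reference-node behaviour described above.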
iv. Is it important to your data users that
Yes.
Yes, it does already.
Some Metadata only.
Yes.
v. Is your RI currently using, or planning to use, a standardized system based on persistent digital identifiers (PIDs) for:
n/a
n/a
Yes.
Yes.
vi. Please indicate the kind of identifier system that you are using – e.g. Handle-based (EPIC or DOI), UUIDs or your own RI-specific system?
EPIC and DOI.
vii. If you are using Handle-based PIDs, are these handles pointing to “landing pages”? Are these pages maintained by your RI or an external organization (like the data centre used for archiving)?
Landing pages maintained by DKRZ.
viii. Are costs associated with PID allocation and maintenance (of landing pages etc.) specified in your RI’s operational cost budget?
Yes.
i. How does your “designated scientific community” (typical data users) primarily use your data products? As input for modelling, or for comparisons?
As climate model input, for analysis and for comparison.
ii. Does your primary user community traditionally refer to datasets they use in publications:
DOIs are available for the most important data products like CMIP5 and CORDEX. Data is ready to be cited in the reference section, but it is not yet usual to do so.
iii. Is it important to your data users to be able to refer to specific subsets of the data sets in their citation? Examples:
We recommend citing a dataset collection and specifying the used subset in the text. The above-mentioned subsets are possible in any combination as well as combining specific subsets over multiple dataset collections i.e. citation entities.
iv. Is it important to be able to refer to many separate datasets in a collective way, e.g. having a collection of “all data” from your RI represented by one single DOI?
See iii: A collection is suitable to be used in a reference list to keep the balance between data and paper citations.
i. What strategy does your RI have for collecting information about the usage of your data products?
Scientific “impact”
Downloads/access requests: by number and volume with continental information on user origins (for DKRZ visualised on the DKRZ-Website).
References in scientific literature, Scientific “impact”: establish data references as part of the scientific record.
ii. Who receives credit when a dataset from your RI is cited?
The creator(s) as specified by the data originator; creators might be persons or institutions.
What steps in tooling, automation and presentation do you consider necessary to improve take up of identification and citation facilities and to reduce the effort required for supporting those activities?
Not mentioned above is the identification of creators by PIDs like ORCID, or the relation/connection to a scientific publication. Earth system sciences data is of high volume; therefore data is hosted at established archival centers. Certificates like the Data Seal of Approval (DSA) and World Data System (WDS) approval are of growing importance. Usually we have so-called ‘stand-alone’ data publications not directly connected or supplementary to an article. Most data users publishing articles are not the data creators.
We currently work on a stable and reliable possibility to cite dynamic data (CMIP6) in a federated data infrastructure.
a. Will the responsibility for your RI’s curation activities be shared with other organisations?
It is already a shared approach between various climate data centres and climate modelling centres.
b. Does the curation cover datasets only or also:
i. Software?
ii. Operating environment?
iii. Specifications/documentation?
All cases (for ii, only meta-information on the environment).
c. What is your curation policy on retaining/discarding
i. Datasets?
Final project data: >10 years.
ii. Software?
Mostly no policy
iii. Operating environments?
Mostly no policy
iv. Documents?
When needed – actually we do not always have the time to keep everything up to date.
d. How will data accessibility be maintained for the long term? E.g. What is your curation policy regarding media migration?
RI policies depend on the specific policies of the service centers providing LTA services: e.g. new tapes after 5 years at DKRZ.
e. Do you track all curation activities with a logging system?
At DKRZ most activities are logged but not systematically. It depends on the site in question.
f. What metadata standards do you use for providing
i. Discovery,
ISO, DIF, DC, THREDDS.
ii. Contextualisation (including rights, privacy, security, quality, suitability...)
Strongly depending on the project
iii. Detailed access-level (i.e. connecting software to data within an operating environment)?
n/a
Please supply documentation
g. If you curate software how do you do it? Preserving the software or a software specification?
Software via SVN/GitHub; ad hoc storage by some data centres.
h. What provisions will you make for curating workflows or other processing procedures/protocols?
Storing provenance logs as part of workflow outputs is foreseen for some data evaluation workflow chains.
i. If you curate the operating environment how do you do it? Preserving the environment or an environment specification ?
Just specifications like OS, compiler, hardware, libraries, etc.
j. What steps in tooling, automation and presentation do you consider necessary to improve take up of curation facilities and to reduce the effort required for curation?
Better unification of policies and interfaces, and better adherence to them.
a. Do you use catalogues or require using catalogues for the following items?
i. Observation system
ii. Data processing system
iii. Observation event and collected sample
iv. Data processing event
v. Data product
Metadata catalogue for specification of data products.
vi. Paper or report product
vii. Research objects or feature interest (e.g. site, taxa, ...)
viii. Services (processing, discovery, access, retrieval, publishing, visualization, etc.)
b. For each used or required catalogue, consider the following questions:
i. Item descriptions:
ESGF: Use MD, in preparation: Citation MD
LTA: Use MD, Citation MD, Contacts, Rights, access&storage, in preparation: provenance.
ISO, DIF, DC, etc.
ESGF: NetCDF-CF (www.cfconventions.org) and lists in a central repository (a remake is in progress: github.com/ES-DOC/esdoc-cim-cv, ES-DOC/esdoc-cv and others).
LTA: Just internal lists.
Working on cross-links between data and simulation metadata / model metadata.
LTA: cross-links to various publications (with DOI).
CMS = Plone, CKAN.
LTA: Oracle DB, JavaSP.
ESGF: Lucene indexing, postgres DB.
ii. Inputs:
LTA: Oracle SQLdeveloper.
ESGF: 1) auto Metadata harvesting from netCDF file headers to DB; 2) Lucene solr cloud for Metadata aggregation and presentation to user; 3) external harvesting from DB possible.
Checksums and unique IDs.
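A minimal sketch of this checksum/ID step on ingest, assuming SHA-256 and UUIDs (the actual ESGF checksum algorithm and PID scheme may differ):

```python
import hashlib
import uuid

# Sketch: computing a file checksum and minting a tracking identifier on
# ingest. SHA-256 and UUIDs are illustrative choices; the production
# system's checksum algorithm and PID scheme may differ.
def ingest_record(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-gigabyte files are not loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": path,
        "checksum": digest.hexdigest(),
        "checksum_type": "SHA256",
        "tracking_id": str(uuid.uuid4()),
    }
```

The checksum later allows verifying replicas and downloads, while the tracking identifier gives each file a stable handle independent of its path.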
iii. Human outputs:
ESGF: faceted search.
No.
If so, please describe them shortly.
All Metadata are free and open.
iv. Machine outputs:
Metadata: OAI-PMH: ISO, DC, DIF, etc.
Partially, e.g. to provide metadata as part of the world data center federation.
a. Data processing desiderata: input
i. What data are to be processed? What are their:
Hierarchical collections of data, characterized by entries from controlled vocabularies.
Very high volume: the size of individual files ranges from megabytes to gigabytes, but processing is normally done at collection level, involving multi-terabyte input collections.
Low velocity: data collections are growing (in a controlled manner) and new versions of existing data products are made available in the data federation.
Very low: data is based on highly structured data items (well-defined binary data types representing multi-dimensional data entities, e.g. netCDF). Data entities are organized in well-structured hierarchies (structured according to time, variables, project characterization, etc.).
ii. How is the data made available to the analytics phase? By file, by web (stream/protocol), etc.
Normally the data is made available to analytics based on a local or mounted file system. A separate data-import step is responsible for filling up the input data pool.
iii. Please provide concrete examples of data.
Temperature and precipitation according to various scenarios, generated by different climate models. Statistics are computed to compare characteristics of the different climate models, or climate indices characterizing individual climate model performance.
b. Data processing desiderata: analytics
i. Computing needs quantification:
Highly dependent on the use case. Computing is more I/O-bound than processor-bound.
Also very use-case dependent: some multi-model analytics may run for days on a small cluster, others for minutes; as before, time depends more on data access characteristics.
Most processing would benefit from a parallel map reduce phase, where first distributed data near pre-processing is done, reducing the amount of data to be transferred. Thereafter more complex, shared disk/memory parallel analytics is done on the parts from the map-reduce phase.
Some analysis use cases can benefit from shared-memory and distributed-memory parallelism to accelerate time to solution. Note also that some analysis phases are well suited to a parallel approach (such as one process per model, for example).
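The map-reduce pattern described above can be sketched as follows; the temperature values and chunking are toy data standing in for per-file collections held at different sites:

```python
# Sketch of the map-reduce pattern: a "map" phase computes small per-file
# summaries near the data, and a "reduce" phase combines them into a
# global statistic, so only the summaries need to cross the network.
def map_phase(values):
    """Per-file (or per-site) summary: runs where the data lives."""
    return (sum(values), len(values))

def reduce_phase(partials):
    """Combine the small summaries into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Three "files" of temperatures; only (sum, count) pairs are transferred.
chunks = ([280.0, 281.0], [282.0], [283.0, 284.0])
partials = [map_phase(chunk) for chunk in chunks]
global_mean = reduce_phase(partials)  # 282.0
```

The data-volume reduction is the point: each multi-gigabyte file contributes only a tuple to the shared-memory analytics phase that follows.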
ii. Process implementation:
Python, R, C, C++, Fortran
Linux clusters, mostly open source software basis
Data near processing for ENES/ESGF sites is based on the OGC WPS standard.
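As a sketch, key-value-pair requests against such WPS services take the following shape; the service URL and process identifier are placeholders, while the parameter names follow the OGC WPS 1.0.0 specification:

```python
from urllib.parse import urlencode

# Sketch: forming OGC WPS 1.0.0 key-value-pair (KVP) requests. The service
# URL and the process name "subset" are placeholders; the parameter names
# (service, version, request, identifier) follow the WPS 1.0.0 standard.
def wps_request(base_url, request, **extra):
    params = {"service": "WPS", "version": "1.0.0", "request": request}
    params.update(extra)
    return f"{base_url}?{urlencode(params)}"

caps = wps_request("https://example-node/wps", "GetCapabilities")
desc = wps_request("https://example-node/wps", "DescribeProcess",
                   identifier="subset")  # process name is illustrative
```

Standardizing on WPS means any client that speaks the protocol can discover and invoke processing offered by the federation's nodes.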
Yes – by contributing to open source data analytics software projects of various kinds (UV-CDAT, birdhouse, ESMValTool, etc.).
Yes – concrete test procedure is project dependent.
iii. Do you use batch or interactive processing?
Both.
iv. Do you use a monitoring console?
Yes.
v. Do you use a black box or a workflow for processing?
The choice of workflow engine analysis is project or framework specific, e.g. proprietary workflow wrappers, dispel4py.
Analysis project dependent, mostly not.
vi. Please provide concrete examples of processes to be supported/currently in use;
Simple: subsetting of data, means and other statistics, downscaling of data, interpolation of data, climate index calculation (ENSO, NAO, PDO, etc.).
Complex: vegetation modeling, geographical mosquito dispersal.
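A toy version of the "simple" processing class, computing monthly anomalies relative to a climatology; this is a simplified stand-in for real index definitions such as ENSO, which use specific regions and base periods:

```python
# Sketch of the "simple" processing class: anomalies of a monthly series
# relative to its per-month climatology. A simplified stand-in for real
# climate indices (ENSO, NAO, ...), which use specific regions and base
# periods.
def climatology(monthly_values):
    """Mean per calendar month over all years (series starts in January)."""
    months = [[] for _ in range(12)]
    for i, value in enumerate(monthly_values):
        months[i % 12].append(value)
    return [sum(m) / len(m) for m in months]

def anomalies(monthly_values):
    """Deviation of each month from its climatological mean."""
    clim = climatology(monthly_values)
    return [v - clim[i % 12] for i, v in enumerate(monthly_values)]
```

For example, two years of monthly temperatures at 10.0 and 12.0 respectively give a climatology of 11.0 for every month, and anomalies of -1.0 in the first year and +1.0 in the second.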
c. Data processing desiderata: output
i. What data are produced? Please provide:
Various: netCDF files, graphics, text, logs
Various: normally orders of magnitude smaller than the input data.
High – depending on analysis activity
High
ii. How are analytics outcomes made available?
Different means: some outputs only per researcher or research group on a file system; some outputs are published in catalogues and accessible via the web or Python notebooks, for example.
d. Statistical questions
i. Is the data collected with a distinct question/hypothesis in mind? Or is simply something being measured?
Data is collected according to the requirements and pre-defined characteristics defined for climate model intercomparison projects.
e. Will questions/hypotheses be generated or refined (broadened or narrowed in scope) after the data has been collected? (N.B. Such activity would not be good statistical practice)
The requirements and characteristics are refined after every round of model intercomparison projects to improve the next round and to react to the new possibilities that new technical infrastructures provide (e.g. improved processing power to support larger ensembles and finer model resolutions).
f. Statistical data
i. Does the question involve analysing the responses of a single set of data (univariate) to other predictor variables or are there multiple response data (bi or multivariate data)?
Depending on analysis activity.
ii. Is the data continuous or discrete?
Discrete.
iii. Is the data bounded in some form (i.e. what is the possible range of the data)?
Data represents several hundred physical quantities (temperature, precipitation, wind speed, etc.) and in that sense is bounded by physical laws.
iv. Typically how many datums approximately are there?
Data are stored on grid points covering the entire Earth system influencing the climate (atmosphere, ocean, sea ice, land, ...), so there are many thousands of data points.
g. Statistical data analysis
i. Is it desired to work within a statistics or data mining paradigm? (N.B. the two can and indeed should overlap!)
Statistics are very important in climate analysis, as we are looking for robust and significant signals.
ii. Is it desired that there is some sort of outlier/anomaly assessment?
Yes – but difficult to achieve at the Petabyte scale.
iii. Are you interested in a statistical approach which rejects null hypotheses (frequentist) or generates probable belief in a hypothesis (Bayesian approach) or do you have no real preference?
This needs more detail; a priori, yes. The range of scientific analyses done using the data of our RI is very large, but those complex analyses are usually done within the scientific teams, not by the RI itself.
a. Do you already have data provenance recording in your RI?
Yes, depending on the data analysis activity.
If so:
b. Where/when do you need it, e.g., in the data processing workflows, data collection/curation procedures, versioning control in the repositories etc.?
Mostly in data collection procedures as well as data processing workflows.
c. What systems are you using?
Community tools e.g. to manage what has been collected from where, and what is the overall transfer status or e.g. to generate provenance log files in workflows.
d. What standards are you using?
i. Advantages/disadvantages
No standard so far; first experiments toward the use of PROV-O in a specific analysis project.
ii. Have you ever heard about the PROV-O standard?
Yes.
e. Do you need provenance tracking?
i. If so, which information should be contained?
Input data characteristics (names, characterizing facets, checksum, unique ids), tools used (git svn tags), output files, timing information, platform/environment information.
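A minimal sketch of a record holding exactly these items; the field names are illustrative assumptions, and a production system might map them to PROV-O terms instead:

```python
import hashlib
import json
import platform

# Sketch of a provenance record capturing the items listed above: input
# characteristics, tool version, outputs, timing, platform. Field names
# are illustrative; a production system might map them to PROV-O terms.
def provenance_record(inputs, tool, tool_version, outputs, started, finished):
    return {
        "inputs": [
            {"name": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in inputs
        ],
        "tool": {"name": tool, "version": tool_version},
        "outputs": list(outputs),
        "timing": {"started": started, "finished": finished,
                   "elapsed_s": round(finished - started, 3)},
        "platform": platform.platform(),
    }

record = provenance_record(
    inputs=[("tas_input.nc", b"...file bytes...")],  # bytes stand in for contents
    tool="analysis-script", tool_version="git:abc1234",  # illustrative tag
    outputs=["tas_mean.nc"],
    started=0.0, finished=42.5,
)
log_line = json.dumps(record)  # appended to a provenance log
```

Serializing such records as JSON log lines keeps them easy to attach to workflow outputs, which matches the logging approach mentioned for the data evaluation workflow chains.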
f. What information do you need to record regarding the following:
i. Scientific question and working hypothesis?
The data has been produced following a very detailed experimental protocol. We need to collect all the information needed to assess how exactly the protocol has been followed (facets, controlled vocabulary, documentation: es-doc.org).
ii. Investigation design?
Authors information.
iii. Observation and/or measurement methods?
iv. Observation and/or measurement devices?
v. Observation/measurement context (who, what, when, where, etc.)?
vi. Processing methods, queries?
vii. Quality assurance?
Performed quality assurance procedures, results of QA software.
g. Do you know/use controlled vocabularies, e.g. ontologies, taxonomies and other formally specified terms, for the description of the steps for data provenance?
Not yet.
h. What support, e.g. software, tools, and operational procedures (workflows), do you think is needed for provenance tracking?
Agreements on what information to record, and simple APIs that can be integrated into analysis tools and frameworks.
i. How does your community use/plan to use the provenance information?
-For catalogues as additional metadata for data products.
-For end users to understand the derivation history of data products.
-For tools to automatically “replay” specific analysis parts.
i. Do you have any tools or services in place/planned for this purpose?
No generic ones – specific loggers, etc.
13. Related to your answer to the generic question 7 (What part of your RI needs to be improved):
i. What does it mean for this to be optimal in your opinion?
-Easy, standardized interfaces for command line usage as well as portal integration.
-Faster, more robust and fully automated replication procedures. Fast replication across continents is key to accelerate data access at an early stage of a major project.
-Policies etc. for assignment of compute resources to user (groups)
-Funding for community computing resources.
ii. How do you measure optimality in this case?
RIs have a set of KPIs that will progress if those areas are improved.
End user satisfaction. The number of publications should progress faster than before if we progress in those directions.
iii. Do you already know what needs to be done to make this optimal?
A rethink is necessary, on the one hand, to reach the end users (“data analysts”).
iv. What would you not want from an 'optimal' solution? For example, maximizing one attribute of a component or process (e.g. execution time) might come at the cost of another attribute (e.g. ease-of-use), which ultimately may prove undesirable.
Due to the amounts of data, we would not want to lower network performance. Also fundamental is the ease of use of the RI by scientists and engineers.
b. Follow-up questions to answers from other sections which suggest the need for the optimization of certain RI components.
Data citation is currently not an easy task because our data collections are extremely complex. We can progress along that line.
c. Do you have any use case/scenarios to show potential bottlenecks in 1) the functionality of your RI, for example the storage, access and delivery of data, doing processing, handling the workflow complexity etc. 2) ensuring the non-functional requirements of your RI, for example ensuring load balance in resource usage etc.
Ensuring load balance once computing services are made widely available will be a challenge. Network resources are also a potential bottleneck because of the data volumes we are dealing with.
d. To understand those bottlenecks:
i. what might be the peak volume in accessing, storing, and delivering data?
Previous project (CMIP5) had up to about 10 TB over all (mainly 3) European nodes daily. We expect CMIP6 to show significantly higher values.
ii. what complexity might the data processing workflow have?
We presently need to handle rather complex workflows.
iii. Are there any specific quality requirements for accessing, delivering or storing data, in order to handle the data in nearly real time?
No.
We define Community Support as being concerned with managing, controlling and tracking users' activities within an RI and with supporting all users to conduct their roles in their communities. It includes many miscellaneous aspects of RI operations, including for example (non-exhaustively) authentication, authorization and accounting, the use of virtual organizations, training and helpdesk activities.
a. Training Requirements
i. Do you use or plan to use e-Infrastructure technology?
We use Cloud, Grid, HPC and cluster computing; ESGF is an e-infrastructure in this sense.
ii. What is your community training plan?
Workshops from time to time. We also announce within our communities training courses and workshops organized by HPC centres or European projects (PRACE, EGI, ...).
iii. Does your community consider adopting e-Infrastructure solutions (e.g., Cloud, Grid, HPC, cluster computing)?
n/a
iv. Is your community interested in training courses that introduce state-of-the-art e-Infrastructure technology?
Need to see the detail. Potentially yes.
v. What topics (related to e-Infrastructure solutions) would your community be interested in?
Load balancing, compute resources management.
vi. Who would be audience?
Developers and PIs of our RI in the first stage.
vii. What are appropriate methods to deliver this training?
Workshops.
b. Requirements for the Community Support Subsystem:
i. What are the required functionalities of your Community Support capability?
We have AAI, help desks and accounting activities in place.
ii. What are the non-functional requirements, e.g., privacy, licensing, performance?
Good performance for high data volumes. Some data have licensing constraints that will restrict access to a certain group of users.
iii. What standards do you use, e.g., related to data, metadata, web services?
Metadata: ISO, DIF, SAML, REST, DC…
iv. What community software/services/applications do you use?
For AAI we use OAuth2, OpenID, SAML, X509.
For ESGF: LAS, Synda, Birdhouse, netCDF, Thredds…