
This page is steered and edited by Maggie Hellstrom and Alex Vermeulen, with help from go-betweens and others they co-opt.


Research Infrastructures

ENVRIplus “Identification & Citation” requirements gathering status: January 27, 2016

| RI | Domain | Identification done | Citation done | Comments |
|---|---|---|---|---|
| ACTRIS | A | Y | Y | Final information recorded in the wiki |
| ANAEE | B/E | | | Request to fill out questionnaire sent via go-between (Paul M) on 2015-12-08; no response received (status 2016-01-27) |
| EISCAT-3D | A | Y | Y | The RI submitted completed questionnaire 2015-12-14 - thank you! |
| ELIXIR | B/E | | | |
| EMBRC | M (multi) | Y | Y | The RI submitted completed questionnaire 2016-02-08 - thank you! |
| EMSO | M (multi) | Y | Y | The RI submitted completed questionnaire 2015-12-11 - thank you! |
| EPOS | G | Y | Y | |
| ESONET-VI | M | | | Might be (partially) covered by SeaDataNet responses? |
| Euro-ARGO | M | Y | Y | The RI submitted completed questionnaire 2015-12-08 - thank you! |
| EUDAT | e-Infra | -- | -- | Not included in the requirements survey |
| EUROFLEETS2 | M | | | Might be (partially) covered by SeaDataNet responses? |
| EUROGOOS | M | | | The RI submitted completed questionnaire 2016-01-11 - thank you! |
| FIXO3 | M | | | Might be connected somehow to EMSO? |
| IAGOS | A | Y | Y | The RI submitted completed questionnaire 2015-12-17 - thank you! |
| ICOS | A, B/E, M | Y | Y | |
| INTERACT | B/E | | | |
| IS-ENES2 | A (multi) | Y | Y | |
| JERICO | M | | | Might be (partially) covered by SeaDataNet responses? |
| LIFEWATCH | B/E | | | |
| LTER | B/E | Y | Y | The RI submitted completed questionnaire 2016-01-22 - thank you! |
| SEADATANET | M | Y | Y | Need to clarify representativeness for Marine domain: contacted go-betweens and RI rep on 2015-12-08; no response yet as of 2016-01-27, final reminder sent |
| SIOS | A, B/E, G, M | (Y) | Y | The RI submitted completed questionnaire 2016-01-10 - thank you! |

Domains: A = atmosphere; B/E = biosphere/ecosystems; G = solid Earth; M = marine

 

 


The questions that were sent to the RIs are available here: 1 - Identification and citation questions.docx


Introduction

Identification of data (and associated metadata) throughout all stages of processing is central to any RI. This can be ensured by allocating unique and persistent digital identifiers (PIDs) to data objects throughout the data processing life cycle. PIDs allow unambiguous references to be made to data during curation and cataloguing, and they support provenance tracking. They are also a necessary requirement for correct citation (and hence attribution) of the data by end users, as this is only possible when persistent identifiers exist and are applied in the attribution.

Environmental research infrastructures are often built on a large number of distributed observational or experimental sites, run by hundreds of scientists and technicians, and financially supported and administered by a large number of institutions. When their data are shared under an open access policy, it therefore becomes very important to acknowledge the data sources and their providers. There is also a strong need for common data citation tracking systems that allow data providers to identify downstream usage of their data, so that they can demonstrate its importance and impact to stakeholders and the public.

Overview and summary of identification and citation requirements

Identification

The survey found a large diversity in practices among the RIs. Most store their data in files rather than in databases, which suggests that it should be relatively straightforward to assign PIDs to a majority of the RI data objects. The survey also revealed a profound gap in knowledge about what persistent and unique identifiers are, what they can be used for, and best practices regarding their use. Most identifier systems in use are based on handles (DOIs from DataCite are most common, followed by ePIC PIDs), but some RIs rely on formalized file names. While a majority see a strong need for assigning PIDs to their “finalized” data (individual files and/or databases), few apply this to raw data, and even fewer to intermediate data, which indicates that PIDs are not used in workflow administration. Metadata objects are also seldom assigned PIDs. Costs for maintaining PIDs are typically not treated explicitly.

Citation

Currently, users refer to data sets in publications using DOIs if available, and otherwise provide information about producer, year, report number etc. either in the article text or in the References section. A majority of RIs feel it is absolutely necessary to allow unambiguous references to be made to subsets of data sets, preferably in the citation, while few consider the ability to create and later cite collections of individual data sets important. Ensuring that credit for producing (and to a lesser extent curating) scientific data sets is “properly assigned” is a common theme for all RIs, not least because funding agencies and other stakeholders require such performance indicators, but also because individual PIs want and need recognition of their work. Connected to this, most RIs have strategies for collecting usage statistics for their data products, e.g. through bibliometric searches (quasi-automated or manual) of the scientific literature, but these often rely on publishers also indexing data object DOIs.

Conclusion

The use of persistent and unique identifiers for both data and metadata objects throughout the entire data life cycle needs to be encouraged, e.g. by providing training and best-use cases. There is strong support for promoting “credit” to data collectors through data citation standards that allow specific sub-setting information to be added to a basic (DOI-based) reference.
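To make the last point concrete, the following is a minimal sketch of how sub-setting information might be appended to a basic DOI-based reference. It is illustrative only and does not represent a citation format agreed by any RI or standard: the `Subset` class, the `format_citation` function, the bracketed "[subset: ...]" suffix and the DOI `10.5072/example` are all hypothetical names and placeholders introduced for this example. The only external fact assumed is that a DOI can be written as a resolvable link by prefixing it with https://doi.org/.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Subset:
    """Hypothetical description of the part of a data set that was used."""
    variable: Optional[str] = None
    start: Optional[str] = None  # ISO 8601 date, e.g. "2015-01-01"
    end: Optional[str] = None    # ISO 8601 date, e.g. "2015-12-31"

    def describe(self) -> str:
        # Only include the pieces that were actually specified.
        parts = []
        if self.variable:
            parts.append(f"variable: {self.variable}")
        if self.start and self.end:
            parts.append(f"period: {self.start}/{self.end}")
        return "; ".join(parts)


def format_citation(creators: Sequence[str], year: int, title: str,
                    publisher: str, doi: str,
                    subset: Optional[Subset] = None) -> str:
    """Build a basic DOI-based reference, optionally followed by subset details.

    The layout of the reference string is an assumption made for this sketch.
    """
    base = (f"{'; '.join(creators)} ({year}): {title}. "
            f"{publisher}. https://doi.org/{doi}")
    if subset is not None and subset.describe():
        return f"{base} [subset: {subset.describe()}]"
    return base


if __name__ == "__main__":
    # Placeholder metadata; none of these values refer to a real data set.
    print(format_citation(
        ["Doe, J."], 2016, "Example time series", "Example RI",
        "10.5072/example",
        Subset(variable="co2", start="2015-01-01", end="2015-12-31"),
    ))
```

Keeping the basic reference unchanged and adding the subset description as a suffix means the DOI still resolves to the full data set, while the reader can see which part was used. Whether such a suffix would be picked up by publishers' indexing is a separate question that this sketch does not address.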

Research Infrastructures

The following RIs have contributed to identifying and describing ENVRIplus identification and citation requirements: ACTRIS, AnaEE, EISCAT-3D, EMBRC, EMSO, EPOS, Euro-ARGO, EuroGOOS, IAGOS, ICOS, IS-ENES2, LTER, SeaDataNet, and SIOS (click on individual RI names to see the respective responses).