Complete ACTRIS report on Identification and Citation available at: https://envriplus.manageprojects.com/projects/requirements/notebooks/470/pages/36/comments/384/attachments/589/download
Identification
ACTRIS data are granular in content, time and space. Data are separated by parameter, so each data product contains information about one or two parameters, which results in a large number of files. However, ACTRIS plans to provide combined data products that integrate all parameters, reducing the content-wise granularity. The temporal resolution depends on the data product: there are near-real-time products (currently implemented for CLOUDNET and planned for the other two components), as well as yearly, seasonal, daily and hourly data. Regarding spatial granularity, data are separated by station. In the future, ACTRIS might also provide data by region.
It is worth mentioning that, although ACTRIS has different data products, work is currently under way on combined products.
ACTRIS handles versioning by replacing older datasets with updates. Users of the data products are often notified of a new update of a dataset and of the main differences from the previous version. Once the update has been announced, the new version is available online, while the old version is available only on request. Different versions of the same data are distinguished by naming the files differently (typically using the release date).
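As an illustration of date-based versioning, the following minimal sketch selects the most recent release among several files of the same dataset. The file name pattern (`<dataset>_v<YYYYMMDD>.<ext>`) and the function names are assumptions made for this example; the actual ACTRIS naming is not specified in this report.

```python
import re
from datetime import date, datetime
from typing import Iterable

# Hypothetical pattern: <dataset>_v<YYYYMMDD>.<ext>
# This is an assumption for illustration, not the actual ACTRIS convention.
VERSIONED_NAME = re.compile(r"^(?P<dataset>.+)_v(?P<release>\d{8})\.(?P<ext>\w+)$")


def release_date(filename: str) -> date:
    """Extract the release date embedded in a versioned file name."""
    match = VERSIONED_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned file name: {filename!r}")
    return datetime.strptime(match.group("release"), "%Y%m%d").date()


def latest_version(filenames: Iterable[str]) -> str:
    """Return the file name that carries the most recent release date."""
    return max(filenames, key=release_date)
```

Under this assumed pattern, older releases remain distinguishable by name, which matches the practice of keeping them available on request.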
ACTRIS was not aware of PIDs and therefore does not use them for its data or metadata. However, it has a nomenclature for naming files, so it is known exactly which kind of information is stored in each file. For example, EARLINET file names follow the pattern: station code + date + parameter + time.
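The naming scheme above (station code, date, parameter, time) could be sketched as follows. Only the order of the fields comes from this report; the underscore separator, the `YYYYMMDD` and `HHMM` encodings, the class name `ProductName` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

SEPARATOR = "_"  # assumed; the report only gives the order of the fields


@dataclass(frozen=True)
class ProductName:
    station: str         # station code
    timestamp: datetime  # date and time encoded in the file name
    parameter: str       # measured parameter (must not contain SEPARATOR)

    def to_filename(self) -> str:
        """Compose a name in the order: station code, date, parameter, time."""
        return SEPARATOR.join([
            self.station,
            self.timestamp.strftime("%Y%m%d"),  # assumed date encoding
            self.parameter,
            self.timestamp.strftime("%H%M"),    # assumed time encoding
        ])

    @classmethod
    def from_filename(cls, name: str) -> "ProductName":
        """Split a name back into its four fields."""
        station, day, parameter, time = name.split(SEPARATOR)
        return cls(station, datetime.strptime(day + time, "%Y%m%d%H%M"), parameter)
```

Because every field sits at a fixed position, data and metadata can be matched by name alone, without a PID.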
The metadata contain checksum information for the objects. Some information about the metadata is available from the ACTRIS website. For example, there is a list of station codes (two-character identifiers). In addition, ACTRIS stores metadata in the descriptions of the databases, which are also available on the website. Data and metadata can be linked through the file names (but not with PIDs).
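A checksum recorded in the metadata can be used to verify that a downloaded object is intact. The sketch below assumes a SHA-256 hex digest; the report does not state which algorithm ACTRIS uses, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path: Path, expected_checksum: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with the checksum recorded in its metadata."""
    return file_checksum(path, algorithm) == expected_checksum.strip().lower()
```

Reading in chunks keeps memory use constant, which matters when a product is made up of many or large files.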
ACTRIS delegates the task of storing and archiving the data to the data generators, who decide whether or not to use a standardized system. For physical samples, the approach depends on the component. The CLOUDNET component uses the instruments' serial numbers, whereas the near-surface (EUSAAR) component, which uses filters to collect particles, has no tracking number. The LIDAR component uses a handbook that records the characterization of all instrument channels and related details, which is very important for data analysis. This information is stored in Excel files.
For data undergoing processing, QA and QC procedures are applied, or are to be developed, for the three components. All three components have a processing chain that foresees the release of quality-controlled data in a second step. The quality-checked data are available with a DOI. Finally, publishable data are linked to a DOI and stored in the CERA database.
ACTRIS uses the station code and DOIs as identifier systems. The DOIs are maintained by an external organization called CER [1], based in Hamburg. ACTRIS does not allocate budget for this.
Citation
The ACTRIS scientific community uses the data differently depending on whether users belong to the RI or not. Internal users use the data to understand what is happening, since they sometimes need to study data coming from other stations. External users, such as the modelling community, use the data to evaluate models.
To refer to datasets in publications, users normally cite them by DOI in the reference section (preferred) and/or provide the data as supplementary information via a link. In addition, authors sometimes offer co-authorship of papers. It is important to note that when an ACTRIS dataset is cited by DOI, all authors listed for that DOI receive credit.
For some ACTRIS data users it is important to refer to specific subsets. ACTRIS recommends using DOIs for this, since a DOI can refer to a subset. For example, for calibration purposes there is a subset called Calipso, which users can refer to directly. There is also a subset for volcanic eruptions.
ACTRIS already has a DOI collection covering all EARLINET data [2]. The CLOUDNET and EUSAAR components are working towards DOIs for their datasets.
The ACTRIS strategy for collecting information about the usage of its data products is based on counting downloads and accesses, counting references in the scientific literature, and measuring scientific impact.
ACTRIS considers it important to perform a good quality check of the data (currently a time-consuming task) before obtaining a DOI. Automating part of the quality check, which is planned within the ACTRIS2 project, will therefore improve the process of obtaining DOIs. In addition, obtaining a DOI requires compliance with certain standards, and these standards sometimes change, which makes them difficult to follow.
| Go-between | Rosa Filgueira |
|---|---|
| RI representative | Lucia Mona and Markus Fiebig |
| Period of requirements collection | July to November 2015 |
| Status | Finished |