Sharing data – Where and how?
Planning data sharing
Deciding whether and how data can be shared requires planning. Researchers play a key role in deciding what data can be shared, but other stakeholders are also involved in the decision: data sharing must be agreed within the research group, research participants must be informed, and organisational and funder requirements must be checked.
This horror story shows what happens when data is not opened properly (CC BY NYU Health Sciences Library):
Always plan well and carry out data sharing carefully.
Here is a short checklist for data sharing:
- Data may only be published with permission from the data owner. Make written agreements on data ownership with all parties, preferably before data collection.
- Think carefully about which data you can make open. Consult relevant stakeholders when making decisions. Find out if there are any ethical, legal or contractual limitations on sharing the data.
- Inform research participants about data sharing. In most cases, only anonymous data can be shared openly, so anonymise your data if needed.
- Update your data management plan as needed as your project progresses.
- Provide good documentation and metadata about the data. Aim to make your data as FAIR as possible. As part of data documentation and description, provide a contextual framework for your data. Data without metadata is very hard to interpret or reuse, so it is unlikely to get the reuse or citations you might hope for.
- Choose a suitable repository or journal for your data and metadata. In Finland, in addition to sharing your data via a data repository or other service, make sure that the metadata of your data ends up in the national research dataset finder Etsin.
- Assign a persistent identifier (e.g. a DOI) to your data and metadata. These are usually provided by repositories.
- License your data. Creative Commons licenses offer different license types that will meet most researchers’ needs.
- Data repositories often provide a recommendation for data citation. If there is no recommendation, provide a recommended way to cite your data. See the data citation guidelines in Reusing and citing research data.
- Remember to link all the outputs through your ORCID identifier.
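If your repository gives no citation recommendation, the checklist items on persistent identifiers and citation can be combined into a recommended citation string. The following is a minimal sketch in Python, assuming a common "Creators (Year). Title. Repository. DOI link" pattern. The `DatasetRecord` class, its field names and all example values (including the DOI) are hypothetical placeholders; follow your repository's or discipline's own citation style where one exists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a shared dataset."""
    creators: list[str]
    year: int
    title: str
    repository: str
    doi: str  # bare DOI without the resolver prefix


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    return "https://doi.org/" + doi.strip()


def recommended_citation(record: DatasetRecord) -> str:
    """Build a citation string: Creators (Year). Title. Repository. DOI link."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.repository}. {doi_url(record.doi)}"
    )


# Placeholder values for illustration only; this is not a real dataset or DOI.
example = DatasetRecord(
    creators=["Researcher, A.", "Colleague, B."],
    year=2024,
    title="Example survey dataset",
    repository="Example Repository",
    doi="10.1234/example.5678",
)
print(recommended_citation(example))
```

Writing the DOI as a full `https://doi.org/` link lets readers follow the citation directly to the dataset's landing page.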
Data sharing tools and practices
You can share your data in several ways, for example:
- in a discipline-/subject-specific data repository
- in a general data repository
- in an institutional data repository
- in a data journal
- as a supplement to a peer-reviewed article
Subject-specific data repositories
It is always a good choice to deposit your research data in a discipline- or subject-specific repository that is recognised in your field. Journals and funders may specify which data repositories they want researchers to use, so check the terms of your grant or publishing agreement/instructions for details.
A few examples of discipline-specific repositories:
- Pangaea, data repository especially for earth and environmental sciences.
- The European Bioinformatics Institute (EMBL-EBI) provides molecular data resources and bioinformatics services to the scientific community. It offers several data repositories and a wizard that helps you find the right archive for submitting your data.
- The Language Bank of Finland is a service for researchers using language resources. The Language Bank has a wide variety of text and speech corpora and tools for studying them. A majority of the Language Bank’s users are language researchers, but the service is equally well suited to other digital humanities research.
See also the list of research data services on UEF’s Open Science web pages.
You can use the Registry of Research Data Repositories (re3data) to find more repositories. Browse, for example, by subject or country to find suitable data repositories.
The FAIRsharing service can also be used to search for subject-specific repositories that have been recommended or mandated by funding bodies and publishers.
Generalist data repositories
You can use a generalist data repository such as Dryad, Zenodo, EUDAT or Figshare for your research data. This may be beneficial, for example, in multidisciplinary research or if no recognised, high-quality data repository is yet available in your discipline. These generalist data repositories may also be recommended by your funder or journal.
Institutional data repositories
Many universities and other research organisations are developing their own research data or metadata repositories. Even if you have deposited your data in a subject-specific or generalist repository, you may also wish to add a metadata description, including a link to the data, to your institutional data repository. This makes your data more visible and supports institutional research assessment.
At UEF, you cannot deposit your data directly in UEF eRepository. Instead, metadata of open research data are harvested to UEF eRepository from various repositories (Etsin, Zenodo, EUDAT and Dryad).
Data journals
Consider publishing your dataset in a peer-reviewed data journal. Data journals are publications whose primary purpose is to expose datasets. They enable you as an author to focus on the data itself, rather than on the extensive analysis of the data that the traditional journal model expects. Typically, a publication in a data journal consists of an abstract, an introduction, a data description with methods and materials, and a short conclusion on reuse opportunities.
Publishing in a data journal may be of interest to researchers and data producers for whom data is a primary research output. In some cases, the publication cycle may be quicker than that of traditional journals, and where the journal requires data to be deposited in an approved repository, long-term curation of and access to the data are assured.
Examples of data journals:
- Scientific Data – published by Nature
- Geoscience Data Journal – published by Wiley
- Journal of Open Archaeology Data – published by Ubiquity
- Biodiversity Data Journal – published by Pensoft
- Earth System Science Data – published by Copernicus
Metadata sharing
If you cannot publish your data (because confidentiality, contracts or other issues limit it), you can usually still publish its description, i.e. the metadata. Metadata can almost always be shared. Only if the metadata itself includes confidential information do you need to consider whether and how it can be published.
In Finland, Etsin is a national service for metadata of research data. Etsin enables you to find research datasets from all fields of science. It contains information about the datasets and metadata in the national Finnish Fairdata services, the Language Bank of Finland, the Finnish Social Science Data Archive and the Finnish Environment Institute. Adding metadata of datasets located in other repositories is also recommended.
In Etsin, the published metadata on the dataset is open to everyone. The data owner decides how the underlying research data can be accessed and by whom. Etsin works independently of actual data storage locations and contains no research datasets. Datasets can be described and metadata published through the Qvain service (Etsin -> Create/edit datasets). When producing metadata, it is useful to use general subject headings to describe the research data with commonly agreed terms (in Finland, the Finto service provides subject headings for different fields of science). The use of Etsin is recommended for all research projects in Finland. If you are outside Finland, find out if there are similar metadata services available in your country.
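Before you publish a metadata description, it can help to check that the key descriptive fields are filled in, even when the data itself stays restricted. The following is a minimal sketch in Python. The field names and example values are hypothetical and chosen for illustration only; they are not the actual Qvain or Etsin schema, so check the service's own documentation for the fields it requires.

```python
# Hypothetical field names for illustration; not the actual Qvain/Etsin schema.
REQUIRED_FIELDS = ("title", "description", "creators", "keywords", "access_type")


def missing_fields(metadata):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Placeholder record: the data is restricted, but its description can be open.
record = {
    "title": "Interview study on data sharing practices",
    "description": "Transcribed interviews; data available on request only.",
    "creators": ["Researcher, A."],
    "keywords": [],  # subject headings still to be added
    "access_type": "restricted",
}
print(missing_fields(record))
```

In this sketch an empty list counts as missing, so a record with no subject headings is flagged until keywords are added.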
Remember:
- There are many ways to share research data: subject-specific and generalist data repositories, institutional data repositories, data journals and supplements to a peer-reviewed article.
- Share at least the metadata of the datasets.
- Choose a service that meets your needs and provides a persistent identifier for your data.
- Consult the data support services of your research organisation if needed.
- Data that will be shared for reuse should be licensed. A license defines the rights of the data's author and of its users.
(8/2024 KH)