Plan

Why research data management? How do I benefit?

Research data refers to

  • digital or analog basic data and materials for scientific research,
  • further data refined from these, upon which the research findings and published research results are based, as well as
  • code and software upon which the research results build.

At the University, the management, storage and reuse of research data are essential elements of the research infrastructure. Domestic and international research funders, as well as publishers, not only value but increasingly require openness and transparency of research data. Above all, good data management strengthens the research process itself. With well-planned data management, you

  • make your research more efficient
  • comply with funders' requirements
  • comply with data protection legislation and protect your study subjects
  • agree with your partners on data ownership and rights, as well as on data sharing and preservation
  • agree upon how, when and on what terms you open your data
  • ensure that others can cite your data, giving you due scientific credit
  • ensure that necessary resources and equipment are available to you throughout your project.

JYU provides research teams with an in-house research data infrastructure. It features tools for supporting data management with appropriate data security and storage capacity, as well as related advice and guidance.

The FAIR principles for reusable data

In line with the international FAIR principles, which aim at maximum reusability of research data, JYU encourages data management that produces Findable, Accessible, Interoperable and Reusable research data and metadata. Major funders such as the Academy of Finland also require that metadata and, where possible, data produced with their funding are FAIR. To ensure that your data and/or their metadata are FAIR, follow these five steps:

  • Archive your data in an established digital repository at the end of the project
  • Choose a repository that provides your data a persistent identifier (PID), such as DOI or URN
  • Store your data in an open file format such as Rich Text Format (.rtf) or .csv; open formats are more interoperable and less prone to loss and obsolescence than proprietary formats
  • Create descriptive metadata for the data (see Documentation and metadata below)
  • License your data with a license that clearly states the conditions and restrictions for reuse (see Ensuring further use of data below).
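As a rough sketch of the open-format, metadata and licensing steps above, the Python example below writes tabular data to a UTF-8 CSV file and places a minimal JSON metadata record next to it. The function name `export_dataset`, the file names and the metadata fields are hypothetical; the repository you archive in will define its own metadata schema and will assign the persistent identifier itself, which is why `identifier` is left empty. `CC-BY-4.0` is the SPDX identifier for the Creative Commons Attribution 4.0 license.

```python
import csv
import json
from pathlib import Path


def export_dataset(rows, fieldnames, out_dir, title, creators, license_id="CC-BY-4.0"):
    """Write rows to an open CSV file plus a minimal JSON metadata sidecar.

    Hypothetical helper: the metadata fields are illustrative only; use the
    schema required by the repository you archive in.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Data in an open, non-proprietary format (UTF-8 CSV with a header row).
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Minimal descriptive metadata stored next to the data file.
    metadata = {
        "title": title,
        "creators": list(creators),
        "license": license_id,  # SPDX identifier, e.g. CC-BY-4.0
        "format": "text/csv",
        "identifier": None,  # PID (DOI or URN) is assigned by the repository
        "files": [data_path.name],
    }
    meta_path = out / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return data_path, meta_path
```

Keeping the metadata in a separate plain-text file means it can be published even when the data themselves cannot be opened.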

Data management plan

Research funders, e.g. the Academy of Finland, require applicants and/or grantees to provide a data management plan (DMP) as part of the research plan, stating how the research data will be obtained, used, stored and protected, and how their later use will be enabled for others.

As a rule, you should aim to publish the data for further use. However, publishing does not necessarily mean that anyone may use the data for any purpose whatsoever. It is therefore important to establish and document the ownership, control and terms of use of the data and include these in the descriptive metadata. Ownership and control of a dataset entail the right to decide on the purposes of its use, but also responsibility for managing the data properly. In many cases, the data themselves cannot be made available for justified reasons, but even then their metadata can.

A data management plan should typically describe:

  • What kind of data will be collected and how?
  • In what ways will the data be processed and analysed?
  • What kind of resulting data will be produced?
  • How will the research data be stored, and how will their possible later use be arranged?
  • Ownership and access rights for the data (including related written agreements).

You can use the DMPTuuli online tool for drafting your data management plan. DMPTuuli is designed for preparing tailored DMPs in accordance with the specific requirements of different funders. It also includes JYU instructions for preparing a DMP.

Know your data

When you start drafting your DMP, briefly describe and categorise your data (e.g., pre-existing statistical or archive data; raw data that you collect; processed analysis data). You can use a table or a list. Name the different data types so that you can refer to them later in your DMP.

Examples:

| Data type | Source | Personal / sensitive data | File format (open formats recommended) | Estimated size |
| --- | --- | --- | --- | --- |
| Analysed DNA sample | Processed from DNA sample | No | .xlsx, .csv | 2 GB |
| Statistical analysis X | Pre-existing from FSD | No, anonymised | SPSS (.por, .sav) | |
| Questionnaire | Collected from study subjects | Yes, identifiable and health information | .csv | 5 MB |
| Interview recording on video | Collected | Yes, identified personal information | .avi, .mp4 | |
| Interview transcript | Processed | No | .csv, .txt, .xlsx | >10 MB |
| Image | Collected | No | .tif, .jpeg, .gif, .raw | |
| Administrative documents | Permissions collected from study subjects | Yes | .docx | |

Source: Fuchs, S. 2020. RDM: Research Data Management Basics, Meilahti. Helsingin yliopisto (University of Helsinki). CC BY 4.0. [Retrieved on 25.2.2021.]

Contracts, agreements and licensing

To secure your legal protection and enable further use of the data compiled in your research project, make written agreements on ownership of and access rights to the data, both within your group and with your partners, as early as possible. The project's PI is responsible for making sure that all project partners sign a Transfer-of-rights agreement transferring ownership of the data to the University before the project agreement is signed, or at the latest when the project starts. Agreement templates are available for this purpose. Transfer-of-rights agreements are always recommended, and they must be made at least in the following cases:

  • projects funded by Business Finland
  • EU research programmes
  • commissioned research
  • as requested by the funder.


At this stage, it is important to agree on licenses for the data that will later be published. Licensing means clearly defining the terms of reuse and any restrictions on future reuse of the data. This way, you control who will have the right to reuse the data, and how. JYU policy is to use machine-readable licenses that follow international standards, preferably Creative Commons licenses, which secure maximum reusability for the data. Licensing is necessary for publishing data in the future; unlicensed datasets are unsafe to reuse.

Consider these when making agreements

  • What parts of data are meant to be made available for reuse?
  • If one or more researchers bring pre-existing data (such as statistical or register data) to the project, will these be included in the data to be published? Are the reuse rights clear?
  • When will the dataset(s) be published?
  • For what purpose will the data be made available (free reuse under a Creative Commons license, or restricted use for research, teaching or study purposes)?
  • Who has the right to make the publishing/archiving contract?
  • If reuse will be restricted, who is authorised to grant the reuse right?

Ethical issues and handling of personal data

Research ethics and, in particular, the legal right of subjects to privacy may affect how you collect, store and process research data, who may use the data and for what purpose, and how the data can be archived.

If your research involves processing any information that can be directly or indirectly linked to an individual, it is subject to data protection legislation. Before the study starts, clearly inform the subjects how and by whom their personal data will be processed and managed throughout the life cycle of the research project. JYU's Privacy Policy for Investigators advises you on how to process personal data securely and in accordance with legal requirements at different stages of the research.

Special categories of personal data and sensitive information


Special categories of personal data include data revealing a person's ethnic origin, political opinions, religious beliefs or philosophical worldview, or trade union membership; genetic data; biometric data that can identify a person; health data; and data on a person's sexual behaviour or orientation. Other sensitive information includes personal identity numbers, bank account details and criminal record information.

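If you process special categories of personal data, special provisions apply to your research: processing such data is prohibited unless one of the exceptions provided in data protection legislation applies, so you must identify the exception on which your processing is based, and stricter requirements apply to how the material is stored and processed.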


Impact assessment

Always carry out at least a brief informal risk assessment for research that involves personal data. Instructions on risk assessment can be found here. If the risk is assessed as high, you must carry out a Data Protection Impact Assessment (DPIA) in accordance with the EU General Data Protection Regulation (GDPR).

The following are considered to indicate a particularly high risk:

  • a large number of persons whose data are processed
  • a large amount of information about a person
  • sensitive information
  • information on vulnerable study subjects (e.g. children)
  • use of data for automated decision making
  • systematic monitoring.
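If you want to record this screening step in a reproducible way, the sketch below encodes the six high-risk indicators listed above as yes/no fields and reports which of them apply to a planned study. The class and field names are hypothetical, and the function deliberately only lists matches: it does not decide whether a DPIA is required, which should be assessed following the JYU data privacy guide.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProcessingProfile:
    """Hypothetical yes/no answers describing the planned processing."""
    many_data_subjects: bool = False         # a large number of persons
    extensive_data_per_person: bool = False  # a large amount of information about a person
    sensitive_data: bool = False             # sensitive information
    vulnerable_subjects: bool = False        # e.g. children
    automated_decision_making: bool = False  # data used for automated decision making
    systematic_monitoring: bool = False      # systematic monitoring


def high_risk_indicators(profile: ProcessingProfile) -> List[str]:
    """Return the names of the indicators that apply, in the order listed.

    This only reports matches; it does not decide whether a DPIA is required.
    """
    return [f.name for f in fields(profile) if getattr(profile, f.name)]


if __name__ == "__main__":
    planned = ProcessingProfile(sensitive_data=True, vulnerable_subjects=True)
    print(high_risk_indicators(planned))  # ['sensitive_data', 'vulnerable_subjects']
```

Saving the answers alongside the DMP makes it easy to revisit the assessment if the study design changes.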

See the JYU data privacy guide for researchers for detailed instructions.

Ethical Review in the Human Sciences


Your research may require a prior ethical review by the University's Human Sciences Ethics Committee or the Ethics Committee of the Hospital District of Central Finland if it meets any of the following criteria defined by the Finnish National Board on Research Ethics:

1) participation in the study deviates from the principle of informed consent,
2) the research intervenes in the physical integrity of the subjects,
3) the research involves persons under the age of 15 without separate consent from, or notification to, their guardian that would give the guardian the opportunity to prohibit the child from participating,
4) the subjects are exposed to exceptionally strong stimuli,
5) the research involves a risk of causing mental harm to the subjects or their relatives beyond the limits of normal everyday life, or
6) conducting the research may pose a safety threat to the subjects, the researchers or their relatives.

If necessary, contact the University’s Human Sciences Ethics Committee well before starting your research. A review can no longer be requested once the research has started.

A checklist for handling personal data

  • Make sure you have the right under the GDPR to collect, process and store personal data: check the criteria for processing personal data.
  • Make sure that an ethical review is done, if necessary, before starting the study.
  • Identify which parties process the data and who is the data controller (or who are the joint controllers). NOTE! Also consider the service providers you use: if you conduct a survey using, for example, the Webropol service, the provider technically has access to the content.
  • Describe what personal information you need, and for what purpose.
  • Determine the legal basis for the processing: is it 1) consent or 2) public interest?
  • Assess the risks that the processing of personal data may pose to the subjects.
  • Find out if the project requires an impact assessment (DPIA) and carry it out, if needed.
  • Explain how you protect the processed data and the privacy of the subjects and, if necessary, pseudonymise or anonymise the data.
  • Plan how you will dispose of personal data that is no longer needed so that it cannot be recovered.
  • Remember that personal data may only be processed in the ways described to the data subjects in the privacy notice before the study starts. If it becomes necessary to process the data in a way that deviates from the privacy notice, inform the subjects immediately and update the privacy notice and other necessary documents.

Make sure that the processing of personal data is described in a consistent manner in your data management plan and in the privacy notice.
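The checklist above mentions pseudonymisation. One common technique, shown here as a sketch and not as a JYU-prescribed method, is to replace each direct identifier with a keyed hash (HMAC-SHA256) computed with a secret key that is stored separately from the data. The function names and the normalisation step (trimming whitespace and lowercasing) are assumptions for illustration. Note that pseudonymised data are still personal data under the GDPR, because anyone holding the key can re-link them to individuals.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random secret key; store it separately from the data."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked; without the key, pseudonyms cannot feasibly be
    recomputed from guessed identifiers.
    """
    normalised = identifier.strip().lower()  # assumption: case/space-insensitive IDs
    digest = hmac.new(key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened code, 64 bits


def pseudonymise_records(records, id_field, key):
    """Return copies of the records with id_field replaced by its pseudonym."""
    result = []
    for record in records:
        new_record = dict(record)  # leave the original record untouched
        new_record[id_field] = pseudonymise(record[id_field], key)
        result.append(new_record)
    return result
```

Shortening the digest to 16 hexadecimal characters keeps the codes readable; collisions are very unlikely at typical study sizes, but the full 64-character digest can be kept instead.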