Guidance for research data management at JYU

1. Plan

What is research data and why should I manage it?

Research data refers to

  • digital or analog basic data and materials for scientific research, 
  • further data refined from these, upon which the research findings and published research results are based, as well as  
  • code and software upon which the research results build.  

At the University, management, storage and reuse of research data are essential elements of research infrastructure. Domestic and international research funders as well as publishers not only appreciate but increasingly mandate openness and transparency of research data. Above all, good data management leverages the research process itself. With well planned data management, you

  • make your research more efficient
  • comply with funders' requirements
  • comply with data protection legislation and protect your study subjects
  • agree upon data ownership and rights as well as data sharing and preservation together with your partners
  • agree upon how, when and on what terms you open your data
  • ensure that others can cite your data, giving you due scientific credit
  • ensure that necessary resources and equipment are available to you throughout your project.

JYU provides research teams with an in-house research data infrastructure. It features tools for supporting data management with appropriate data security and storage capacity, as well as related advice and guidance. 

The FAIR principles for reusable data

In accordance with the international FAIR framework for maximum reusability of research data, JYU encourages data management that leads to Findable, Accessible, Interoperable and Reusable research data and metadata. Major funders such as the Academy of Finland also require that metadata and, if possible, data produced with their funding are FAIR. To ensure that your data and/or their metadata are FAIR, follow these five steps:

  • Archive your data in an established digital repository at the end of the project 
  • Choose a repository that provides your data a persistent identifier (PID), such as DOI or URN
  • Store your data in an open file format such as Rich Text Format (.rtf) or .csv; these are more interoperable and less subject to loss and obsolescence than proprietary formats
  • Create descriptive metadata for the data (see Documentation and metadata below)
  • License your data with a license that states clearly the conditions and restrictions for reuse (see ensuring further use of data below).

Data management plan

Research funders, e.g. the Academy of Finland, require applicants and/or grantees to provide a data management plan (DMP) as part of the research plan, stating how the research data will be obtained, used, stored and protected, and how their later use will be enabled for others.

As a rule, one should seek to publish the data for further use. However, publishing does not necessarily mean that the data could be used by anyone for any purpose whatsoever. It is therefore important that you acknowledge and document the ownership, control and terms of use for the data and include these in the descriptive metadata. The ownership and control of a dataset entail a right to decide on the purpose of its use, but also responsibility for proper management of the data. In many cases, the data themselves cannot, for a justified reason, be made available, but even in these cases, their metadata can.

A data management plan should briefly describe:

  • What kind of data will be collected and how?
  • In what ways will the data be processed and analysed?
  • What kind of resulting data will be produced?
  • How will the research data be stored and how will their possible later use be arranged?
  • Ownership and access rights for the data (including related written agreements).

You can use the DMPTuuli online tool for drafting your data management plan. DMPTuuli is designed for preparing tailored DMPs in accordance with the specific requirements of different funders. It also provides JYU instructions for preparing a DMP.

See Data management plan: Video guidance (CSC IT Centre for Science)

Know your data

When starting on drafting your DMP, briefly describe and categorise your data (e.g., pre-existing statistical or archive data; raw data that you collect; processed analysis data). You can use a table or a listing. Name the different data types so that you can reference them later on in your DMP.

Examples:

Data type | Source | Personal / sensitive data | File format (recommended open formats) | Estimated size
Analysed DNA sample | Processed from DNA sample | No | .xlsx, .csv | 2 GB
Statistical analysis X | Pre-existing from FSD | No, anonymised | SPSS (.por, .sav) | -
Questionnaire | Collected from study subjects | Yes, identifiable and health information | .csv | 5 MB
Interview recording on video | Collected | Yes, identified personal information | .avi, .mp4 | -
Interview transcript | Processed | No | .csv, .txt, .xlsx | >10 MB
Image | Collected | No | .tif, .jpeg, .gif, .raw | -

Source: Fuchs, S. 2020. RDM : Research Data Management Basics, Meilahti. Helsingin yliopisto. CC BY 4.0. [Retrieved on 25.2.2021.]
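
A data inventory like the one in the examples can also be kept in a machine-readable, open format so that it is easy to update and to reference from the DMP. The following is only a minimal sketch in Python: the field names and the two sample rows are hypothetical and chosen for illustration, not a JYU or funder template.

```python
import csv
import io

# Hypothetical inventory rows, loosely mirroring the example table.
INVENTORY = [
    {"data_type": "Questionnaire", "source": "Collected from study subjects",
     "personal_data": "Yes", "file_format": ".csv"},
    {"data_type": "Interview transcript", "source": "Processed",
     "personal_data": "No", "file_format": ".csv, .txt"},
]


def inventory_to_csv(rows):
    """Serialise the inventory to CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def personal_data_types(rows):
    """List the data types flagged as containing personal data."""
    return [row["data_type"] for row in rows
            if row["personal_data"].startswith("Yes")]


if __name__ == "__main__":
    print(inventory_to_csv(INVENTORY))
    print(personal_data_types(INVENTORY))
```

Listing the data types that contain personal data in this way gives a quick overview of which parts of the DMP need to address data protection.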

Contracts,  agreements and licensing 

To secure your legal protection and in view of the further use of data compiled in your research project, make a written agreement upon ownership of and access rights to the data within your group and with your partners as early as possible. The project's PI is responsible for making sure that all project partners sign a Transfer-of-rights agreement about transferring ownership of the data to the University before signing the project agreement, and at the latest upon starting the project. There are agreement templates available for this purpose. Transfer-of-rights agreements are always recommended, but must be made at least in the following cases:

  • projects funded by Business Finland 
  • EU research programmes
  • commissioned research
  • as requested by the funder.


Agreeing upon licenses for the future published data is important at this phase. Licensing means that you clearly define the reuse terms and possible restrictions to future reuse of the data. This way, you are in control of who will have rights to reuse the data, and how. The JYU policy is to use machine-readable licenses that follow international standards, preferably Creative Commons. Using them secures the maximum reusability for the data.  Licensing is necessary for publishing data in the future; unlicensed datasets are unsafe to reuse. 

Consider these when making  agreements 

  • What parts of data are meant to be made available for reuse?
  • If one or more researchers bring pre-existing data (such as statistical or register data) to the project, will these be included in the future published data? Are reuse rights clear?
  • When will the dataset(s) be published?
  • For what purpose will the data be made available (free reuse with a Creative Commons license; restricted use for research/teaching/studying purposes)?
  • Who has the right to make the publishing/archiving contract?
  • If reuse will be restricted, who is authorised to grant the reuse right?


Ethical issues and handling of personal data

Research ethics and, in particular, the legal right of subjects to privacy may affect how you collect, store and process research data, who may use the data and for what purpose, and how the data can be archived.

If your research involves the processing of any information that can be directly or indirectly linked to an individual, your research will be subject to data protection legislation. Prior to the start of the study, clearly inform the subjects how and by whom their personal data will be processed and managed throughout the life cycle of the research project. JYU's Privacy Policy for Investigators advises you on how to process personal data securely and in accordance with legal requirements at different stages of the investigation.

Special categories of personal data and sensitive information


Special categories of personal data include a person's ethnic origin, political opinion, religious beliefs and philosophical worldview, trade union membership, genetic data, personally identifiable biometric data, health information, and sexual behavior or orientation. Sensitive information also includes personal identification number, bank account information and criminal record information.

If you process personal data of a special category, special provisions apply to your research regarding the definition of the exception for the processing of personal data and the storage and processing of material.


Impact assessment

Always conduct at least a concise informal risk assessment for your research on personal information. Instructions on risk assessment can be found here. If the risk is estimated to be high, the study is subject to a Data Protection Impact Assessment (DPIA) in accordance with the EU's General Data Protection Regulation.

The risk is considered particularly high when the processing involves

  • a large number of persons whose data are processed
  • a large amount of information about a person
  • sensitive information
  • information on vulnerable study subjects (e.g. children)
  • use of data for automated decision making
  • systematic monitoring.

See the JYU data privacy guide for researchers for detailed instructions.

Ethical Review in the Human Sciences 


You may need a prior evaluation of your research from the University's Human Sciences Ethics Committee or the Ethics Committee of the Hospital District of Central Finland if your research set-up meets the specific criteria defined by the Finnish National Board on Research Ethics:

1) participation in the study deviates from the principle of informed consent,
2) the study intervenes in the physical integrity of the subjects,
3) the study is aimed at persons under the age of 15 without separate consent from the guardian, or without informing the guardian in a way that would give them the opportunity to prohibit the child from participating in the study,
4) the subjects are presented with exceptionally strong stimuli,
5) the research risks causing mental harm to the subjects or their relatives beyond the limits of normal everyday life, or
6) the conduct of the study may pose a safety threat to the subjects or the investigator or their relatives.

If necessary, contact the University’s Human Sciences Ethics Committee well in advance of starting your research. Review can no longer be requested after the start of the investigation. 

A checklist for handling personal data

  • Justify that you have the right under the GDPR to collect, process and store personal information: check the criteria for processing personal data.
  • Make sure that an ethical review is done, if necessary, before starting the study.
  • Indicate which parties process the data and who the data controller or controllers are. NOTE! Also consider the service providers you use here. If you conduct a survey using, for example, the Webropol service, the service provider technically has the opportunity to see the content.
  • Describe what personal information you need, and for what purpose.
  • Determine the legal basis for the processing: is it 1) consent or 2) public interest?
  • Assess the risks that the processing of personal data may pose to the subjects.
  • Find out if the project requires an impact assessment (DPIA) and carry it out, if needed.
  • Explain how you protect the data processed and the privacy of the subjects and, if necessary, pseudonymise or anonymise them.
  • Plan how you will dispose of personal information that has become redundant so that it is no longer recoverable.
  • Remember that personal data may only be processed in the way that the data subjects were informed of in the privacy notice before the start of the study. If the processing reveals a need to deviate from what is specified in the privacy notice, inform the subjects immediately and update the privacy notice and other necessary documents.

Make sure that the processing of personal data is described in a consistent manner in your data management plan and in the privacy notice.


2. Collect, create and process

A large part of research data are in digital form to start with. Research data typically consist of

  • questionnaires and interviews
  • different types of measurement and observation 
  • other types of video, image, audio, and text materials.

Basic datasets for research are collected e.g. in the form of questionnaires, interviews, video recordings, as well as with various devices and sensors. Different measurement and data collection methods yield different metadata and file formats. This poses challenges especially when investigating the same phenomenon with different observation methods and datasets. The processing of data (e.g. regarding data descriptions, data protection and data security) can be streamlined and automated by software designed for the purpose.

At the data analysis stage, the actual results of an empirical study are derived from the raw data. When raw data is processed, aggregated and analysed, it results in various further datasets for elaboration and reporting. To make the study progress smoothly, it is important that you handle the generated datasets in a controlled fashion. Establish uniform data handling procedures among your group that everyone follows, and document them. There are different data management software applications available for different types of datasets. 

Documentation during the project 

Essential questions: How do I document my data so that they can be found, accessed, and used by me and others tomorrow, in a week, and years from now? If a completely unknown researcher found my data, could they understand what they are about? What do I need to do to understand and be able to use my data?

Documentation refers to the creation of descriptive information that clarifies the context and methods of capturing and processing the data, as well as the structure of the data and the chosen file system (e.g. subfolders, naming, etc.). It can mean, for example, a description of variables and key vocabulary as well as units of measurement, or an inventory of research interviews and related basic information. The documentation may also contain information on, for example, the version of a particular dataset. Technical metadata produced by technical equipment (e.g. calibrations, etc.) are also essential documentation.

Best practices
  • Find out what kind of software is available to you for data collection, processing and analysis. The University's Digital Services provide software for data processing upon request.
  • Using discipline-specific metadata standards for documentation is a good way to ensure the future findability and reusability of the data. Metadata standards help you describe your data with standard attributes which make it easier for other researchers in the same field to make sense of them. The description can be saved in e.g. a Readme file in .txt or XML format and stored alongside the data.
  • Store the descriptive information in separate files (e.g. Readme files, inventory spreadsheets) alongside the datasets in a subfolder you name /DOCUMENTATION. This way, the documentation files can also be found by someone who does not know the structure of the data in detail.
  • Plan what kind of documentation you produce and where to find it if you don't use the /DOCUMENTATION subfolder. If possible, use the metadata standards of your discipline in the documentation.
  • Agree with your research team members well in advance on a uniform way to arrange files in folders and subfolders. A logical folder structure streamlines work and reduces the risk of loss.
  • Use open file formats instead of commercial formats. Open, standard file formats are the best guarantee of data availability after several years. Examples of recommended and acceptable formats can be found in the UK Data Service format comparison table.
  • Keep the documentation up to date as your data collection and processing proceed. When planned well ahead and done at the same time as you work with your data, it is a small effort, but at a later stage it becomes practically impossible.
  • Describe your documentation practices in your DMP.
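
The /DOCUMENTATION convention described above can be set up with a few lines of code at the start of a project. The sketch below is only an illustration in Python: the Readme fields, the function name and the sample values are assumptions made for this example, not a JYU template or a metadata standard.

```python
import tempfile
from pathlib import Path

# Hypothetical Readme fields; adapt them to your discipline's metadata standard.
README_TEMPLATE = """\
Dataset: {title}
Description: {description}
Collected by: {collector}
File naming convention: {naming}
"""


def create_documentation(dataset_dir, **fields):
    """Create a DOCUMENTATION subfolder containing a plain-text Readme file."""
    doc_dir = Path(dataset_dir) / "DOCUMENTATION"
    doc_dir.mkdir(parents=True, exist_ok=True)
    readme = doc_dir / "README.txt"
    readme.write_text(README_TEMPLATE.format(**fields), encoding="utf-8")
    return readme


if __name__ == "__main__":
    # A temporary folder stands in for the real dataset folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = create_documentation(
            tmp,
            title="Pilot survey",
            description="Responses to the pilot questionnaire",
            collector="Research group X",
            naming="YYYY-MM-DD_datatype_version",
        )
        print(path.read_text(encoding="utf-8"))
```

Keeping the Readme as a plain .txt file follows the recommendation to prefer open file formats.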

Creating and maintaining metadata  

Metadata refers to general descriptive information about research data (e.g. owner, authors, distributor, name, short description, location, etc.). Up-to-date metadata is the key to finding and accessing your data. Basic project-level metadata is created and maintained in the Research data section of Converis, the JYU research information system. For more information, see step-by-step guidelines in JYU Intranet Uno. When kept up-to-date during the course of the project, metadata clarify for both you and secondary users what the work is about.

At JYU, metadata is maintained in the Research Data System section of Converis. A metadata entry should be created for each dataset. When describing your data, divide them into such entities that you can describe them unambiguously. The data described in more detail can be bundled in Converis into larger entities under a larger “parent dataset”. See the detailed instructions for managing your metadata in Converis.

Quality assurance

Ensuring the integrity and quality of the data as they are collected, migrated and transferred is an important part of data management. Careful documentation of the procedures of data collection is the primary measure to ensure the integrity and quality of the data. Depending on the type of material, equipment and methods, integrity and quality can be ensured, for example,

  • by calibrating measuring instruments to monitor the accuracy and scale of detection
  • by reviewing the transcribed interview material with an external expert
  • by using industry-standard methods, hardware, and software.
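
For digital files, one common way to check that data have stayed intact during migration or transfer is to compare checksums taken before and after the move. The guidance above does not prescribe this technique; the following Python sketch is offered only as one possible illustration, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to the folder) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return files that are missing or whose checksum differs after a transfer."""
    return sorted(name for name, digest in before.items()
                  if after.get(name) != digest)
```

A typical use would be to call build_manifest on the source folder, transfer the data, call it again on the destination, and confirm that changed_files returns an empty list.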

Pseudonymisation and anonymisation

In order to secure the safest possible handling of personal data and to follow what you have promised to your study subjects in the privacy notice, pseudonymisation and/or anonymisation of personal identifiers in the data may become necessary. Anonymised data can be safely shared during the project and published at any point. However, anonymisation procedures have to be planned well in advance, and they require time and some effort.

Pseudonymous data are still personal data, but they can no longer be identified without combining them with other information. Replacing real names with fake names and identifiers with codes are typical pseudonymisation techniques. As long as the key to the coded identifiers is stored separately from the data in a secure storage location and only designated people have access to it, pseudonymisation may suffice as a data security method during the project. However, as the project ends, identifiers should be erased or, if storing them is necessary for e.g. enabling contact with the study subjects, the research group members should set a future date for re-evaluating the need for retaining the identifiers. This should be documented in the data management plan.
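
The replacement of identifiers with codes described above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the field names "name" and "participant_code" are hypothetical, only one direct identifier is handled, and indirect identifiers in the remaining fields (e.g. rare occupations or places) are not addressed.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns (pseudonymised_records, key), where key maps each code back to
    the original identifier. The key must be stored separately from the
    data, in a secure location with restricted access.
    """
    key = {}        # code -> original identifier
    code_for = {}   # original identifier -> code
    output = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate in the unlikely case of a clash
                code = "P" + secrets.token_hex(4)
            code_for[identifier] = code
            key[code] = identifier
        # Copy every field except the identifier, then add the code.
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["participant_code"] = code_for[identifier]
        output.append(cleaned)
    return output, key
```

The same person always receives the same code, so records can still be linked within the dataset. Erasing the key at the end of the project removes the direct link back to the study subjects, but whether the remaining data count as anonymous still has to be assessed separately.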

There are various anonymisation methods for qualitative and quantitative data. For detailed instructions for both, see the Finnish Social Science Data Archive Guidelines. In case you plan to deposit your data in the Finnish Social Science Data Archive, the FSD experts will help you with anonymisation at the data deposition phase. See the FSD guidance for researchers.


3. Store and share

As the research process goes on, there is a need to transfer datasets for storage. At the same time, it is often also necessary to share some data and related access rights with other researchers over a data network. Also at this stage it is essential to take proper care of data protection and data security issues for the research data. Sensitive data must not be transferred online without adequate safety measures. Bear in mind that keeping datasets on the hard discs of workstations, memory sticks and USB drives is in principle a risk in terms of data protection and data security. 

When datasets are transferred for longer-term storage, for possible further use, or to be published, the data should be carefully checked and anonymised. When due attention is paid to data management from the outset, the data can be prepared fairly easily for long-term storing, further use, and publishing. Comprehensive instructions for the handling of datasets are available in the data management guidelines of the Finnish Social Science Data Archive. 

Tools for data storage and sharing 

JYU offers various tools that can make data management easier, covering the phases from data collection up to storing, sharing and further use:

  • Workstations and related software (e.g. SPSS and Atlas.ti)
  • Special hardware and software for various research teams and laboratories
  • Nextcloud cloud storage and sharing service for non-sensitive data 
  • The CollabRoom cloud service (instructions currently only in Finnish) for secure processing and transfer of sensitive datasets
  • Researchvideo service intended for storing of non-sensitive audiovisual materials
  • JYX publication archive for publishing small datasets
  • The University’s quota in CSC’s IDA storage service.

Storing and sharing personal data

If your data contain personal or otherwise sensitive information, keep them on the original storage device whenever possible. For maximum data protection, personal data should not be transferred outside the original storage location (e.g. to a separate Excel workbook for analysis) if this can be avoided. When data are kept and processed in the original location, it is easier to keep an automated log of who has had access to them.

When transferring personal data, make sure that you know exactly who receives it at the other end. Ensure your legal right to transfer personal data by informing your study subjects about who handles their personal information, why, and how, at the beginning of your project using the data privacy notice. If you cannot use the University's Nextcloud or CollabRoom tools for sharing and have to use email, you must use secure email or encrypt the attached files.

Sometimes data must be transferred outside the EU and the European Economic Area. This can be the case if e.g. appropriate analysis equipment only exists in some particular location. Special legal obligations apply to personal data transferred outside the EU/EEA. If this applies to your study, consult the University's Data Protection Officer.


4. Publish, archive and dispose of data at the end of the project 

Publishing data at the University of Jyväskylä

The University of Jyväskylä strives for the widest possible openness of research data and their metadata, pursuing the principle "As open as possible, as closed as necessary". Accordingly, the researchers, as the best experts on their data, evaluate which parts of the datasets can be published. By default, data should be published, and if they cannot be made openly available, the reasons should be justified in the data management plan. The university recognises that, depending on the nature of the data, there can be different degrees of openness.

When planning your research, consider what selections of the data you can publish, what steps it takes to publish them, and what you will do with the rest of the data (archiving, disposal). This way, your plan covers the entire data lifecycle, and you will save a lot of time and effort at the end of your project.

Regardless of which data repository you choose, it is crucial that relevant information about the dataset and its storage site also remains at JYU. Metadata should be published for all datasets, including those that remain closed and/or are disposed of. This is done in the Research Data section of the Research Information System Converis (see the step-by-step instructions). If you publish data in the University's digital repository JYX, the Open Science Centre will take care of recording the availability information for you.

What are the benefits of publishing my data?

Opening your own data makes the world a better place, but it also directly benefits your research:

  • You will find your own datasets faster and more reliably in the future for re-use - no matter how many years, new computers, new working institutions, or even new continents, you've gone through in between.
  • Was it ever difficult to remember what this or that column in an old dataset of your own actually means? Or decide who deserves a co-authorship in new research using old data? Your own research is easier and meets higher quality standards, when its metadata is archived from the beginning and published.
  • Your research gets cited through datasets too, and you are easier to find. This means new contacts in your field of research, more name recognition, and more opportunities to do interesting and rewarding science.
  • Many funders and research institutions recognise published datasets as a significant output in the evaluation of researcher merit. The weight of data as a contribution to science when awarding grants and positions is steadily increasing. Be in the vanguard and open everything you can: you will gain a competitive edge in funding applications and in other situations where your research merit is judged, for example when you apply for tenure in the 2020s.
  • Most importantly, the chance that your work will have an impact on the world grows considerably.

Preparing for data publishing 

The prerequisite for opening data is that the metadata and the documentation of the data are up to date. A good way to ensure this from the beginning of the research project is to keep the documentation of the methods, structure, content and other information relevant to the research in a subfolder called /DOCUMENTATION. General metadata is maintained in the Research data section of the University's Converis research information system.

Creating a metadata record does not mean that you need to publish it yet. You can update it as your project proceeds and keep it closed until you are ready to publish it at the end of the project. Creating the record early also lets the University of Jyväskylä support your data management from the start. Moreover, you can easily request help from the University's data experts via Converis with a few clicks. Finally, via Converis, you can request the publication of the finished metadata, or even the dataset itself, in the University's digital repository JYX.

Criteria for choosing a repository

  • Widely used by researchers in your field
  • Gives the metadata (and, if applicable, the underlying data) a permanent identifier, such as a DOI or URN
  • Publishes machine-readable metadata and uses a known metadata standard
  • Has a certificate of operational reliability, such as the CoreTrustSeal or ISO 16363 certification
  • Allows you to choose the terms of use under which the material can be further used, and states them clearly as part of the metadata.

Some recommended repositories

Best practices 

  • Publish your data primarily in a discipline-specific or research-specific digital repository. In a discipline-specific archive, the data are most likely to be found by researchers in your field. The Re3data portal is an excellent site to search for a suitable repository and to browse repositories in your field.
  • If a suitable field-specific repository is not available, publish your data in the university's JYX repository.
  • Use generalist data repositories such as Zenodo or figshare only as a last resort. The data deposited in them is heterogeneous, which lessens the discoverability of the data and makes it difficult to evaluate the findings.
  • Remember that publishing data on your own or your project's website does not meet the requirements of the funders or the university's expectations regarding the discoverability and accessibility of the material.

Finnish Social Science Data Archive (FSD)

The FSD focuses on acquiring social science data. Under certain circumstances, data from other relevant fields (Arts and Humanities, Education, and health sciences) can also be archived. Datasets deposited at the archive must meet certain technical and legal requirements. Before dissemination, archived datasets are processed and documented.

FSD promotes open access to research data as well as transparency, accumulation and efficient reuse of scientific research. FSD also responsibly implements the FAIR data principles, which aim at making data and services Findable, Accessible, Interoperable and Re-usable.

The archive is a national resource centre funded by the Ministry of Education and Culture and the University of Tampere. In addition to archiving and dissemination of data, key services include data-related information services and support for research data management. The archive operates as a separate unit of the University of Tampere.

The Language Bank of Finland

The Language Bank of Finland is a service for researchers using language resources. The Language Bank has a wide variety of text and speech corpora and tools for studying them. The corpora can be analyzed and processed with the Language Bank’s tools or downloaded.

Many corpora are publicly accessible, some require logging in. The rights to use restricted resources can be applied for electronically. Using the Language Bank is free for researchers and students.

If you are new to the Language Bank, take a look at the Language Bank introduction.


JYX

JYX is JYU's repository for publications and research data. It gives datasets permanent identifiers (DOI, URN). Metadata is sent to the national METAX catalogue, which ensures that metadata and datasets can be found using the national ETSIN service.

Publishing a dataset in JYX is simple. Just create metadata for the dataset in the Converis research information system and make a request in Converis to publish the (meta)data.

Archiving and disposal of data

According to the general guidelines of the archiving plan of the University of Jyväskylä, the retention period of research data is generally 1-5 years after the publication of the research. If the study has a funder, also check what has been agreed with the funder about the retention period. Start by checking with your faculty whether an internal archival or data retention period is in use within your discipline.

Erasing and disposal of sensitive data needs planning ahead, as well. Deleting files using operating system tools, or even reformatting a hard drive, will not irretrievably destroy the data. It is important to permanently destroy any data that includes personal, confidential or sensitive data after storage is no longer necessary. Choose the suitable method of disposal according to the data format: 

  • Paper data is disposed of in the grey locked data security boxes at the university campus.
  • Electronic files can be overwritten with overwriting software. Eraser and WipeFile are examples of free, open source erasing programmes. The University's Data Security Officer helps you with further questions about secure data disposal measures at the HelpJYU portal.

Long-term preservation

Long-term preservation means preserving especially valuable datasets for over 25 years. The national CSC service Fairdata-PAS preserves these datasets. In accordance with its data policy, JYU coordinates the preservation of nationally significant datasets in Fairdata-PAS. For more information, see below.


The Open Science Centre helps you with issues related to data management and publishing. Please contact us at researchsupport-osc@jyu.fi.

 

Source selectively used to update the instructions: Fuchs, S. Research Data Management Basics, Meilahti, 31.3.2020. University of Helsinki Data Support. Creative Commons CC BY 4.0.