Guidance for research data management at JYU

1. Plan

What is research data and why should I manage it?

Research data refers to

  • digital or analog basic data and materials for scientific research, 
  • further data refined from these, upon which the research findings and published research results are based, as well as  
  • code and software upon which the research results build.  

At the University, management, storage and reuse of research data are essential elements of research infrastructure. Domestic and international research funders as well as publishers not only appreciate but increasingly mandate openness and transparency of research data. Above all, good data management benefits the research process itself. With well-planned data management, you

  • make your research more efficient
  • comply with funders' requirements
  • comply with data protection legislation and protect your study subjects
  • agree upon data ownership and rights as well as data sharing and preservation together with your partners
  • agree upon how, when and on what terms you open your data
  • ensure that others can cite your data, giving you due scientific credit
  • ensure that necessary resources and equipment are available to you throughout your project.

JYU provides research teams with an in-house research data infrastructure. It features tools for supporting data management with appropriate data security and storage capacity, as well as related advice and guidance. 

The FAIR principles for reusable data

In accordance with the international FAIR framework for maximum reusability of research data, JYU encourages data management that leads to Findable, Accessible, Interoperable and Reusable research data and metadata. Major funders such as the Academy of Finland also require that metadata and, if possible, data produced with their funding are FAIR. To ensure that your data and/or their metadata are FAIR, follow these five steps:

  • Archive your data in an established digital repository at the end of the project 
  • Choose a repository that provides your data a persistent identifier (PID), such as DOI or URN
  • Store your data in an open file format such as Rich Text Format (.rtf) or .csv; these are more interoperable and less subject to loss and obsolescence than proprietary formats 
  • Create descriptive metadata for the data (see Documentation and metadata below)
  • License your data with a license that states clearly the conditions and restrictions for reuse (see ensuring further use of data below).

Data management plan (DMP)

Research funders, e.g. the Academy of Finland, require applicants and/or grantees to provide a data management plan (DMP) as part of the research plan, stating how the research data will be obtained, used, stored and protected, and how their later use will be enabled for others.

As a rule, seek to publish the data for further use. However, publishing does not necessarily mean that the data could be used by anyone for any purpose. It is therefore important that you acknowledge and document the ownership, control and terms of use for the data and include these in the descriptive metadata. The ownership and control of a dataset entail a right to decide on the purpose of its use, but also responsibility for proper management of the data. In many cases, the data themselves cannot, for a justified reason, be made available, but even in these cases their descriptive metadata can.

A data management plan should typically describe:

1. What kind of data you will be reusing, collecting, and processing, and how?
2. What kind of ethical and legal considerations (data privacy, other ethical issues, ownership, usage rights) apply?
3. How will the data be documented, and what kind of descriptive metadata will you provide?
4. How will the data be stored and backed up?
5. At the end of the project, what parts of the data will be archived, published, and/or disposed of?
6. What kind of roles and responsibilities apply to the management of the data? What is the planned budget?

You can draft your DMP by copying the funder's DMP template straight into Word or another word processor. NB! You can also use the DMPTuuli online template to write the plan, but please note that the institutional DMP guidelines have not been available in DMPTuuli since 1 January 2022. Funders do not require the use of DMPTuuli. Check and download the JYU DMP instructions below:

Tools and guidance

JYU Guidelines to How to write your DMP
Academy of Finland's DMP tips
General Finnish DMP template and guidance (.pdf)
JYU's organisational instructions and model clauses for General Finnish DMP Template (.pdf)

Know your data

When starting on drafting your DMP, briefly describe and categorise your data (e.g., pre-existing statistical or archive data; raw data that you collect; processed analysis data). You can use a table or a listing. Name the different data types so that you can reference them later on in your DMP.


Data type | Source | Personal / sensitive data | File format (open formats recommended) | Estimated size
Analysed DNA sample | Processed from DNA sample | No | .xlsx, .csv | 2 GB
Statistical analysis X | Pre-existing from FSD | No, anonymised | SPSS (.por, .sav) |
Questionnaire | Collected from study subjects | Yes, identifiable and health information | .csv | 5 MB
Interview recording on video | Collected | Yes, identified personal information | .avi, .mp4 |
Interview transcript | Processed | No | .csv, .txt, .xlsx | >10 MB
Image | Collected | No | .tif, .jpeg, .gif, .raw |

Source: Fuchs, S. 2020. RDM : Research Data Management Basics, Meilahti. Helsingin yliopisto. CC BY 4.0. [Retrieved on 25.2.2021.]

Contracts, agreements, licensing

To secure your legal protection and in view of the further use of data compiled in your research project, make a written agreement upon ownership of and access rights to the data within your group and with your partners as early as possible. The project's PI is responsible for making sure that all project partners sign a Transfer-of-rights agreement about transferring ownership of the data to the University before signing the project agreement, and at the latest upon starting the project. There are agreement templates available for this purpose. Transfer-of-rights agreements are always recommended, but must be made at least in the following cases:

  • projects funded by Business Finland 
  • EU research programmes
  • commissioned research
  • as requested by the funder.

Agreeing upon licenses for the future published data is important at this phase. Licensing means that you clearly define the reuse terms and possible restrictions to future reuse of the data. This way, you are in control of who will have rights to reuse the data, and how. The JYU policy is to use machine-readable licenses that follow international standards, preferably Creative Commons. Using them secures the maximum reusability for the data.  Licensing is necessary for publishing data in the future; unlicensed datasets are unsafe to reuse. 

Consider these when making agreements

  • What parts of data are meant to be made available for reuse?
  • If one or more researchers bring pre-existing data (such as statistical or register data) to the project, will these be included in the future published data? Are reuse rights clear?
  • When will the dataset(s) be published?
  • To what purpose will the data be made available (free reuse with a Creative Commons license; restricted use for research/teaching/studying purposes)?
  • Who has the right to make the publishing/archiving contract?
  • If reuse will be restricted, who is authorised to grant the reuse right?

Processing of personal data, ethical issues

Research ethics and the legal right of your study subjects to their personal information affect the ways in which you collect, store, and process research data, who is allowed to use the data and for what purpose, and how the data can be archived. When planning the collection of personal information, check the JYU data protection trainings and resources for researchers in good time.

Do I process personal information in my research?

If your research involves the processing of any information that can be directly or indirectly linked to a natural, living person, you are processing personal information. In this case, data protection law applies to your research. Personal data is collected either directly from the subjects (interviews, surveys, measurements, observations, etc.) or indirectly from registers and archives.

Direct identifiers are information from which a person is immediately identifiable, e.g. full name, ID, personal e-mail address, face image, voice recording, fingerprint, or brain image. Indirect identifiers are information that does not identify an individual directly, but which, when combined with other information available about the person, may lead to identification. This information includes e.g. car registration number, grandparents' names, domicile, marital status, occupation, ethnic background and date of birth. For more information, see here.

Special categories of personal data and sensitive information

Special categories of personal data include a person's ethnic origin, political opinion, religious beliefs and philosophical worldview, trade union membership, genetic data, personally identifiable biometric data, health information, and sexual behavior or orientation. Sensitive information also includes personal identification number, bank account information and criminal record information.

If you process personal data of a special category, special provisions apply to your research regarding the definition of the exception for the processing of personal data and the storage and processing of data.


Researcher's privacy path

1. It all starts with planning. What personal information do you need to conduct your research? Is your study a one-time or follow-up study? When you provide a privacy statement required by the GDPR, you must be able to specify a clear life cycle for the processing of personal information, including the beginning and the end. Follow the principles of minimisation and limitation of the retention of personal data. Minimisation means that only the amount of personal data necessary for the purpose defined in the study plan should be collected and that identifiers that become redundant should be removed as soon as possible. According to the principle of limitation, you should try to define a temporal end point for the storage of personal data. Note that the text of the privacy statement will legally bind you in the future.

2. Anonymisation, archiving, follow-up studies, reuse. The middle and end of the life cycle of personally identifiable information must be planned well in advance so that you can realistically describe it in the research notification and the privacy notice that you provide to your subjects.

  • At what point do you plan to pseudonymise the data during collection and processing, and how do you ensure the security of the code key? Is anonymisation a viable option for you? Before collecting the information, familiarise yourself with what anonymisation requires of you in practice. Decide to anonymise only when you are absolutely sure it is the right choice.

  • If you do not anonymise the data to open them anonymously for re-use after the research, justify in the privacy notice the reason for retention of the identifiable data after the study, such as verification of the research results. If it is not possible for you to set an exact end date for the retention of personal data, record in the information you provide to the subjects that the retention of data for the purposes of the original study will be evaluated, for example, every year or two.

  • Archiving of identifiable data is possible under certain conditions. For example, the Finnish data archive for language data, the Language Bank of Finland, requires that a plan for archiving the data to the Language Bank be written in the information provided to the subjects. The same is required by the Finnish Social Science Data Archive. There are ready-made clauses in the template for the university's privacy statement for the different options.

3. Select the appropriate legal basis for the processing of personal data. The primary recommended basis is scientific research in the public interest. Public interest, for example, facilitates the possible future reuse of the data. Consent is recommended only in cases where public interest is not applicable for one reason or another. If you collect special categories of personal data and use consent as a legal basis, the consent must be explicit. See more information on the choice of legal basis. Record the selected legal basis in the privacy notice.

4. Determine who will act as the controller of your personal data register. If you are carrying out research on an externally funded project or working for a university, the university acts as the controller. Often, the university and the researchers both act as controllers. In consortium projects, there is usually joint controllership between the partner organisations. Record the controller in the privacy notice and in your DMP.

5. Prior to the start of the study, clearly inform your subjects how and by whom their personal data will be processed and managed during the study. JYU's Privacy Policy advises you on how to process personal information securely and lawfully at different stages of the study. If your research setting is of such a nature (e.g. extensive register data with incomplete contact information) that informing the subjects personally is not possible, please consult the university's guidelines.

6. Always conduct at least a concise, free-form risk assessment for research that involves personal information. For more information, see here. If the risk is estimated to be high, the study is subject to a Data Protection Impact Assessment (DPIA) in accordance with the EU's General Data Protection Regulation.

The risk is considered particularly high when the processing involves

  • a large number of persons whose data are processed
  • a large amount of information about a person
  • sensitive information
  • information on vulnerable study subjects (e.g. children)
  • use of data for automated decision making
  • systematic monitoring.

See the JYU data privacy guide for researchers for detailed instructions.

7. When collecting personal data directly from your subjects, also ask them for consent to participate in the study. Consent to participate in the study is sought even when the legal basis for the processing of personal data is public interest. Informed consent is a fundamental principle of research ethics, and deviating from it always requires an impact assessment.

8. When collecting personal information, strive to minimise the identifiable information, that is, avoid collecting personal information that is not necessary for your research question. Take care of the data security of the storage devices during the field study. Transfer the data to the University's Nextcloud, the S: drive project folder, or your personal U: drive folder as soon as possible after saving.

9. When processing personal information, take care of the data security of your procedures. Ensure that access to personally identifiable information is restricted to the persons or entities described in the DMP and the privacy notice. Document how you implement the security measures you have promised to subjects and how you control access to identifying information. Remember that personal data may only be processed in the manner and for the purpose of which the data subject was informed in the privacy notice before the start of the study. If a need arises during the processing of the data to deviate from what is specified in the privacy notice, inform the subjects immediately and update the privacy notice and other necessary documents.

10. At the end of the study, take care of the identifiable parts of the data according to the information you provided to the subjects. What parts of the data do you destroy? What will you possibly anonymise? What do you store post-project in identifiable form, e.g., pseudonymised, for verification of results or any follow-up that may be included in your original study? What are you archiving, where, and with what usage restrictions?

Make sure that the processing of personal data is described in a consistent manner in your data management plan and in the research notification and privacy notice you provide to subjects.

If you have questions about the processing of your data at the end of your research, please contact the Open Science Centre:

Need help with data privacy issues? Contact the university's Data Protection Officer:

Ethical Review in the Human Sciences

You may need a prior evaluation of your research from the University's Human Sciences Ethics Committee or the Ethics Committee of the Hospital District of Central Finland if your research set-up meets the specific criteria defined by the Finnish National Board on Research Ethics:

  • Participation in the study deviates from the principle of informed consent,
  • the study intervenes in the physical integrity of the subjects,
  • the study is conducted with participants under the age of 15 without the guardian's separate consent, or without informing the guardian in a way that would give them the opportunity to prohibit the child from participating in the study,
  • the subjects are presented with exceptionally strong stimuli,
  • there is a risk that the research causes mental harm to the subjects or their relatives beyond the limits of normal everyday life, or
  • the conduct of the study may pose a safety threat to the subjects or the researcher or their relatives.

Please note that you don't have to describe detailed ethics procedures, such as permission and consent procedures, in your DMP! Those are described in the ethical assessment submitted to the Ethics Committee with Appendix 3, 'Principal investigator (PI)'s ethical assessment', when you make a review request. In the DMP, you can concentrate on describing the practical data security procedures, such as access control, with which you ensure that personal information is handled and stored securely during your research.

If needed, contact the University’s Human Sciences Ethics Committee well in advance of starting your research. Review can no longer be requested once you have started the research.

2. Collect, create and process

A large part of research data are in digital form to start with. Research data typically consist of

  • questionnaires and interviews
  • different types of measurement and observation 
  • other types of video, image, audio, and text materials.

Basic datasets for research are collected e.g. in the form of questionnaires, interviews, video recordings, as well as with various devices and sensors. Different measurement and data collection methods yield different metadata and file formats. This poses challenges especially when investigating the same phenomenon with different observation methods and datasets. The processing of data (e.g. regarding data descriptions, data protection and data security) can be streamlined and automated by software designed for the purpose.

At the data analysis stage, the actual results of an empirical study are derived from the raw data. When raw data is processed, aggregated and analysed, it results in various further datasets for elaboration and reporting. To make the study progress smoothly, it is important that you handle the generated datasets in a controlled fashion. Establish uniform data handling procedures among your group that everyone follows, and document them. There are different data management software applications available for different types of datasets. 

Documentation during the project 

Essential questions: How do I document my data so that they can be found, accessed, and used by me and others tomorrow, in a week, and years from now? If a completely unknown researcher found my data, could they understand what they are about? What do I need to do to understand and be able to use my data?

Documentation refers to the creation of descriptive information that clarifies the context and methods of capturing and processing the data, as well as the structure of the data and the chosen file system (e.g. subfolders, naming, etc.). It can mean, for example, a description of variables and key vocabulary as well as units of measurement, or an inventory of research interviews and related basic information. The documentation may also contain information on, for example, the version of a particular dataset. Technical metadata produced by technical equipment (e.g. calibrations, etc.) are also essential documentation.

Best practices
  • Find out what kind of software is available to you for data collection, processing and analysis. The University's Digital Services provide software for data processing upon request.
  • Using discipline-specific metadata standards for documentation is a good way to ensure the future findability and reusability of the data. Metadata standards help you describe your data with standard attributes which make it easier for other researchers in the same field to make sense of them. The description can be saved in e.g. a Readme file in .txt or XML format and stored alongside the data.
  • Store the descriptive information in separate files (e.g. Readme files, inventory spreadsheets) alongside the datasets in a subfolder you name /DOCUMENTATION. This way, the documentation files can also be found by someone who does not know the structure of the data in detail.
  • Plan what kind of documentation you produce and where to find it if you don't use the /DOCUMENTATION subfolder. If possible, use the metadata standards of your discipline in the documentation.
  • Agree with your research team members well in advance on a uniform way to arrange files in folders and subfolders. A logical folder structure streamlines work and reduces the risk of loss.
  • Use open file formats instead of commercial formats. Open, standard file formats are the best guarantee of data availability after several years. Examples of recommended and acceptable formats can be found in the UK Data Service format comparison table.
  • Keep the documentation up to date as your data collection and processing proceeds. When planned well ahead and done at the same time as you work with your data, it is a small effort, but at a later stage it becomes practically impossible.
  • Describe your documentation practices in your DMP.
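
The folder and Readme practices above can be sketched as a small script. This is only an illustration, not a JYU tool: the function name init_project, the subfolder names raw, processed and analysis, and the Readme headings are assumptions chosen for the example. Only the /DOCUMENTATION subfolder and the Readme file come from the guidance itself.

```python
from datetime import date
from pathlib import Path


def init_project(root: Path, title: str,
                 subfolders=("raw", "processed", "analysis")) -> Path:
    """Create a project folder tree with a DOCUMENTATION subfolder and a Readme stub.

    The subfolder names are hypothetical; agree on your own with your team.
    """
    for name in (*subfolders, "DOCUMENTATION"):
        (root / name).mkdir(parents=True, exist_ok=True)

    readme = root / "DOCUMENTATION" / "README.txt"
    if not readme.exists():  # never overwrite documentation that already exists
        readme.write_text(
            f"Title: {title}\n"
            f"Created: {date.today().isoformat()}\n"
            "Description of the data:\n"
            "Collection methods:\n"
            "Variables and units:\n"
            "File and folder naming:\n"
            "Version:\n",
            encoding="utf-8",
        )
    return readme
```

Calling init_project(Path("my_project"), "My study") once at the start of a project gives every team member the same structure to fill in; running it again later leaves an edited Readme untouched.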

Creating and maintaining metadata  

Metadata refers to general descriptive information about research data (e.g. owner, authors, distributor, name, short description, location). Up-to-date metadata is the key to finding and accessing your data. Basic project-level metadata is created and maintained in the Research data section of Converis, the JYU research information system. For more information, see step-by-step guidelines in JYU Intranet Uno. When kept up to date during the course of the project, metadata clarify for both you and secondary users what the work is about. 

At JYU, metadata is maintained in the Research Data System section of Converis. A metadata entry should be created for each dataset. When describing your data, divide them into such entities that you can describe them unambiguously. The data described in more detail can be bundled in Converis into larger entities under a larger “parent dataset”. See the detailed instructions for managing your metadata in Converis.

Quality assurance

Ensuring the integrity and quality of the data as they are collected, migrated and transferred is an important part of data management. Careful documentation of the procedures of data collection is the primary measure to ensure the integrity and quality of the data. Depending on the type of material, equipment and methods, integrity and quality can be ensured, for example,

  • by calibrating measuring instruments to monitor the accuracy and scale of detection
  • by reviewing the transcribed interview material with an external expert
  • by using methods, hardware, and software that are standard in your field.
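
One further way to check integrity when data are migrated or transferred is to compare file checksums before and after the move. The sketch below is an assumption-laden illustration rather than a prescribed JYU procedure: the function names are hypothetical, and SHA-256 is simply one commonly used hash function available in the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that are missing, added, or different between two manifests."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Build a manifest of the source folder before the transfer and of the target folder afterwards; an empty list from changed_files suggests that the files arrived unaltered.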

Pseudonymisation and anonymisation

To ensure the safest possible handling of personal data and to follow what you have promised to your study subjects in the privacy notice, pseudonymisation and/or anonymisation of personal identifiers in the data may become relevant. Anonymised data can be safely shared during the project and published at any point. However, anonymisation procedures have to be planned well in advance, and they require time and some effort.

What's the difference between the two?

Pseudonymous data are still personal data under the GDPR, but individuals can no longer be identified from them without combining them with other information. Replacing real names with fake names and identifiers with codes are typical pseudonymisation techniques. As long as the key to the coded identifiers is stored separately from the data in a secure storage location and only designated people have access to it, pseudonymisation may suffice as a data security method during the project. However, as the project ends, identifiers should be erased or, if storing them is necessary for e.g. enabling contact with the study subjects, the research group members should set a future date for re-evaluating the need for retaining the identifiers. This should be documented in the data management plan.
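
Replacing identifiers with codes can be illustrated with a minimal sketch. It assumes the data are available as a list of rows with one direct identifier column; the function names and the code format ("P" plus eight random hexadecimal characters) are hypothetical choices made for the example. Note that it handles direct identifiers only, so indirect identifiers in other columns remain in the data.

```python
import csv
import secrets
from pathlib import Path


def pseudonymise(rows, id_field, key=None):
    """Replace the direct identifier in each row with a random code.

    Returns (new_rows, key), where key maps each identifier to its code.
    The input rows are left unchanged.
    """
    key = dict(key or {})
    used = set(key.values())
    new_rows = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in key:
            code = "P" + secrets.token_hex(4)
            while code in used:  # extremely unlikely, but keep codes unique
                code = "P" + secrets.token_hex(4)
            key[identifier] = code
            used.add(code)
        new_row = dict(row)
        new_row[id_field] = key[identifier]
        new_rows.append(new_row)
    return new_rows, key


def save_key(key, path: Path) -> None:
    """Write the code key to its own file, to be stored apart from the data."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["identifier", "code"])
        writer.writerows(key.items())
```

The same person always receives the same code, so records can still be linked within the dataset. The key file written by save_key is what makes the data pseudonymous rather than anonymous, which is why it belongs in a separate, access-controlled location.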

Anonymisation means that all identified and/or identifiable information is irreversibly removed from the data, and no code to retrieve the identifiers remains. An individual can no longer be specified from the data, which means they are no longer personal data. In planning the life cycle of personal data in your research, familiarise yourself with what it takes to make your data anonymous, and whether anonymisation will impair their scientific value. Opting out of anonymisation and data publishing is a viable alternative for research that handles complex sets of personal data. For options, see guidelines for publishing the metadata.

When are my data pseudonymous and when anonymous?

  • Can any individual be recognised from audio, image, and video that you have taken? If yes, they contain personal information.
  • Do you have the pseudonymisation code key still stored? If yes, the data are still pseudonymous.
  • Even if the code key has been destroyed, is there still information in the data that combined could lead to identification of an individual person (e.g., municipality + school grade + gender + etc.)? If so, the data are likely to still be identifiable, which means that some of the variables should be aggregated or classified to blur the identifiability.
  • Have you collected open-ended answers? If yes, could the respondent be identified from their text? If yes, the identifying bits have to be removed in order to make the data anonymous.
  • If you collect qualitative writings or notes, could the respondent be identified from their text? If yes, the identifying bits have to be removed in order to make the data anonymous.
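
For structured variables, the combination question above can be explored with a simple count of how many respondents share each combination of indirect identifiers. The sketch below is an illustration only: the function name is hypothetical, and the threshold k=5 is an assumption, not a JYU or FSD rule. A clean result does not prove that the data are anonymous, since open-ended text, images, and outside information are not covered.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return the identifier combinations shared by fewer than k rows.

    rows is a list of dicts; quasi_identifiers lists the column names to combine.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}
```

Combinations returned by small_groups are candidates for aggregation or coarser classification, for example by replacing municipality with region.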

Anonymisation guidelines and anonymisation plan model

There are various anonymisation methods for qualitative and quantitative data. For detailed instructions for both, see the Finnish Social Science Data Archive Guidelines. The FSD also offers an anonymisation plan model. In case you plan to deposit your data in the Finnish Social Science Data Archive, the FSD experts will help you in anonymisation at the data deposition phase. See the FSD guidance for researchers.

3. Store and share

As the research process goes on, there is a need to transfer datasets for storage. At the same time, it is often also necessary to share some data and related access rights with other researchers over a data network. Also at this stage it is essential to take proper care of data protection and data security issues for the research data. Sensitive data must not be transferred online without adequate safety measures. Bear in mind that keeping datasets on the hard discs of workstations, memory sticks and USB drives is in principle a risk in terms of data protection and data security. 

When datasets are transferred for longer-term storage, for possible further use, or to be published, the data should be carefully checked and anonymised. When due attention is paid to data management from the outset, the data can be prepared fairly easily for long-term storing, further use, and publishing. Comprehensive instructions for the handling of datasets are available in the data management guidelines of the Finnish Social Science Data Archive. 

Tools for data storage and sharing 

JYU storage services are being developed to cover all phases of data processing, from collection up to storing and sharing between project partners and collaborators.

For non-sensitive data, the University offers

For sensitive and highly confidential data, e.g. special category personal data, options include:

  • For small (max. 50 MB/file) sensitive datasets, CollabRoom cloud service (instructions currently only in Finnish) for storage and sharing
  • The national CSC Sensitive Data services (requires free registration to use CSC services; after that, access to services with JYU Haka authentication): SD Connect for storing and sharing, SD Desktop for processing and analysis. Note the current restrictions for SD services: SD Connect is not yet audited for secondary use of social and health data, auditing against Findata requirements underway [05/2022]. 
  • The University's S: and U: network drives (Digital Services recommends file encryption). Sensitive data are shared using secure email.

Nextcloud, CollabRoom, and the CSC Sensitive Data services are also suitable for sharing data with partners outside the University.


University systems automatically take regular backups. However, you should plan and implement backups at the stage when you make significant edits to the data. Preserve the original files, i.e., the so-called master files separately from the analysis files and make all edits to the analysis files. This way, the data will not be lost if an error occurs in the data processing. NB! CSC's SD Connect does not offer automated backup for data.
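
The master file practice described above can be sketched as follows. This is a minimal example under stated assumptions: the function name and folder layout are hypothetical, and removing write permissions is only a guard against accidental edits, not a substitute for backups or access control.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(master: Path, analysis_dir: Path) -> Path:
    """Copy a master file into the analysis folder and mark the master read-only."""
    analysis_dir.mkdir(parents=True, exist_ok=True)
    working = analysis_dir / master.name
    if working.exists():
        # Do not silently discard edits already made to an analysis file.
        raise FileExistsError(f"{working} already exists")
    shutil.copy2(master, working)

    # Remove all write permission bits from the master file.
    mode = master.stat().st_mode
    master.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return working
```

All edits are then made to the returned working copy, while the master file stays as collected.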

Access control

  • Name and document the person in charge who oversees access control to the files.
  • Maintain information about who has accessed the data and who has access to any part of the data.
  • Define who has the rights to view, edit, and delete the data.
  • If you process personal information or other confidential or sensitive information, please specify who has access to it.
  • On what basis has each access (edit, view, delete) been granted?
  • How are controls implemented in practice (e.g. password-protected access, change log monitoring, encryption, physical space monitoring, locked lockers)?
  • If you process special categories of personal data, make sure you follow the description you provide to the subjects.

Describe your planned measures in your data management plan.

Storing and sharing personal data

If your data contains personal or otherwise sensitive information, store it on the original storage device whenever possible. For maximum data protection, personal data should not be transferred outside the original storage location, e.g., to a separate analysis Excel workbook, if it can be avoided. When data are kept and processed in the original location, it is easier to keep an automated log to monitor who has had access to the data. 

When transferring personal data, make sure that you know exactly who receives it at the other end. Secure your legal right to transfer personal data by informing your study subjects in the data privacy notice, at the beginning of your project, about who handles their personal information, why, and how. If you cannot use the University's Nextcloud or CollabRoom tools for sharing and have to use email, you must use secure email or encrypt the attached files.

Sometimes data must be transferred outside the EU and the European Economic Area, for example if appropriate analysis equipment exists only in a particular location. Special legal obligations apply to personal information transferred outside the EU/EEA. If this is relevant to your study, consult the University's Data Protection Officer.

4. Publish, dispose of, and store data for longer term at the end of the project  

Publishing metadata and data at the University of Jyväskylä

The University of Jyväskylä strives for the widest possible openness of research data and their metadata, following the principle "As open as possible, as closed as necessary". Accordingly, researchers, as the best experts on their data, evaluate which parts of their datasets can be published. By default, data should be published; if they cannot be made openly available, the reasons must be justified in the data management plan. The University recognises that, depending on the nature of the data, there can be different degrees of openness:

  • The descriptive metadata is published by creating a metadata entry in Converis and requesting its publication there, which takes only a couple of clicks; see the step-by-step instructions in Intranet Uno. Once published, the metadata record appears in the University's JYX repository with a DOI.
  • Creating a metadata record does not mean that you need to publish it yet. You can update it as your project proceeds and keep it private until you are ready to publish it at the end of the project. An early record also allows the Open Science Centre to support your data management from the start. Once the record is ready, save it in Converis in "For validation" status; the Open Science Centre will then check it and publish it in JYX. This way, you get a published FAIR metadata record with a DOI for persistent visibility and findability.
  • The data files themselves can be published in part or as a whole and with varying terms for reuse in an established discipline-specific data repository, in the JYX archive together with the Converis metadata, or e.g. as a data article (for data journals, see tips provided by Aalto University). 

When planning your research, consider what parts of the data you can publish, what steps it takes to publish them, and what you will do with the rest of the data (archiving, disposal). This way, your plan covers the entire data lifecycle, and you will save a lot of time and effort at the end of your project.

Regardless of which data repository you choose, it is crucial that relevant information about the dataset and its storage site also remains at JYU. Converis metadata should be published for all datasets, including those that remain closed and/or are disposed of; see the step-by-step instructions. If you publish data in the University's digital repository JYX, the Open Science Centre will take care of recording the availability information for you.

Where should I publish my data?

Best practices

• Publish your data primarily in a discipline-specific or research-specific digital repository. In a field-specific archive, the data is most likely to be found by researchers in your field. The Re3data portal is an excellent site for searching for a suitable repository and browsing repositories in your field.
• If a suitable field-specific repository is not available, publish your data in the University’s JYX repository. Note: the maximum size of an individual data file in JYX is currently 3 to 4 GB. The number of files to be deposited, however, is not restricted, so you can deposit as many files as needed.
• Use generalist data repositories such as Zenodo or figshare only as a last resort. The data deposited in them is heterogeneous, which reduces the discoverability of your data and makes it difficult to evaluate the findings.
• Remember that publishing data on your own or your project's website does not meet the funders' requirements or the University's expectations regarding the discoverability and accessibility of the data.
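
Before depositing in JYX, it can be useful to check whether any single file exceeds the size limit mentioned above. The following Python sketch is one way to do this. The function name and the limit constant are hypothetical, and the limit is an assumption: it takes the lower end of the "3 to 4 GB" range and counts a gigabyte as 10^9 bytes, so confirm the actual limit with the Open Science Centre.

```python
from pathlib import Path

# Assumption: conservative reading of the "3 to 4 GB" note (lower bound, decimal GB).
SIZE_LIMIT_BYTES = 3 * 10**9


def oversized_files(folder: Path, limit: int = SIZE_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file larger than limit."""
    found = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                found.append((path.relative_to(folder).as_posix(), size))
    return found
```

An empty result means that every file in the folder is within the assumed limit; files that are listed could be split or compressed before deposit.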

Criteria for a FAIR repository

  • Widely used by researchers in your field
  • Gives the metadata (and, if applicable, the underlying data) a permanent identifier, such as a DOI or URN
  • Publishes machine-readable metadata and uses a known metadata standard
  • Has a certificate of operational reliability, such as the CoreTrustSeal or ISO 16363 certification
  • Allows you to choose the terms of use under which the material can be further used, and states them clearly as part of the metadata.

Finnish Social Science Data Archive (FSD)

The FSD focuses on acquiring social science data. Under certain circumstances, data from other relevant fields (arts and humanities, education, and health sciences) can also be archived. Datasets deposited at the archive must meet certain technical and legal requirements. Archived datasets are processed and documented before they are disseminated.

FSD promotes open access to research data as well as the transparency, accumulation and efficient reuse of scientific research. FSD also implements the FAIR data principles, which aim at making data and services Findable, Accessible, Interoperable and Reusable.

The archive is a national resource centre funded by the Ministry of Education and Culture and Tampere University. In addition to archiving and disseminating data, its key services include data-related information services and support for research data management. The archive operates as a separate unit of Tampere University.

The Language Bank of Finland

The Language Bank of Finland is a service for researchers using language resources. The Language Bank has a wide variety of text and speech corpora and tools for studying them. The corpora can be analyzed and processed with the Language Bank’s tools or downloaded.

Many corpora are publicly accessible; some require logging in. The rights to use restricted resources can be applied for electronically. Using the Language Bank is free for researchers and students.

If you are new to the Language Bank, take a look at the Language Bank introduction.


JYX is JYU's repository for publications and research data. It gives datasets permanent identifiers (DOI, URN). The metadata is sent to the national METAX catalogue, which ensures that the metadata and datasets can be found through the national ETSIN service.

Publishing a dataset in JYX is simple. Just create metadata for the dataset in the Converis current research information system and make a request in Converis to publish the (meta)data. See the instructions below:

How do I publish my dataset in JYX? 

Option 1. If you store the data in a Nextcloud group folder (JYU Groups, recommended option) or in your personal Nextcloud storage, follow these steps:

1. Make sure that all the required data and documentation files and subfolders are included in the folder in the correct order and that the files are clearly named. It is a good idea to place the documentation file at the root of the directory, where it is easy to open first.

2. Click the link icon to the right of the folder.

3. Click the + icon next to the Share link command to display the sharing settings.

4. Change the sharing setting from Read only (the default) to Allow upload and editing, and deactivate the expiration date, which is active by default, in the menu.

5. Click the Copy to clipboard icon and paste the link into Notepad or directly into Converis.

6. Check that your metadata entry in Converis is in "To be completed" status. Place the Nextcloud link in the “Nextcloud research data link for publishing” field in the Converis form.

7. Finally, change the status of the Converis entry to "For validation" from the Save and select status menu at the bottom right of the form. The Open Science Centre then checks the entry and publishes it in JYX. After that, the Open Science Centre will send you the activated DOI link.

Option 2. If you do not store the data in Nextcloud, follow these steps:

1. Organise the data and documentation files that you want to publish in clearly named subfolders. Place the documentation file(s) or folder in the directory root, where it can be easily found.

2. Convert the folder to a compressed .zip folder: right-click the folder you want to publish, then select Send to and Compressed (zipped) folder.

3. Send the .zip folder to the OSC data specialists, who will then store the data in Nextcloud and publish it with the metadata. For larger packages, you can use the JYU Funet File Sender system to send the package.

4. In Converis, save the metadata entry in "For validation" status from the Save and select status menu in the bottom right corner of the form. The Open Science Centre then adds the dataset link and publishes the entry in JYX.
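
If you prefer a script to the right-click menu, the compression in step 2 of Option 2 can also be done in Python. The sketch below uses the standard library function shutil.make_archive; the function name zip_folder is hypothetical and the example is an illustration only, not part of the JYU instructions.

```python
import shutil
from pathlib import Path


def zip_folder(folder: Path) -> Path:
    """Create <folder>.zip next to the folder and return the path of the archive."""
    folder = folder.resolve()
    archive = shutil.make_archive(
        base_name=str(folder),         # archive path without the .zip suffix
        format="zip",
        root_dir=str(folder.parent),   # paths inside the archive are relative to this
        base_dir=folder.name,          # so every entry starts with the folder name
    )
    return Path(archive)
```

Because the entries in the archive start with the folder name, the recipient gets the same structure after unpacking, with the documentation file still at the root of the dataset folder.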

I urgently need a DOI for my dataset, what do I do?

If you urgently need a permanent identifier (PID) for citing your data, you can obtain one by publishing the metadata and/or the dataset in JYU's JYX Publication Archive:

  • Is the dataset ready for publication, and do you wish to publish it in JYX? Enter the metadata of the dataset in the Converis research information system as instructed. On the last tab of the Converis metadata form, select that you want to publish the data with the metadata. In the "Nextcloud sharing link for publication" field, paste the sharing link to the processed and organised dataset that you have previously saved in the University's Nextcloud storage service. If you need advice on how to prepare the data for publication or help in transferring the data to Nextcloud, or if you want to submit the data by another method, please contact the Open Science Centre. In the "More information" field of the form, enter the date by which you want the data to be published. Then save the entry in "For validation" status. After this, a data expert at the Open Science Centre checks the Converis metadata and transfers it for publication in the JYX publication archive. In JYX, the dataset receives a permanent identifier (DOI).

  • Is the dataset still incomplete, but you need a permanent identifier for reference in advance? Proceed as above, but on the last tab of the Converis form, select the option to publish only the metadata of the dataset. Before saving, make sure that all the descriptive information required in Converis is up to date, then save the metadata entry in "To be completed" status. In this case, a Nextcloud link or a link to the dataset in another data archive can be added later, when the dataset is ready for publication. When the dataset is ready, ask the Open Science Centre to complete the descriptive information, either by contacting it directly or by using the Change Requests to OSC field on the last tab of the Converis form.


Post-project storage and disposal of data

In longitudinal and follow-up studies, it is often necessary to store data in identifiable form, in accordance with the research plan and the personal data lifespan plan stated in the data privacy notice. If you plan to store personal data after the end of the initial study phase, justify this in the data privacy notice and in your DMP. Typically, the justification is the need to contact the study participants again after some time: storage is necessary for follow-up measures that build directly upon the initial research and are compatible with the original research plan. Once the storage is justified, you can set a timeframe (e.g. 5 to 10 years) and name the persons responsible for re-assessing the need to keep storing the data in identifiable form. Nextcloud group folders are the recommended storage solution for non-sensitive data. For up-to-date advice on selecting a storage solution for sensitive data, please contact the Open Science Centre.

Check with your faculty whether your subject has a research data archiving policy, e.g. one requiring storage for 1-5 years after the project's conclusion.

Disposal of any sensitive data must be carefully planned. Deleting files using operating system tools, or even reformatting a hard drive, will not irretrievably destroy the data. It is important to permanently destroy any data that includes personal, confidential or sensitive information once storage is no longer necessary. Choose a suitable method of disposal according to the data format:

  • Paper data is disposed of in the grey locked data security boxes at the university campus.
  • Electronic files can be overwritten with overwriting software. Eraser and WipeFile are examples of free, open source erasing programmes. For further questions about secure data disposal, contact the University's Data Security Officer via the HelpJYU portal.

Long-term preservation

Long-term preservation means preserving especially valuable datasets for more than 25 years. The national CSC service Fairdata-PAS preserves these datasets. In accordance with its data policy, JYU coordinates the preservation of nationally significant datasets in Fairdata-PAS. For more information, see below.

The Open Science Centre helps you with issues related to data management and publishing. Please contact us!


Source selectively used to update the instructions: Fuchs, S. Research Data Management Basics, Meilahti, 31.3.2020. University of Helsinki Data Support. Creative Commons CC BY 4.0.