Guidance for research data management at JYU

1. What is research data and why should I manage it?

Research data refers to

  • digital or analog basic data and materials for scientific research, 
  • further data refined from these, upon which the research findings and published research results are based, as well as  
  • code and software upon which the research results build.  

At the University, management, storage and reuse of research data are essential elements of research infrastructure. Domestic and international research funders as well as publishers not only appreciate but increasingly mandate openness and transparency of research data. Above all, good data management benefits the research process itself. With well planned data management, you

  • make your research more efficient
  • comply with funders' requirements
  • comply with data protection legislation and protect your study subjects
  • agree upon data ownership and rights as well as data sharing and preservation together with your partners
  • agree upon how, when and on what terms you open your data
  • ensure that others can cite your data, giving you due scientific credit
  • ensure that necessary resources and equipment are available to you throughout your project.

JYU provides research teams with an in-house research data infrastructure. It features tools for supporting data management with appropriate data security and storage capacity, as well as related advice and guidance. 

Complying with the FAIR principles

In accordance with the international FAIR framework for maximum reusability of research data, JYU encourages data management that leads to Findable, Accessible, Interoperable and Reusable research data and metadata. Major funders such as the Academy of Finland also require that metadata and, if possible, data produced with their funding are FAIR. To ensure that your data and/or their metadata are FAIR, follow these five steps:

  • Archive your data in a data repository at the end of the project 
  • Choose a repository that provides your data a persistent identifier (PID), such as DOI or URN
  • Store your data in an open file format such as Rich Text Format (.rtf) or .csv; these are more interoperable and less subject to loss and obsolescence than proprietary formats
  • Create descriptive metadata for the data (see Documentation and metadata below)
  • Licence your data with a licence that states clearly the conditions and restrictions for reuse (see Ensuring further use of data below).
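
As a minimal sketch of the open file format step above, assuming tabular data held in memory as a list of Python dictionaries, the standard csv module can write and read a plain-text, UTF-8 encoded .csv file. The function names, column names and file name are hypothetical and only illustrate the idea.

```python
import csv
import tempfile
from pathlib import Path


def write_open_csv(rows, path, fieldnames):
    """Write rows (a list of dicts) to a UTF-8 CSV file with a header line."""
    path = Path(path)
    # newline="" lets the csv module control line endings itself
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_open_csv(path):
    """Read the CSV file back as a list of dicts (all values are strings)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical example data, written to a temporary folder
    rows = [
        {"participant_id": "P001", "age_group": "18-29", "score": 42},
        {"participant_id": "P002", "age_group": "30-39", "score": 37},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = write_open_csv(
            rows, Path(tmp) / "survey_results.csv",
            ["participant_id", "age_group", "score"],
        )
        print(target.read_text(encoding="utf-8"))
```

Note that CSV stores every value as text, so units and data types should be recorded in the documentation that accompanies the file.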

2. Initial stage of research

Contracts and agreements

Particularly in view of the further use of research data compiled in your research project, make a written agreement upon ownership of and access rights to the data within your group and with your partners as early as possible. There are agreement templates available for this purpose. Transfer-of-rights agreements are always recommended, but must be made at least in the following cases:

  • projects funded by Business Finland (formerly Tekes)
  • EU research programmes
  • commissioned research
  • as requested by the funder.


Legal and ethical issues, data privacy 

Ethical issues, and in particular everyone's legal right to privacy, may influence how you collect, store and handle research data, who can use the data and for which purpose, and how the data are archived.

Consider at least these questions before you start to collect your data

  • Does your research involve handling any data that can be connected to a person, even indirectly?
    If yes, then privacy legislation applies to your research. Pseudonymization, or even anonymization of some of your data may be necessary, but carefully study all that such processing entails, well in advance.
  • Is any of the data especially sensitive? Sensitive data include ethnic origin, political views, religious or philosophical beliefs, trade union membership, genetic data, personally identifying biometric data, health data, crimes and criminal convictions, and sexual behaviour and orientation.
    If yes, then special, more stringent regulations apply to your research regarding the explicit consent obtained from persons and the storage and handling of the data.
  • Who has the right to determine usage of the data, now and in the future?
  • Who gets to see the data?
    Note here also the service providers you use. For example, if you conduct a survey using the Webropol service, the provider technically has the opportunity to see the content of your data.
  • Where does the data come from?
    Do you get the data directly from persons with their explicit consent, or do you gather any information from other sources?
  • How big is the risk your data creates?
    If the risk is considered high, your research must undergo a special risk assessment required by EU legislation; see Data Protection Impact Assessment (DPIA).
    The following situations are considered to carry a particularly high risk:
    • Large number of persons whose data is handled
    • Large amount of data about a person
    • Sensitive data
    • Data about vulnerable persons (e.g. children)
    • Usage of data for automated decision-making
    • Systematic monitoring
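
To illustrate what pseudonymization can look like in practice, the following is a minimal sketch of one common technique, keyed hashing with HMAC-SHA256 from the Python standard library. It is not a method prescribed by JYU; the function names and the example identifier are hypothetical. The sketch assumes that the secret key is stored separately from the data with restricted access, because anyone holding the key can recompute the pseudonyms. Pseudonymized data still count as personal data, so privacy legislation continues to apply.

```python
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Create a random 256-bit secret key; store it separately from the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Replace a direct identifier with a stable pseudonym.

    The same identifier and key always give the same pseudonym, so records
    belonging to one person can still be linked across files.
    """
    # Normalising case and whitespace is an assumption of this sketch
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key = generate_key()
    # Hypothetical identifier
    print(pseudonymize("maija.meikalainen@example.org", key))
```

Replacing direct identifiers this way does not remove indirect identifiers such as rare occupations or exact dates, which need to be assessed separately.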

Ensuring further use of data

Even if the dataset will be published as open access, it is always worth licensing. This way, you establish clear conditions for who can reuse your data output, and how. For a justified reason, licensing can be used to restrict user rights to the data, e.g. for non-commercial use only. As a rule, however, the licence should be as permissive as possible to enable effective further use.

In the humanities, make sure to inform your research subjects about the purpose and scope of using the data that is to be collected. Timely permission from your subjects is a prerequisite for any future reuse of your data.  

Procedures

Data management plan

Research funders, e.g. the Academy of Finland, require applicants and/or grantees to provide a data management plan (DMP) as part of the research plan, stating how the research data will be obtained, used, stored and protected, and how their later use will be enabled for others. 

As a rule, one should seek to publish the data for further use. However, publishing does not necessarily mean that the data could be used by anyone for any purpose whatsoever. It is therefore important that you acknowledge and document the ownership, control and terms of use for the data and include these in the descriptive metadata. The ownership and control of a dataset entail a right to decide on the purpose of its use, but also responsibility for proper management of the data. In many cases, the data themselves cannot, for a justified reason, be made available, but even in these cases their metadata can.

Any datasets including personal data of a special category (and always when collecting identified or identifiable personal data in the humanities) should be saved and kept in data storage services appointed by the University, and where necessary also be transferred and delivered in accordance with relevant legislation and data security instructions. A data management plan should typically describe:

  • What kind of data will be collected and how?
  • In what ways will the data be processed and analysed?
  • What kind of resulting data will be produced?
  • How will the research data be stored and how will their possible later use be arranged?
  • What are the ownership and access rights for the data (including related written agreements)?

You can use the DMPTuuli online tool for drafting your data management plan. DMPTuuli is designed for preparing tailored DMPs in accordance with the specific requirements of different funders. It also includes JYU instructions for preparing a DMP.

When preparing a data management plan, note that:

  • The plan is worth making even if the research funder does not require it. It serves as a useful, regularly updated document as the research progresses and helps, for example, in documenting the processing of personal data in accordance with the GDPR.
  • Keep in mind the regulations of the University's archive formation plan as regards the length of data storage times as well as other instructions for archiving, such as JYU’s directions for registering, processing and storing documents (TOS). 
  • Comply with the instructions for handling confidential information.
  • Where necessary, request a statement from the Human Sciences Ethics Committee about the ethical acceptability of the research, and read the instructions regarding information to be given to research subjects.
  • In human sciences, confirm with your research subjects the conditions for possible further use of the data.

3. Collecting and processing data

A large part of research data are in digital form to start with. Research data typically consist of

  • questionnaires and interviews
  • different types of measurement and observation 
  • other types of video, image, audio, and text materials.

Basic datasets for research are collected e.g. in the form of questionnaires, interviews, video recordings, as well as with various devices and sensors. Different measurement and data collection methods yield different metadata and file formats. This poses challenges especially when investigating the same phenomenon with different observation methods and datasets. The processing of data (e.g. regarding data descriptions, data protection and data security) can be streamlined and automated by software designed for the purpose.

At the data analysis stage, the actual results of an empirical study are derived from the raw data. When raw data is processed, aggregated and analysed, it results in various further datasets for elaboration and reporting. To make the study progress smoothly, it is important that you handle the generated datasets in a controlled fashion. Establish uniform data handling procedures among your group that everyone follows, and document them. There are different data management software applications available for different types of datasets. 

Documentation and metadata

Essential questions: How do I document my data so that they can be found, accessed, and used by me and others tomorrow, in a week, and years from now? If a completely unknown researcher found my material, could they understand what it is about? What do I need to do to understand and be able to use my data?

Metadata refers to general descriptive information about research data (e.g. owner, authors, distributor, name, short description, location). Up-to-date metadata is the key to finding and accessing your data. Basic project-level metadata is created and maintained in the Research data section of Converis, JYU's research information system. For more information, see detailed guidelines in JYU Intranet Uno. When kept up to date during the course of the project, metadata clarify for both you and secondary users what the work is about.

At JYU, metadata is maintained in the Research Data System section of Converis. A metadata entry should be created for each dataset. When describing your data, divide them into such entities that you can describe them unambiguously. The data described in more detail can be bundled in Converis into larger entities under a larger “parent dataset”. See the detailed step-by-step instructions for managing your metadata in Converis.
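
The idea of one metadata entry per dataset, bundled under a larger parent dataset, can be sketched as a small data structure. This is only an illustration: the class and field names below are hypothetical, taken from the descriptive elements listed above, and do not reflect the actual Converis schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical descriptive metadata for one dataset."""
    name: str
    description: str
    owner: str
    authors: List[str]
    location: str
    distributor: Optional[str] = None
    # Smaller, unambiguously described datasets bundled under this one
    children: List["DatasetRecord"] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record, including nested child records, as JSON."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical example entries
    interviews = DatasetRecord(
        name="Teacher interviews 2020",
        description="Transcripts of semi-structured interviews",
        owner="Example research group",
        authors=["A. Researcher"],
        location="University storage service",
    )
    parent = DatasetRecord(
        name="Example project data",
        description="All datasets collected in the example project",
        owner="Example research group",
        authors=["A. Researcher", "B. Researcher"],
        location="University storage service",
        children=[interviews],
    )
    print(parent.to_json())
```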

Documentation refers to the creation of any descriptive information that clarifies the context and methods of capturing and processing the data, as well as the chosen file system (e.g. subfolders, naming, etc.). It can mean, for example, a description of variables and key vocabulary as well as units of measurement, or an inventory of research interviews and related basic information. The documentation may also contain information on, for example, the version of a particular dataset. Technical metadata produced by technical equipment (e.g. calibrations, etc.) are also essential documentation.

Store the descriptive information in separate files (e.g. README files, inventory spreadsheets, codebooks) with the data in a subfolder that you name /DOCUMENTATION. This way, your documentation files can later be found even by a person who does not know the structure of the material in more detail.
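
A minimal sketch of setting up such a subfolder is shown below, assuming the dataset lives in an ordinary folder on disk. The function name, the file name README.txt and the template headings are hypothetical; the headings simply mirror the kinds of documentation described above.

```python
import tempfile
from pathlib import Path

# Hypothetical README skeleton covering the documentation items named above
README_TEMPLATE = """\
Title: {title}
Contact: {contact}

Context and methods of data collection:

Processing steps:

Folder structure and file naming:

Variables, key vocabulary and units of measurement:

Dataset version:
"""


def init_documentation(dataset_root, title, contact):
    """Create a DOCUMENTATION subfolder with a README skeleton.

    An existing README is left untouched so that repeated calls
    never overwrite documentation that has already been written.
    """
    doc_dir = Path(dataset_root) / "DOCUMENTATION"
    doc_dir.mkdir(parents=True, exist_ok=True)
    readme = doc_dir / "README.txt"
    if not readme.exists():
        readme.write_text(
            README_TEMPLATE.format(title=title, contact=contact),
            encoding="utf-8",
        )
    return readme


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = init_documentation(tmp, "Example dataset", "researcher@example.org")
        print(path.read_text(encoding="utf-8"))
```

Creating the skeleton at the start of the project makes it easier to fill in the documentation while working with the data rather than afterwards.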

When drafting your DMP, you should plan what kind of documentation you produce and where to find it if you do not use the /DOCUMENTATION subfolder. If possible, use metadata standards commonly used within your discipline in the documentation.

It is especially important to keep the documentation up to date. When planned well ahead and done at the same time as you work with your data, it is a small effort, but at a later stage it is practically impossible.

See also the excellent Guide to Data Documentation by the University of Helsinki Data Support (in English), which reviews good practices for documentation, version control, and naming files.

Further procedures

  • Find out what kind of software is available for you for data collection, processing and analysis.
  • For research infrastructures and large research projects, a data management plan can be prepared in collaboration with JYU’s Digital Services.
  • Digital Services provide software for data processing. 
  • University Printing Services converts paper-based data into digital form as a paid service.

4. Storage and sharing of data during the project

As the research process goes on, there is a need to transfer datasets for storage. At the same time, it is often also necessary to share some data and related access rights with other researchers over a data network. Also at this stage it is essential to take proper care of data protection and data security issues for the research data. Sensitive data must not be transferred online without adequate safety measures. Bear in mind that keeping datasets on the hard discs of workstations, memory sticks and USB drives is in principle a risk in terms of data protection and data security. 

When datasets are transferred for longer-term storage, for possible further use, or to be published, the data should be carefully checked and anonymised. When due attention is paid to data management from the outset, the data can be prepared fairly easily for long-term storing, further use, and publishing. Comprehensive instructions for the handling of datasets are available in the data management guidelines of the Finnish Social Science Data Archive. 

Tools for data storage and sharing 

JYU offers various tools that can make data management easier, covering the phases from data collection up to storing, sharing and further use:

  • Workstations and related software (e.g. SPSS and Atlas.ti)
  • Special hardware and software for various research teams and laboratories
  • Centralised storage services and virtual servers
  • Nextcloud cloud storage and sharing service for non-sensitive data 
  • Software to support working in distributed research teams (e.g. Confluence-wiki, GoogleApps, Connect Pro online conferencing)
  • The CollabRoom cloud service for secure processing and transfer of sensitive datasets
  • TutkimusMoniviestin for storing audiovisual materials
  • JYX publication archive for publishing small datasets
  • The University’s quota in CSC’s IDA storage service

Order the services needed in your research project from the package of basic services for research.

Storing and sharing personal data

If your data contains personal or otherwise sensitive information, store it in the original storage device whenever possible. For maximum data protection, personal data should not be transferred outside the original storage location (e.g. to a separate Excel workbook for analysis) if it can be avoided. When the data are kept and processed in the original location, it is easier to keep an automated log to monitor who has had access to them.

When transferring personal data, make sure that you know exactly who receives it at the other end. Ensure your legal right to transfer personal data by informing your study subjects about who handles their personal information, why, and how, at the beginning of your project using the data privacy notice. If you cannot use the University's Nextcloud or CollabRoom tools for sharing and have to use email, use secure email or encrypt the attached files.

Sometimes data must be transferred outside the EU and the European Economic Area. This can be the case if e.g. appropriate analysis equipment only exists in some particular location. Special legal obligations must be taken into account for personal information transferred outside the EU-EEA area. If this applies to your study, consult the University's Data Protection Officer.

5. Publishing, archiving and disposal of data at the end of the project 

Publishing data at the University of Jyväskylä

The University of Jyväskylä strives for the widest possible openness of research data and their metadata. The publication of data is guided by the principle of "As open as possible, as closed as necessary". Accordingly, the researchers, as the best experts in their data, should evaluate already at the data management planning phase which parts of the datasets can be published. Data experts at the Open Science Centre provide help and guidance for this. By default, data should be published, and archiving them with restricted access or disposing of them should be justified in the data management plan. When planning your research, consider what parts of the data you can publish, what steps it takes to publish them, and what you will do with the rest of the data (archiving, disposal). This way, your plan covers the entire data lifecycle, and you will save a lot of time and effort at the end of your project.

Regardless of which data repository you choose, it is crucial that relevant information about the dataset and its storage site also remains at JYU. Metadata should be published for all datasets, including those that remain closed and/or are disposed of. This is done in the Research Data section of the Research Information System Converis (see the step-by-step instructions). If you publish data in the University's digital repository JYX, the Open Science Centre will take care of recording the availability information for you.

What are the benefits of publishing my data?

Opening your own data makes the world a better place, but it also directly benefits your research:

  • You will find your own datasets faster and more reliably in the future for re-use - no matter how many years, new computers, new working institutions, or even new continents, you've gone through in between.
  • Was it ever difficult to remember what this or that column in an old dataset of your own actually means? Or decide who deserves a co-authorship in new research using old data? Your own research is easier and meets higher quality standards, when its metadata is archived from the beginning and published.
  • Your research gets cited through datasets too, and you are easier to find. This means new contacts in your field of research, more name recognition, and more opportunities to do interesting and rewarding science.
  • Many funders and research institutions recognise published datasets as a significant output in evaluation of researcher merit. The weight of data as a contribution to science when awarding grants and positions is steadily increasing. Be in the vanguard and open everything you can - you will gain a competitive edge in funding applications and in other situations where your research merit is judged, for example when you apply for tenure in the 2020s!
  • Most importantly: the chance of your work to impact the world grows, a lot.

Preparing for data publishing 

The prerequisite for opening data is the timeliness of the metadata and the documentation of the data. A good way to ensure this from the beginning of the research project is to keep the documentation of the methods, structure, content and other information relevant to the research in a subfolder called /DOCUMENTATION. General metadata is maintained in the Research Data section of the University's Converis research information system. 

Creating a metadata record does not mean that you need to publish it yet. You can update it as your project proceeds and keep it closed until you are ready to publish it at the end of the project. Creating the record early enables early support from the University of Jyväskylä in your data management matters. Moreover, you can easily request help from the University's data experts via Converis with a few clicks. Finally, via Converis, you can request the publication of ready metadata or even your dataset itself in the University's digital repository JYX.

Recommended data repositories

Best practices for choosing a repository

  • Publish your data primarily in a discipline-specific or research-specific digital repository. In a field-specific archive, the data are most likely to be found by researchers in your field.
    The Re3data portal is an excellent site to search for a suitable repository and to browse repositories in your field.
  • If a suitable field-specific repository is not available, publish your data in the University's JYX repository.
  • Use generalist data repositories such as Zenodo or figshare only as a last resort. The data deposited in them is highly heterogeneous, which lessens the discoverability of the data and makes it difficult to evaluate the findings.
  • Remember that publishing data on your own or your project's website does not meet the requirements of the funders or the University's expectations regarding the discoverability and accessibility of the material.

Finnish Social Science Data Archive (FSD)

The FSD focuses on acquiring social science data. Under certain circumstances, data from other relevant fields (Arts and Humanities, Education, and health sciences) can also be archived. Datasets deposited at the archive must meet certain technical and legal requirements. Before dissemination, archived datasets are processed and documented.

FSD promotes open access to research data as well as transparency, accumulation and efficient reuse of scientific research. FSD also responsibly implements the FAIR data principles, which aim at making data and services Findable, Accessible, Interoperable and Re-usable.

The archive is a national resource centre funded by the Ministry of Education and Culture and the University of Tampere. In addition to archiving and dissemination of data, key services include data-related information services and support for research data management. The archive operates as a separate unit of the University of Tampere.

Kielipankki - The Language Bank of Finland

The Language Bank of Finland is a service for researchers using language resources. The Language Bank has a wide variety of text and speech corpora and tools for studying them. The corpora can be analyzed and processed with the Language Bank’s tools or downloaded.

Many corpora are publicly accessible, some require logging in. The rights to use restricted resources can be applied for electronically. Using the Language Bank is free for researchers and students.

If you are new to the Language Bank, take a look at the Language Bank introduction.

JYX

JYX is JYU's repository for publications and research data. It gives datasets persistent identifiers (DOI, URN). Metadata is sent to the national METAX catalogue, which ensures that metadata and datasets can be found using the national ETSIN service.

Publishing a dataset in JYX is simple. Just create metadata for the dataset in the Converis research information system and submit a request there to publish the (meta)data.

Zenodo

Zenodo is an EU-funded repository for open science. Long-term storage and infrastructure are provided by CERN Data Center. Zenodo is not discipline-specific but open to all fields of science. In addition to datasets, researchers use Zenodo to share posters, presentations, conference publications, figures and articles. Because of this heterogeneity, we recommend discipline-specific repositories or JYX.

Using the link above automatically adds your deposit to the University of Jyväskylä community collection in Zenodo.

Zenodo is free to use. You can either sign up or log in with an ORCID or GitHub account.

EOSC Portal

The European Open Science Cloud (EOSC) initiative was proposed in 2016 by the European Commission as part of the European Cloud Initiative to build a competitive data and knowledge economy in Europe. On the EOSC Portal you can find a wealth of information on data management and a list of data repositories.

The Open Science Centre helps you with issues related to data management and publishing. Please contact us at researchsupport-osc@jyu.fi.

 

Source selectively used to update the instructions: Fuchs, S. Research Data Management Basics, Meilahti, 31.3.2020. University of Helsinki Data Support. Creative Commons CC BY 4.0.