How to write your DMP: JYU guidelines

These organisational guidelines for writing the sections of a DMP are specific to research conducted at the University of Jyväskylä, and follow the structure of the Academy of Finland's DMP template. NB! Do not just copy-paste. Think about the specific requirements that apply to your data types. Modify the example clauses that apply to your situation. Only write sentences that you understand yourself.

1. General description of the data

1.1 What kinds of data is your research based on? What data will be collected, produced or reused? What file formats will the data be in? Additionally, give a rough estimate of the size of the data produced/collected.

Checklist:

Describe data types in e.g. a table or a listing by category, and check that the description matches your research plan.
In the table/listing, differentiate any sensitive/confidential and non-sensitive/non-confidential parts of the data (e.g., special categories of personal data):

A) Already-existing data that you reuse, plus the source (e.g. archives and registers),
B) Data that is collected and produced during the project,
C) Result data that issues from analysis of raw data.

Consider using open, standardized file formats. Check the UK Data Service file format chooser for suitable formats. We recommend open formats that are interoperable across as many operating systems as possible and do not require users to buy software. If you use commercial or proprietary formats, briefly justify why.
If you use technical equipment that produces the data, briefly describe what kind of data it provides (and, if possible, that the data is compatible with relevant metadata standards). The most important thing is to show that you understand what formats your data is stored in and how it can be utilized, stored and described.

Example: A data table

Data type	Source	Personal / confidential data	File format	Estimated size
Analysed DNA sample	Processed from DNA sample	No	.xlsx, .csv	2 GB
Statistical analysis X	Pre-existing from the Finnish Social Science Data Archive	No, anonymous	SPSS (.por, .sav)
Questionnaire	Collected from study subjects	Yes, identifiable and sensitive health information	.csv	5 MB
Interview recording on video	Collected	Yes, identified non-sensitive personal information (facial image, voice)	.avi, .mp4
Interview transcript	Processed	No	.csv, .txt, .xlsx	>10 MB
Images	Collected	No	.tif, .jpeg, .gif, .raw
Administrative documents	Permissions collected from study subjects	Yes, direct identifiers	.docx
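One way to fill in the "File format" and "Estimated size" columns for data you already hold is to scan the project folder. The following is a minimal sketch only: the folder name `data` and both function names are hypothetical, and sizes are reported in decimal units (1 MB = 1,000,000 bytes), which is an assumption about how you want to report them.

```python
from collections import defaultdict
from pathlib import Path


def summarise_formats(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group by lower-cased extension so .CSV and .csv count together.
            ext = path.suffix.lower() or "(no extension)"
            totals[ext][0] += 1
            totals[ext][1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


def format_size(num_bytes):
    """Render a byte count in decimal units (assumption: 1 MB = 1,000,000 bytes)."""
    for unit in ("B", "kB", "MB", "GB"):
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} TB"


if __name__ == "__main__":
    # "data" is a hypothetical folder name; point this at your own project folder.
    for ext, (count, size) in sorted(summarise_formats("data").items()):
        print(f"{ext}\t{count} files\t{format_size(size)}")
```

The output is only a starting point: data that has not yet been collected still needs a reasoned estimate in the table.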

Examples:

“The archival material received from archive N is available either in digitized form or in microfilm and the use of it does not require permits. I will also use published X and Y registers that are available either online or as printed publications.”

“To make the microfilmed archival material usable in my project, I will take photographs of the documents in archives and libraries or scan the microfilmed pages to my own USB drive. Other relevant material, like law codes or contemporary literature, can be retrieved from library databases or ordered directly from libraries (.pdf).”

“The interactions will be coded with standardized CLASS and ECCOM observation instruments. Parts of the interactions will also be coded and analysed by using conventions used in discourse and conversation analysis. Recall interviews will be transcribed and content coded. Eye-tracking data will be coded according to the teacher fixations and saccades on areas of interest.”

“The data consists of generally accepted formats. Most of the data will be quantified with the SPSS program. SPSS data will be in .sav format, and video and audio recordings in .mp4 and .mp3 format. Transcripts and content codings will be in .docx format. Eye-tracking recordings are in .json and .mp4 format (Tobii Analysis Pro software is needed for analyzing the eye-tracking data). Quantitative data will be processed and analyzed with the SPSS and MPlus statistical programs, and qualitative data will be handled and analyzed with Atlas-ti. All other materials, such as questionnaires, are saved as .pdf files. The software and formats used are based on open standards to enable data reuse, interoperability and sharing. The data content, specific methods and analyses are described in more detail in the research plan.”

1.2 How will the consistency and quality of data be controlled?

Describe here how you plan to minimise the risk of data loss or corruption, for example when transferring the data from one device or format to another (e.g., device calibrations, checking of transcriptions with an outside expert, etc.).

Examples:

"The consistency and quality of the data will be controlled by following the established data collection protocols in the laboratory at the Department of X. "

“In [project X], the software and formats that will be used are based on open and established standards to enable data reuse, interoperability and sharing of the data."

“The project will name a quality controller who ensures that quality assurance measures are followed.”

“Files will be stored using checksums that are used to ensure that data is not corrupted when copying, transmitting and saving it.”

“Transcriptions of interviews will be checked by the quality controller.”

“Format conversions are made maintaining original information.”

"Data integrity will be checked using checksum (SHA256) when copying and transmitting the collected files."
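The checksum clauses above can be put into practice with a few lines of code. The following is a minimal sketch using Python's standard `hashlib` and `shutil` modules; the function names `sha256_of` and `copy_and_verify` are illustrative and are not part of any JYU tooling.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use low for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file and raise an error if the copy's checksum differs."""
    expected = sha256_of(source)
    shutil.copy2(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"Checksum mismatch for {destination}")
    return actual
```

Storing the returned digest next to the file (for example in a README) allows the same check to be repeated later, after a transfer or a restore from backup.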

2. Ethical and legal compliance

2.1 What ethical issues are related to your data management, for example, in handling sensitive data, protecting the identity of participants, or gaining consent for data sharing?

a. Does the research include processing of personal data?

Identify situations when the collected and processed data include personal data.

If personal data are processed in the research, data protection legislation applies to it. You can identify a research subject with personal data directly or indirectly. Personal information includes not only strong personal identifying details such as one’s name, personal identity number, or photo; for example, social, economic, cultural, physical, physiological, psychological or genetic factors can together or individually form personal data. Personal information is evaluated from the point of view of whether even a small group (e.g., the research subject and/or his/her relatives) could use the collected information to identify who an individual person is.

Examples:

“Since the source material [specify which parts] is openly available in public archives and libraries, and the individuals studied are deceased, there are no relevant ethical, ownership or copyright issues regarding the storage or management of the archival datasets.”

“The research does not include processing of personal data.”

“The research includes processing of personal data. Direct identifiers, strong indirect identifiers and/or indirect identifiers are collected from the subjects. Collected personal data include, for example, the subject’s name, contact information, place of residence, educational background as well as psychological and financial information.”

“The research includes processing of sensitive personal data. Collected personal data include, for example, the participant’s name, contact information, place of residence and health information.”

b. Does the project include personal data that belong to a special category of personal data, i.e., sensitive personal data?

Identify and describe situations when collected personal data are sensitive as well as the legal basis for processing sensitive personal data.

Sensitive personal data include the following:

racial or ethnic origin
political opinions
religious or philosophical beliefs
trade union membership
genetic data produced by analysing a biological sample
biometric data for the purpose of uniquely identifying a natural person
data concerning health
data concerning a natural person’s sexual behaviour or sexual orientation.

If sensitive personal data are processed in the research, the processing must be based on Article 9 of the General Data Protection Regulation (GDPR). Scientific purpose of the research OR consent from the data subject are emphasised as a lawful basis when processing sensitive personal data as part of research.

Example:

“Sensitive personal data are processed in the research (data concerning health). The legal basis for processing the data is scientific research in the public interest (GDPR Art. 6.1 e and Art. 9.2 j).”

c. Criminal convictions and offences

Identify and describe situations when collected personal data include criminal convictions and offences or related precautionary measures. Also identify the legal basis for processing the data. Processing criminal convictions and offences in research is based on consent of the data subject.

Examples:

“The project includes processing of data concerning criminal convictions and offences with the consent of the data subjects.”

d. Data processing and related roles

Identify and describe possible roles in the research and take care of the possible agreements required by the GDPR.

Controller refers to the natural or legal person, who, alone or jointly with others, determines the purposes and means of the processing of personal data. Where two or more controllers jointly determine the purposes and means of processing, they shall be joint controllers. Processor refers to a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller. The processor cannot decide how personal data are used or what information is collected.

For example, when two universities together determine what personal data are collected and used for scientific purposes, both universities are joint controllers. When these joint controllers submit personal data to a third party, such as a transcriber or database administrator, the transcriber and the database administrator are both processors.
The GDPR requires controllers and joint controllers to conclude data processing agreements with all their processors. Joint controllers should also conclude an agreement setting out the responsibilities and roles of each controller in the processing of personal data.

Example:

“University A is the data controller. The processors are internet-based survey service provider Webropol Oy and University B, which conducts the analysis of data, as well as Tietoarkisto, where the data will be archived at the end of the research. Data processing agreements have been made with the processors.”

e. Where are the personal data obtained from?

Describe where the processed personal data are obtained from.
The data can be obtained from the data subject directly or, for example, from another register.

Example:

“The processed data are obtained from the data subject directly on the basis of consent.”

f. Safeguards to protect personal data

Describe what safeguards will be implemented to protect the personal data. You must follow appropriate and risk-based safeguards in the processing of personal data. The research data must be processed anonymously if the objectives of the research allow it. If anonymisation is not possible, pseudonymise the data whenever possible. If pseudonymisation is not used, justify why. Safeguards regarding the processing of personal data may include the following:

measures to improve the skills of staff dealing with personal data (training, advice and guidelines)
appropriate data security (pre-defined where the data are stored)
measures to prevent access to personal data
drawing up a data protection impact assessment (DPIA)
research has a principal investigator and an appropriate research plan
limitation of the retention of personal data (data will be destroyed or anonymised upon completion of the study)
data subjects cannot be identified from the research results
agreements required by the GDPR have been reached.

Examples:

“The data will be pseudonymised in the analysis phase. The data will be archived in a pseudonymised manner [define where]. It is not possible to archive data in anonymous form because it is a follow-up survey. The personal data safeguards include….”

“All direct IDs will be pseudonymised with a code, and the key to the pseudonymisation code will be stored separately from the data.”

“All individual IDs and any personal information enabling the identification of individual interview subjects will be erased as soon as they are no longer needed for the purposes of the research. Detailed anonymization method is described in the Data Privacy Notice.”
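The clause about replacing direct IDs with a code and storing the key separately could be sketched as follows. This is an illustration under stated assumptions, not a complete safeguard: the function names, the `P` code prefix and the field names are hypothetical, and only one direct identifier field is replaced. Indirect identifiers in other fields still need to be assessed separately.

```python
import csv
import secrets


def pseudonymise(rows, id_field, key=None):
    """Replace the direct identifier in each row with a random code.

    Returns (pseudonymised_rows, key), where key maps identifier -> code.
    Pass in an existing key to keep codes stable across collection rounds.
    """
    key = dict(key or {})
    used = set(key.values())
    output = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in key:
            # Random codes carry no information about the identifier itself.
            code = "P" + secrets.token_hex(4)
            while code in used:
                code = "P" + secrets.token_hex(4)
            key[identifier] = code
            used.add(code)
        output.append({**row, id_field: key[identifier]})
    return output, key


def write_key(key, path):
    """Write the key to its own CSV file, to be stored apart from the data."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["identifier", "code"])
        writer.writerows(key.items())
```

The key file is what makes re-identification possible, so it belongs in a different, access-restricted location from the pseudonymised data.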

g. When to make a DPIA?

Identify a possible need to carry out a DPIA for your research. A DPIA is used to evaluate the impacts of processing on the protection of personal data. According to the Office of the Data Protection Ombudsman, a DPIA has to be made every time the processing causes a high risk for the data subject’s rights and freedoms.

High risk factors for the data subject include the following:

a large number of data subjects
a large amount of personal data per data subject
processing of sensitive data (e.g. health data)
processing personal data of vulnerable natural persons (e.g. children)
automatic decision-making (e.g. credit decision)
systematic monitoring of the data subject.

Furthermore, a DPIA is required in the following cases especially:

new technologies are used for the processing of personal data
processing on a large scale of personal data relating to criminal convictions and offences or special categories such as data concerning health, ethnic origin, political opinions, religious belief or sexual orientation
a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person
large-scale systematic monitoring of a publicly accessible area.

Whenever there is a justified deviation from the study subject's right to be informed about the processing of their personal data, a DPIA must be made. If the DPIA indicates that a high risk is still being encountered, the evaluation will be submitted to the national Data Privacy authority prior to the processing of personal data. For more information, please contact tietosuoja@jyu.fi.

Example:

“The project poses a high risk to the data subject’s rights and freedoms because the research focuses on the personal data of vulnerable natural persons (children) and a large amount of personal data is collected per data subject. Before the start of processing the personal data, a data protection impact assessment must be made.”

2.2 How will you manage the rights of the data you use, produce and share?

In this section, explain any issues pertaining to the rights of the data that you use, collect, and produce. Check the University's agreement models for help in drafting any needed agreements about data authorship, ownership, and usage rights.

Checklist:

If you use already-existing primary data (registers, archives, private providers, social media data), clarify here that you are aware of your usage rights and license to the data. Explain, if applicable, how you will comply with the source's terms (copyright, use restrictions).
Explain the ownership of the data. In consortium projects involving the Academy of Finland and JYU, rights to all results (including data) are transferred to the University. Even in such cases, it may be worth noting that permission to use the data has been agreed on and poses no problem.
In all cases, emphasize that ownership and the rights to distribute and use the data are always agreed upon in writing, so that the FAIRness of the data is not endangered by unclear ownership.

Examples:

“The ownership and access rights of data shall be agreed on when creating the data management plan, prior to the start of actual research.”

“The ownership and access rights of data were agreed on when the data management plan was created, prior to the start of actual research.”

3. Documentation and metadata

How will you document your data to make them findable, accessible, interoperable and re-usable for you and others? What kinds of metadata standards, README files or other documentation will you use to help others understand and use your data?

Checklist:

Basic descriptive metadata of all new projects conducted at JYU is created, maintained, and stored in the University's current research information system, Converis. This ensures that the metadata can be made available in the University's institutional repository JYX at the end of the project. In JYX, the metadata will receive a persistent identifier and a lasting landing page. From JYX, the metadata is indexed in the national METAX metadata indexing system. In this section, you can start by describing this.
In addition, describe how you plan to document the methodological 'What, Why, Where, When, and by Whom' of what you do to and with the data (field and lab notebooks, embedded SPSS metadata, working instructions, protocols, README files, field-specific standard metadata, etc.).

Examples:

"The research group uses hard-bound laboratory log books as the primary documentation of daily research activities. In these log books, each scientist records the experiments done and the materials used, and identifies the names of data files and where they are stored. Additionally, README.txt files will be used to store structural (naming conventions and purpose of folders and files) and administrative (e.g., when the files and folders were created, file types and other technical information, and who can access them) metadata. Every main folder and sub-folder will include a README.txt file that serves as a guide to the data and is updated to reflect any changes made within its folder. Any additions, deletions or changes to naming conventions or purposes are immediately recorded in README.txt.”

“The project shall name a team member responsible for monitoring and enforcing these data documentation rules. This documentation and metadata, and their publication, will help ensure the FAIRness of the data."

"When anyone saves data on the project’s shared network storage space (provided and technically administered by University of Jyväskylä, including backups, access control and security), that person shall also update and maintain this documentation:

The storage root as well as every sub-folder shall include a README.txt file. Each README.txt file describes the current naming conventions and purpose of all folders and files in that level of storage organization. Any additions, deletions or changes to naming conventions or purposes are immediately recorded in README.txt.
[about “master data files”: if you use version control software, or otherwise use a method to consistently distinguish work copies from master data files, you should mention and describe that here briefly.]

Each master data file [or folder depending on type of data] will be clearly named as such [you can enter your naming convention here], and each will always be accompanied, in the same folder, by two files:

1) ABOUT_filename.txt/ABOUT_foldername.txt, which contains a link to an initialized metadata entry in the University's Converis research information system. [The preliminary entry can remain private and unpublished, just for your own use, but if and when you want to publish your data, it is easy because the metadata entry is already there.]

2) METHODS_filename.txt/METHODS_foldername.txt, which contains a link to, or the text of, a description of the methods used to obtain, process and document the data, in sufficient detail to allow replication, understanding and use of the data for other purposes.

Both the ABOUT and the METHODS files, and the preliminary metadata entry in Converis, will be created immediately as the data gathering begins and amended and updated throughout the research project as data is building up. These will then be readily available to convert to machine-readable formats upon archival or publication of the metadata in the University's repository JYX, and the eventual publication of the data in [name chosen repository].”
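Documentation rules like those in the example above are easier to enforce when they can be checked automatically. The following is a minimal sketch of such a check. Both function names are hypothetical, and it assumes that `filename` in `ABOUT_filename.txt` and `METHODS_filename.txt` means the master file's name without its extension; adjust this to your own naming convention.

```python
from pathlib import Path


def missing_readmes(root):
    """List the storage root and sub-folders that lack a README.txt file."""
    root = Path(root)
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    return [folder for folder in folders if not (folder / "README.txt").is_file()]


def missing_companions(master_file):
    """List the ABOUT_/METHODS_ files missing next to a master data file."""
    master_file = Path(master_file)
    # Assumption: companion files are named after the master file's stem.
    expected = [
        master_file.with_name(f"ABOUT_{master_file.stem}.txt"),
        master_file.with_name(f"METHODS_{master_file.stem}.txt"),
    ]
    return [path for path in expected if not path.is_file()]
```

The team member responsible for documentation could run a check like this periodically and follow up on any paths it reports.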

4. Storage and backup during the research project

4.1. Where will your data be stored and how will it be backed up?

Always use the University's institutional storage solutions for the primary project-time storage of the data:

The JYU Nextcloud is the primary recommended storage for all non-sensitive data. Check the tutorials for implementation and use cases.
For small (< 50 MB/file) sensitive datasets, there is JYU CollabRoom.
For larger sensitive or secret data, S: and U: network drives (file encryption recommended) or the national CSC Sensitive Data services (SD Connect). A project folder on the University's S: drive can be ordered in HelpJYU (log in first). Please note that during the year 2022, the CSC Sensitive Data services only support open source and Linux-interoperable analysis software e.g., LibreOffice, Python, and R. Windows interoperability is due in late 2022. Secondary use of social and health registry data is limited, see the instructions, "Limitations for secondary use".
For non-sensitive video and audio data, Researchvideo.jyu.fi.
For code, JYU GitLab.

Examples:

"Data (except sensitive data) of the project will always be stored in the Nextcloud cloud storage service provided and managed by the Digital Services of the University of Jyväskylä. The cloud service is also an ideal solution for sharing research data between collaborators.”

[IF project handles sensitive data:] “Sensitive data is stored and processed in the University’s highly secure CollabRoom environment, which has been tailored especially for this task.”

“The University’s systems will take automatic backups of the data to prevent catastrophic loss of data. In addition, manual backups of master data files will be taken regularly [insert suitable time interval] and always before any major file-format or data conversions. Non-sensitive video data will be stored in the University's Researchvideo.jyu.fi system."
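The manual backup of master data files before a conversion, as described in the example above, can be as simple as a timestamped copy. The following is a minimal sketch; the function name and the naming pattern `<name>_<timestamp><extension>` are assumptions, and the backup folder should itself be on University-provided storage.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_master_file(master_file, backup_dir, now=None):
    """Copy a master data file into backup_dir under a timestamped name."""
    master_file = Path(master_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{master_file.stem}_{stamp}{master_file.suffix}"
    # Refuse to overwrite an earlier backup made in the same second.
    if target.exists():
        raise FileExistsError(f"Backup already exists: {target}")
    shutil.copy2(master_file, target)
    return target
```

Because every backup gets its own name, earlier versions are never silently replaced, which makes it possible to return to the state before a faulty conversion.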

4.2 Who will be responsible for controlling access to your data, and how will secured access be controlled?

Examples:

“The right to access the data is controlled by the responsible researcher [PI, or someone specially dedicated to this], and technical access control is provided by the Digital Services of the University of Jyväskylä.”

“[Non-sensitive] data will be available to all researchers of the project via Nextcloud cloud storage.”

“[In case of sensitive data] The PI acts as the administrator of the project’s CollabRoom space and decides who needs and is granted access to the sensitive data in that workspace.”

“Access to the data will be documented, and the PI will be able to tell at any point who has access to what data.”

5. Opening, publishing and archiving the data after the research project

5.1 What part of the data can be made openly available or published? Where and when will the data, or their metadata, be made available?

Checklist:

The metadata that you have updated in Converis during the project can be published directly at the end of the project. You can start this section by describing it.
If your project cannot make the data publicly available, describe here the justification for this. According to the policy of the University of Jyväskylä, all data should be made publicly available unless there are specific, justifiable reasons to keep the data secret (e.g., data privacy or IP). If your data are to remain restricted due to e.g. data privacy, explain it here. If the data cannot be made available, making the descriptive metadata openly available at the end of the project fulfils the open access requirements of the University and the funders.
Briefly describe what part of your data you will store for e.g. verification, and for how long.
If you plan to deposit data in a data archive, describe it.
You can categorise your data sets according to the anticipated preservation period:

A) data to be destroyed upon the end of the project
B) data to be stored post-project for a verification period, which varies across disciplines (e.g., 5–15 years)
C) data to be archived for potential re-use in e.g. Finnish Social Science Data Archive or JYX (e.g., for 25 years).

Examples:

“Metadata entries of the data will be published immediately in the University's JYX publication repository when they are considered sufficiently complete, even if the data itself is not yet public. (For description of creation and curation of metadata entries please see Section 3 of this plan.) The metadata will then be searchable in e.g. the national Etsin and Research.fi metadata catalogues.”

“Dataset(s) themselves, complete with full description of methods, will be published in [JYX / a certified, field-specific repository / Zenodo].”

“Sensitive parts of the data will be anonymized and published along with other data.”

“Sensitive parts of the data cannot be anonymized and thus cannot be openly published. Sensitive data will be stored in JYX, only the metadata will be public, and access to the data can be requested and granted on certain conditions [describe the procedure and terms on which access and right to use the data can be granted].”

“Sensitive parts of the data cannot be stored and will be disposed of after the project is finished.”

5.2. Where will data with long-term value be archived, and for how long?

Long-term preservation means that data are preserved for several decades or even centuries. At the moment, the University of Jyväskylä does not yet have a ready procedure for selecting valuable datasets for long-term preservation in the national Fairdata-PAS preservation service. More information about the development of the selection service will follow during 2022.

Example:

“The University of Jyväskylä will store all data archived in JYX indefinitely (for a minimum of 10 years), in the format in which they were originally deposited, but does no special packaging or continuous curation to guarantee long-term integrity and usability. Data with long-term value will be proposed to the national Fairdata-PAS (system for true long-term research data storage). National and university-level policies for determining eligibility for Tutkimus-PAS are currently being developed, but are not yet available.”

6. Data management responsibilities and resources

6.1 Who (e.g., role and institution) will be responsible for data management?

Here you are expected to show that you are aware that data management needs people who are responsible for it. It might also need specific resources (time, money, computational facilities, support from a data steward, etc.).

Examples:

“Responsibilities for specific issues have been described in earlier sections.” [Make sure this is so. If not, describe them here.]

“The project will name a team member responsible for monitoring and implementing data documentation and data management at both consortium sites. The PI and the SL will have the main responsibility of data management, monitoring it and granting access to the data. The costs associated with storing and sharing research data are regarded as overheads for the project’s host organizations (JYU and X). The preparation of the data will be supported by the Open Science Centre of JYU."

6.2 What resources will be required for your data management procedures to ensure that the data can be opened and preserved according to FAIR principles (Findable, Accessible, Interoperable, Re-usable)?

Examples:

“Final preparation of the metadata and publishing it will require [best estimate of time] of work. Publishing the metadata will be supported by the Open Science Centre of the University of Jyväskylä and publishing the research data by [name repository]. The chosen repository will describe the data in a standard metadata format and store them in a file format suitable for reuse."

“Anonymization of sensitive data will require [best estimate of time and effort].”

This work is licensed under a Creative Commons Attribution 4.0 International License.