Personal data

In this section of the data management plan, you address the following:

1) Explain as comprehensively as possible what personal data is included in your research data. Please explain why you need to collect or process such personal data. How do you inform and protect research subjects?

2) Are there ethical questions related to your research data? Does the data contain sensitive or secret material (other than personal data, e.g. business secrets)?

  • Do you study people? If you collect data from or about a person, start with the assumption that you are processing personal data.
  • You must handle data responsibly and comply with the EU General Data Protection Regulation (GDPR) and Finland's national data protection law.
  • What forms do I have to fill out? How do I inform and protect the research participants? In this section, you will find instructions and answers to these questions.

Responsible processing of personal data is one of the basic starting points of ethically conducted research.

What is personal data?

If you collect information from or about a person, start from the assumption that you are collecting personal data.

The definition of personal data is broad. Personal data may include any information or characteristic relating to a person that could make that individual identifiable. Identification does not require precise information, and a dataset usually contains some personal data whenever people participate in research or you collect information from or about people. In addition, identification can take place by combining the data with, for example, additional information found on the Internet.

Some examples of personal data:

  • age
  • image (recording etc.)
  • speaking voice (for example, in interviews)
  • educational background
  • industry, workplace or profession
  • residential area
  • statements and opinions characteristic of the person
  • email address
  • ethnic background or nationality (special personal data)
  • health data (special personal data)
  • income
  • exercise habits
  • fingerprints
  • characteristic physical traits
  • information on the research subject's friends or other persons related to the research subject

Information on the research subject's family or other third parties, such as information on a teacher's pupils, is also personal data.

Personal data is therefore much broader than just name or background variables.

For example, interview responses or survey responses themselves often contain personal data. Similarly, belonging to the target group may constitute personal data.

For example, if you ask secondary school PE teachers about their own sports hobbies, the indirect personal data would already include their profession and sports habits/hobbies/physical activity.

There are different levels of personal data: some are data that alone are sufficient to identify a person, but any information related to an identifiable person is considered personal data.

Interviews always contain personal data, as the speaking voice is a so-called direct identifier, i.e. information that alone is sufficient for identification.

Personal data does not have to be particularly secret or intimate. What matters is whether the person can be identified. Nor does recognizability require that anyone can identify a person. It is enough if only family members or colleagues could identify the person. (Source: Data protection guidelines for researchers)

Justify the collection of personal data based on your research plan and/or research questions.

In your research plan, you describe the research design and research objectives. It must be possible to justify the processing of personal data on the basis of these. The amount of personal data should be minimized.

  • Do not collect personal data that you do not need to answer your research questions. You may only collect personal data that you need for your research. For example, if the age of the research subject does not matter in your study, do not ask for it. Data minimisation is one part of protecting research subjects.
  • It is also generally better to collect information such as age or years of work as categories; for example, work experience 0-4 years, 5-10 years, 11-15 years etc.
  • Sometimes the interviewee tells more about themselves than the interviewer has asked. In that case, remove the excess information from the data.
  • Try to avoid collecting special categories of personal data (such as health data) or sensitive data (such as personal experiences of domestic violence or bullying) combined with direct identifying information (such as voice or name).
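
The advice above on collecting work experience as categories rather than exact values can be sketched in a few lines of Python. This is only an illustration: the function name `work_experience_band` is hypothetical, and the open-ended last band ("16+ years") is an assumption, since the text only lists the first three bands. Ideally the survey question itself offers the categories, so that exact values are never collected; a conversion like this is mainly useful when exact values already exist in the data.

```python
from bisect import bisect_left

# Inclusive upper bounds of the bands given in the text (0-4, 5-10, 11-15).
UPPER_BOUNDS = [4, 10, 15]
# The last, open-ended band is an assumption made for this illustration.
LABELS = ["0-4 years", "5-10 years", "11-15 years", "16+ years"]


def work_experience_band(years: int) -> str:
    """Map an exact number of years to a coarse category."""
    if years < 0:
        raise ValueError("years must not be negative")
    # bisect_left returns the index of the first upper bound >= years,
    # or len(UPPER_BOUNDS) when years exceeds every bound.
    return LABELS[bisect_left(UPPER_BOUNDS, years)]


# Fictitious example values
exact_years = [0, 4, 5, 12, 30]
banded = [work_experience_band(y) for y in exact_years]
```

After the conversion, store only the categories and delete the exact values if they are not needed to answer the research questions.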

When processing personal data, the data may not be stored just anywhere. For example, you can't use Google's cloud services. You will also need secure software and tools if, for example, you conduct interviews or surveys. Information security is described in section 5.

Data protection is not intended to prevent research from being conducted, but the purpose is to protect research participants. It is important that the processing of personal data has a justification and legal basis and that the research subject is aware of what is being done with their data.

The guidelines for processing personal data do not apply to deceased or fictitious persons.

Does your data contain special categories of personal data?

Special categories of personal data include:

  • ethnicity
  • political opinions
  • religious or philosophical beliefs
  • trade union membership
  • health information
  • sexual orientation or behavior
  • genetic and biometric data for the purpose of identifying a person

If your data contains special categories of personal data, it is very important that the data is handled responsibly. The privacy notice has its own section where you inform about the processing of special categories of personal data. Follow these protective measures (more on these later):

  • Stricter requirements related to data security.
  • Data minimization: you may only collect special categories of personal data that are necessary for carrying out the research. The amount of special personal data must be proportionate to the objective of the research.
  • Pseudonymisation should always be carried out if it is possible. More on pseudonymisation later on this page under Pseudonymisation and anonymisation.
  • Documentation of the processing of personal data: keep track of what you do, where the data is stored and what you have agreed with the research subjects.

For example, if you studied fatigue experienced by university students, you could get information about a respondent's depression, anemia, or other health data. In other words, the data would contain special categories of personal data.

The collection of data must be planned so that if special categories of personal data are not intended to be collected, they may not enter the data accidentally. If special categories of personal data might appear, plan the collection of data with the assumption that you are processing special categories of personal data.

The University of Helsinki has its own test that you can use to test whether your data contains personal data or special personal data. You can use this as a guideline. Please note that the links in the test take you to the University of Helsinki's instructions, which are not suitable for you.

Checklist for handling personal data

More detailed information on all these matters can be found on this page, but here is a checklist of things that data processors should take into account.

  • Identify what personal data you collect or process.
  • Who is the data controller?
  • Assess the risks.
  • Provide the research participant with a privacy notice, a research notification and a consent form (this is done to inform the research participants).
    • If you receive a dataset from, for example, a research project, you do not prepare the privacy notice yourself. Instead, a commitment is made with you regarding the processing of personal data (templates are only available to staff).
  • Verify and document the consent of the research subject.
  • Do you need a research permit?
  • Only collect personal data that is relevant to your research (minimisation).
  • Document the processing of personal data.
  • If possible for your research: pseudonymise or anonymise the data, and avoid collecting direct identifiers in the first place.
  • Ensure data security (e.g. storing data on the university's U-drive and using secure devices).
    • Safe interview programs or equipment:
      • A tape recorder borrowed from your faculty
      • Zoom used with university ID at
      • The university's Teams is also ok if you do not process special personal data.
    • Safe survey programs

Light, free-form risk assessment is part of all data collection. Therefore, consider whether collecting the data could pose risks to the research subjects, you or outsiders. In more serious situations, an impact assessment may be required, which you can read more about on the page.

Ethical review: if a study meets certain criteria, an ethical review must be requested. Students should avoid thesis research that would require an ethical review or an impact assessment.

Organisations may require a research permit if their students or staff participate in research, for example if you want to interview teachers at a certain school. Check the permit policies with the target organisation. Example of JYU's permit requirements: Research permit | University of Jyväskylä

Research notification, privacy notice and consent form

According to the law, a privacy notice must always be made if personal data is processed in research. In addition, research participants will be given a research notification and a consent form. This is how you inform research participants, i.e. tell them about the processing of their personal data. As a rule, research participants have the right to know about the processing of their personal data.

You can find the university's instructions for students on the university's website. The aim has been to summarise the main parts of the instructions on the university's website in this course material.

The university's website contains templates for the privacy notice, research notification and consent forms. Read the instructions below on how to fill out the privacy notice!

You can use the templates and also look at the example privacy notice. If necessary, ask your supervisor for help.

Instructions for making a privacy notice:

  • The privacy notice is a form in which you tell the research participants, for example, what personal data is collected, why, what is done with it and how it is secured. This is how you inform the research participants (i.e. the data subjects).
  • The university's data protection guidelines contain templates for forms (privacy notice, notification, consent form).
    • You can use the privacy notice template and also look at the pre-filled example. If necessary, ask your supervisor for help.
  • The privacy notice will be provided to research subjects.
    • Research participants are informed about the processing of personal data by means of a privacy notice and a research notification and consent form. The consent form is issued even if no signature is requested, as this provides additional information specifically related to consent.
  • The author of the thesis is the data controller, i.e. responsible for the processing of personal data.
  • The privacy notice explains the lawful basis for processing personal data.
    • The lawful basis for processing personal data may be the consent of the research subject. Use the template for privacy notice, research notification and consent form found under "Participants in coursework or theses".
      • If the data contains special categories of personal data, the lawful basis for processing them is explicit consent, which is a separate section in the privacy notice.
      • The template also offers legitimate interest as an alternative lawful basis. It is only suitable for certain situations and requires a so-called balancing test.
    • If the thesis plan meets the criteria for scientific research according to your supervisor's assessment, public interest can be used as the lawful basis for processing. In this case, use the templates for the scientific research privacy notice, research notification and consent form.

If necessary, more information can be found in the university's data protection guidelines under "Legal basis for processing personal data and requesting consent" and "Informing the data subject in the processing of personal data".

What is the significance of the lawful basis for processing personal data?

  • The research subjects' rights depend on the lawful basis for processing stated in the privacy notice. For example, if the basis for processing is consent and the research subject withdraws from participation, all data collected about them will be deleted, even if this is difficult. If the basis for processing is public interest, previously collected data will not be deleted, but the collection of data will be suspended.

If you are writing your thesis in a research group or from ready-made data, the controller is often a university or other research organisation.

In this case, you usually do not make the privacy notice yourself. In the research project, the privacy notice may have already been taken care of and you have been mentioned in the privacy notice as a processor of personal data. In this case, you will have access to the data or part of it confidentially. A commitment to data processing is made with the project.

If the research subject is under 15 years of age, the consent of the guardian is usually required for the research. The privacy notice, notification and consent form are given to both the guardian and the child. The child should be informed in an age-appropriate manner so that the child understands.

Providing a privacy notice to research participants:

  • The privacy notice can be attached to an e-mail, given to research participants on paper or linked to the beginning of a Webropol survey.
  • If you want to link the privacy notice in the survey: It can be 1) shared via SharePoint or 2) posted on a personal JYU website and then linked from there.
    • In Webropol, you need to turn on the text editor to link (two T's next to each other in the upper right corner).

If providing a privacy notice to each research subject would cause unreasonable effort, publish it instead. Everyone has a personal JYU website that can be used for this. For example, if you investigate comments on a public social media channel, link the privacy notice to the comments.

What if I don't plan to publish the personal information I receive? Do I need to make a privacy notice?

It doesn't matter if you publish the information you receive. What matters is that you process this data, so yes, make a privacy notice.

Consent to participate in research

Research participants are always asked for their consent to participate in research.

The consent must be documented, i.e. it must be verifiable afterwards. For example, document consent like this:

  • By requesting a signature on the consent form or
  • By asking for consent at the beginning of the recording of the interview
  • By having the research participant tick the box "I have read the privacy notice" at the beginning of the survey.

Consent is requested regardless of whether consent is the lawful basis for processing in the privacy notice. Research participants must have enough information about the research before they can agree to participate in it. For example, research participants must know what data is collected about them and why.

When consent is the legal basis for processing (i.e. it is not scientific research), special attention should be paid to the content of consent and how to request it, because giving consent must be an active action. In the previous section, Privacy notice, Research notification and consent form, there were links to consent forms.

How do I verify consent in a survey?

At the beginning of the survey, attach the privacy notice, research notification and consent form. Put a mandatory box to tick to acknowledge the documents as read and understood.

Even if no signature is collected, it is a good idea to give the research participant the consent form without the signature box (or equivalent information). This ensures that the research subject has the necessary information.

If the research subject is under 15 years of age, parental consent is usually required for the research.

Pseudonymisation and anonymisation

Personal data shall be pseudonymised whenever possible. This is a key part of protecting research subjects. Anonymisation may not be possible.


Pseudonymisation:

  • The names, places of residence and other personal data of research subjects are replaced with codes.
  • The code key is stored separately from the data in a secure location, such as a locked desk drawer. It is still possible to establish the identity of the research subject.
  • In practice, a code key is a list containing, for example, the name of the research subject and its corresponding alias or number sequence.
  • Pseudonymisation does not offer the same protection as anonymisation, as individuals are still indirectly identifiable.

There are different ways to pseudonymize.
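
One simple way is to replace each name with a running code and keep the code key as a separate list. The Python sketch below illustrates the idea only; the function name `pseudonymise`, the field names and the code format ("P001") are assumptions made for this example, and the names in the sample data are fictitious. Note that only the direct identifier is replaced here: indirect identifiers in the remaining fields stay in the data, and the result is still personal data.

```python
def pseudonymise(records, name_field="name"):
    """Replace the name in each record with a code.

    Returns the pseudonymised records and the code key (name -> code).
    The code key must be stored separately from the data.
    """
    code_key = {}
    pseudonymised = []
    for record in records:
        name = record[name_field]
        # The same person always gets the same code.
        if name not in code_key:
            code_key[name] = f"P{len(code_key) + 1:03d}"
        # Copy every field except the direct identifier.
        cleaned = {k: v for k, v in record.items() if k != name_field}
        cleaned["participant_code"] = code_key[name]
        pseudonymised.append(cleaned)
    return pseudonymised, code_key


# Fictitious example data
interviews = [
    {"name": "Example Person A", "answer": "first interview"},
    {"name": "Example Person B", "answer": "first interview"},
    {"name": "Example Person A", "answer": "follow-up interview"},
]
data, code_key = pseudonymise(interviews)
```

Here `data` no longer contains names, and `code_key` is the list that links each name to its code. Running codes reveal the order in which participants were added, so shuffle the records first if that order should not be visible.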


Anonymisation:

  • The data is edited so that all personal data is deleted. This also applies to indirect personal data, such as domicile or occupation. It is no longer possible to identify the research subject in any way.
  • Note that genuine anonymization is challenging. The anonymisation of data may even be impossible because anonymisation would remove so much content that the rest of the data would no longer be relevant. In addition, as technology advances, new ways of combining data may emerge.
  • Anonymisation is one of the possibilities mentioned in the decree to open data to downstream users.

So, if you're considering anonymization, think about whether it's realistic.

Do not promise research participants that the data will be anonymous if this is not really possible.

If the data is genuinely anonymised, it is no longer personal data. Pseudonymised data, on the other hand, are still personal data.

I'm doing a survey asking for age and county. However, the data cannot be linked to the respondent. Do I need to prepare a privacy notice? Is it anonymous data or personal data?

The survey can be anonymous if, for example, all questions are on a scale of 1 to 5 and only a few categorical background variables are asked. For example, ask for age in categories rather than exact age, e.g. 20-30 years. If the survey has open-ended questions, there is a greater chance that it also contains personal data. If the survey is anonymous, i.e. no one can be identified directly or indirectly by combining the data, a privacy notice is not needed. However, you will need a research notification and consent to participate.

If there is even the slightest chance of identifying the respondent, especially if you collect indirect identification data and open-ended responses, also prepare a privacy notice.

Always ensure that you use secure survey software such as Webropol (basic personal data) or REDCap survey software (special categories of personal data). For example, the use of Google Forms is prohibited. N.B. Always remember to delete the survey responses from the software after processing has ended.

More information on identification: when is a person identifiable? Remember that identification can take place by combining information from different sources.

  • If the research subject's place of residence and occupation are mentioned, and it's a small town or a rare occupation, the information may be sufficient to identify the person.
  • For example, the president is easily recognizable because the presidency is a position held by only one person at a time. However, the president could also be interviewed, as long as they are told that they are identifiable.
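
As a rough aid for spotting such cases in survey data, you can count how many respondents share each combination of indirect identifiers. The Python sketch below is an illustration only: the function name `rare_combinations`, the field names and the threshold of three respondents are assumptions made for this example, and the data is fictitious. A combination shared by very few respondents is a warning sign, but passing this check does not prove that the data is anonymous, because identification can also happen by combining the data with outside sources.

```python
from collections import Counter


def rare_combinations(records, fields, min_group_size=3):
    """Return the value combinations shared by fewer than min_group_size records."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Fictitious example data
responses = [
    {"town": "Large city", "occupation": "teacher"},
    {"town": "Large city", "occupation": "teacher"},
    {"town": "Large city", "occupation": "teacher"},
    {"town": "Small town", "occupation": "veterinarian"},
]
risky = rare_combinations(responses, ["town", "occupation"])
```

In this example, `risky` contains only the combination of the small town and the rare occupation, which a single respondent has. Such values could be coarsened (for example, region instead of town) or left out.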

Personal data is classified into direct identifiers, strong indirect identifiers and indirect identifiers.

Direct identifiers, such as name, are obviously personal data, but personal data also includes other information, characteristics, factors, actions and behaviour relating to a person.

For more information, see FSD's instructions on the identification and anonymisation of personal data.

Social media data

Social media data often involve data protection, contract law and copyright issues.

Tip: The posts on the Suomi24 forum are available through the Language Bank. The Language Bank is an archive that stores various language data.

Social media data such as Instagram or discussion forums come with different challenges. However, they are often also interesting to study. Remember that research participants always have the right to know that they are the subject of research. Therefore, you must try to tell about the processing of personal data even when the data originates from social media. Even if something is published on the internet, it can still present ethical challenges when it comes to research data.

Is it possible to inform research subjects? If you are unable to contact the research subjects personally:

  • Make the privacy notice publicly available. For example, if you investigate comments on a public social media channel, link the privacy notice to the comments.
  • In other words, the privacy notice is published if delivering it to each research participant would cause unreasonable effort. Everyone has a personal JYU website and JYU SharePoint that can be used to do this.

Ethical questions: Consider what kinds of risks or disadvantages the processing of personal data may cause to research subjects.

  • Are there ethical problems related to the use of the data? Do not research closed groups due to ethical challenges.
  • The use of social media as research data is in itself an ethical challenge, as content produced for social media is not intended for research.
  • Can research participants be harmed if, for example, their social media posts are studied and highlighted in the thesis and thus brought to the attention of a new audience? Consider carefully whether you can use direct quotes, as the post will then be easy to find by googling.
  • Consider how sensitive the contents are. Examples of particularly sensitive topics: criminal convictions, drug use, financial problems, mental health problems, controversial political opinions and activism. What kind of harm or damage could the research subjects suffer as a result of being studied? It is recommended to avoid such social media topics, as the research could then involve ethical challenges of such magnitude that they cannot be solved within the framework of the thesis.
  • Do not study closed groups that you must ask to join, as this would be ethically very challenging. If, on the other hand, the group is, for example, an open Facebook group whose content anyone can read without logging in, it may be possible to study it. Still, consider the ethical challenges and, for example, questions related to informing research subjects.
  • Ethical issues related to social media data are discussed, for example, in the EU guideline Ethics in Social Sciences and Humanities.

I am researching videos on a YouTube channel. Videos are protected by copyright. What does this mean for my data?

You can cite videos in the same way as a text or an image. Screenshots: In general, you can take screenshots because they can be considered an image quote. If you use an image in your thesis, the image must be essentially related to the text, and it must be part of the analysis.

YouTube's current terms of use prohibit, for example, downloading videos to one's own computer. What does this mean for my data?

You cannot save the data for yourself. Your data only exists on YouTube, and if the creator of the videos were to delete the videos for any reason, you would lose your data. If even a single video were removed, the data would no longer be intact. This in itself does not prevent research, but it is good to be aware of this risk. Reproducibility is part of the essence of scientific research: scientific research must be reproducible, and that is not possible if the data no longer exists as such.

I study comments on Instagram. Comments are on an open Instagram account and can be read without logging in by anyone. Some people appear under their own names, others under pseudonyms. Do I process personal data in my data?

If the account is open so that anyone can read the comments without logging in to Instagram, you might think that the commenters themselves have published their own personal information. However, the commenters have the right to know that they are the subject of research. If it is impossible to inform the authors personally, information on the processing of personal data must be provided by other means. Make the privacy notice publicly available and link it to the comment thread you're researching.

You may not be able to save the data for yourself. Keep in mind the risks: comments or photos may be removed from the service.

Many of the examples relate to the question of what is public. Consider each case individually and from an ethical perspective. Not all data found on the Internet is in the public domain.

  • For example, posts on X (formerly Twitter) are, by definition, public. You can take a direct quote from such a post. In this case, the author is credited in the same way as for a text quotation.
  • For Facebook and Instagram, the definition is particularly challenging due to frequently changing terms of service.
  • Check the terms of service. They are subject to change, so always check the latest version. See what the terms say about data storage, content publicity, content sharing, copyright, and more.

Other sensitive or confidential material

In addition to personal data, sensitive and confidential data may include, for example:

  • business secrets,
  • information on endangered animal or plant species,
  • information relating to national security,
  • criminal convictions, etc.

Please note that the thesis should not deal with highly sensitive information, such as the aforementioned "national security information". The thesis is a public document. Sometimes a topic of interest to a student may be too ethically challenging for a thesis. It is not necessarily about the student's skills, but about what it is possible to do within the framework of the thesis.