Collect, create, process

Collecting interviews

  • Do you collect public information? You can use any tool. Make sure you copy the data to the university storage services as soon as possible after collection. 
  • Do you collect so-called ordinary (non-sensitive) personal data? Use MS Teams or a device protected by University encryption. See the instructions for handling confidential information (Intranet Uno).
  • Do you collect special categories of personal data or otherwise sensitive information? Use JYU Zoom or devices protected with the University safeguards. 


Interviews with Zoom and Teams

Use Zoom or MS Teams to collect interview data, as the university provides support for their use and their settings are adjusted to meet the requirements of the university. Alternatively, you can use a university-secured device; ask your unit or Digital Services about these.

Always use JYU Zoom (jyufi.zoom.us) or, alternatively, a university device (recorder, telephone, or laptop) when an interview involves special categories of personal data or other sensitive or confidential information. Instructions on how to use Zoom to securely record, transfer, and store interviews can be found on the intranet Uno. When using university equipment, make sure that the recorder encrypts the recorded content, that the phone's memory card is encrypted separately in the phone settings, and that the storage of the laptop you are using is encrypted.

You can collect interview data containing so-called ordinary, non-sensitive personal information with MS Teams.

Interviews with smartphone

Features of your smartphone, such as data transfer to your service provider's cloud storage, can compromise the security of your confidential content. Please note the following when using your smartphone to record interviews that contain personal and/or confidential information:

  • Turn off synchronisation from the phone to the manufacturer's or Google's cloud services.
  • Transfer the recordings to a secure storage location (CollabRoom) and delete them from the phone.
  • To lock your phone, use a PIN code at least 6 characters long, as instructed.
  • Set the screen timeout to 30 seconds.
  • Report the loss of the device promptly to the Digital Services Service Desk: palvelupiste@jyu.fi, phone 041 260 3600.
  • If the interviews contain sensitive personal data or the data is confidential in accordance with university guidelines, use a 14-character password to lock the phone.

Transcription

  • The transcription tool embedded in MS Teams can be used when collecting data with Teams (not suitable for special categories of personal data).
  • Researchvideo.jyu.fi offers an automatic transcription tool for video and audio files uploaded to the service.
  • NB! The JYU O365 WordOnline transcription tool cannot be used, due to the provider's new data security policy.

Documentation

Essential questions: How do I document my data so that they can be found, accessed, and used by me and others tomorrow, in a week, and years from now? If a completely unknown researcher found my data, could they understand what they are about? What do I need to do to understand and be able to use my data?

Documentation refers to the creation of descriptive information that clarifies the context and methods of capturing and processing the data, as well as the structure of the data and the chosen file system (e.g. subfolders and naming). It can mean, for example, a description of variables, key vocabulary, and units of measurement, or an inventory of research interviews and related basic information. The documentation may also contain information on, for example, the version of a particular dataset. Technical metadata produced by equipment (e.g. calibrations) are also essential documentation.
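
A description of variables and units can be kept as a small machine-readable codebook. The following Python sketch is only an illustration: the variable names, the four column headings, and the helper functions are assumptions invented for this example, not a JYU template.

```python
import csv
import io

# Hypothetical variables, invented for illustration only.
CODEBOOK = [
    {"variable": "participant_id", "description": "Pseudonymous participant code",
     "unit": "", "allowed_values": "text"},
    {"variable": "reaction_time", "description": "Time from stimulus to response",
     "unit": "ms", "allowed_values": ">= 0"},
    {"variable": "condition", "description": "Experimental condition",
     "unit": "", "allowed_values": "control | treatment"},
]

FIELDS = ["variable", "description", "unit", "allowed_values"]


def codebook_to_csv(entries):
    """Serialise the codebook entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_columns, entries):
    """Return the data columns that have no entry in the codebook."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in data_columns if column not in documented]
```

Saving the CSV text next to the dataset, and running a check like `undocumented_columns` whenever a new column is added, is one way to keep the description in step with the data.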

A large part of research data are in digital form to start with. Research data typically consist of

  • questionnaires and interviews
  • different types of measurement and observation
  • other types of video, image, audio, and text materials.

Basic datasets for research are collected e.g. in the form of questionnaires, interviews, video recordings, as well as with various devices and sensors. Different measurement and data collection methods yield different metadata and file formats. This poses challenges especially when investigating the same phenomenon with different observation methods and datasets. The processing of data (e.g. regarding data descriptions, data protection and data security) can be streamlined and automated by software designed for the purpose.

At the data analysis stage, the actual results of an empirical study are derived from the raw data. When raw data is processed, aggregated and analysed, it results in various further datasets for elaboration and reporting. To make the study progress smoothly, it is important that you handle the generated datasets in a controlled fashion.

When you document your data during collection and processing:

  • You will later understand what the data are about, and you will be able to inventory and evaluate them.
  • The data become independently comprehensible and thus reusable.
  • The results are reproducible from the raw data.
  • Others are less likely to misuse or misinterpret your data.
  • You ensure that your data are as FAIR as possible and that you follow good scientific practice.

Best practices

  • Establish uniform data handling procedures among your group that everyone follows, and document them.
  • Keep the documentation up to date as your data collection and processing proceed. When planned well ahead and done while you work with your data, documentation takes little effort; left to a later stage, it becomes practically impossible.
  • Find out what kind of software is available to you for data collection, processing and analysis. The University's Digital Services provide software for data processing upon request.
  • If your discipline has established metadata standards for documentation, use them. This is a good way to ensure the future findability and reusability of the data. Standards help you describe your data with standard attributes, which makes it easier for other researchers in the same field to make sense of them. The description can be saved in e.g. a Readme file in .txt or XML format and stored alongside the data.
  • Store the descriptive information in separate files (e.g. Readme files, inventory spreadsheets) alongside the datasets in a subfolder you name /DOCUMENTATION. This way, the documentation files can also be found by someone who does not know the structure of the data in detail.
  • If you don't use the /DOCUMENTATION subfolder, plan what kind of documentation you produce and where it can be found. If possible, use the metadata standards of your discipline in the documentation.
  • Agree with your research team members well in advance on a uniform way to arrange files in folders and subfolders. A logical folder structure streamlines work and reduces the risk of data loss.
  • Use open file formats instead of proprietary formats. Open, standard file formats are the best guarantee of data availability after several years. Examples of recommended and acceptable formats can be found in the UK Data Service format comparison table.
  • Describe your documentation practices in your DMP.
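
As a minimal sketch of the folder and Readme practices listed above, the following Python snippet creates a project skeleton with a DOCUMENTATION subfolder. Only the DOCUMENTATION name comes from this guide; the other folder names, the Readme fields, and the function name are hypothetical and should be adapted to what your group has agreed on.

```python
from pathlib import Path

# Hypothetical layout: only DOCUMENTATION is named in the guide.
SUBFOLDERS = ["DATA/raw", "DATA/processed", "DOCUMENTATION"]

README_TEMPLATE = """\
Project: {project}
Contact: {contact}
Dataset version: {version}
Folder structure: {folders}
File naming convention: <describe here>
Variables, units, and key vocabulary: <describe here>
Collection and processing methods: <describe here>
"""


def create_project_skeleton(root, project, contact, version="1.0"):
    """Create the subfolders and a plain-text Readme inside DOCUMENTATION."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "DOCUMENTATION" / "README.txt"
    readme.write_text(
        README_TEMPLATE.format(
            project=project,
            contact=contact,
            version=version,
            folders=", ".join(SUBFOLDERS),
        ),
        encoding="utf-8",
    )
    return readme
```

The Readme is written as plain UTF-8 text, in line with the recommendation to prefer open file formats.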


Figure: an example of a directory accompanied by a DOCUMENTATION subfolder (source: The Data Management Expert Guide by CESSDA ERIC).

Guidelines by discipline 


Metadata 

Metadata refers to general descriptive information about research data (e.g. owner, authors, distributor, name, short description, location…). Up-to-date metadata is the key to finding and accessing your data. Basic project-level metadata is created and maintained in the Research data section of Converis, the JYU research information system. For more information, see the step-by-step guidelines in JYU Intranet Uno. When kept up to date during the course of the project, metadata clarifies for both you and secondary users what the work is about.

At JYU, metadata is maintained in the Research Data System section of Converis. A metadata entry should be created for each dataset. When describing your data, divide them into entities that you can describe unambiguously. Datasets described in more detail can be bundled in Converis under a larger “parent dataset”. See the detailed instructions for managing your metadata in Converis (in Intranet Uno).


Quality assurance

Ensuring the integrity and quality of the data as they are collected, migrated and transferred is an important part of data management. Careful documentation of the procedures of data collection is the primary measure to ensure the integrity and quality of the data. Depending on the type of material, equipment and methods, integrity and quality can be ensured, for example,

  • by calibrating measuring instruments to monitor the accuracy and scale of detection
  • by reviewing the transcribed interview material with an external expert
  • by using industry-standard methods, hardware, and software.
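
In addition to the measures above, one common way to check that files have not changed during migration or transfer is to compare checksums computed before and after the move. The sketch below uses SHA-256 from Python's standard library; the function names are illustrative assumptions, and this is not an official JYU procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """List files that are missing or whose checksum differs in `after`."""
    return sorted(
        name for name, digest in before.items() if after.get(name) != digest
    )
```

A manifest built at the source and another built at the destination can be compared with `changed_files`; an empty list suggests that every file listed in the source manifest arrived unchanged.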

Pseudonymisation and anonymisation

In order to secure the safest possible handling of personal data and to follow what you have promised to your study subjects in the privacy notice, pseudonymisation and/or anonymisation of personal identifiers in the data may become necessary. Anonymised data can be safely shared during the project and published at any point. However, anonymisation procedures have to be planned well in advance, and they require time and some effort.

What's the difference between the two?

Pseudonymous data are still personal data under the GDPR, but individuals can no longer be identified from them without combining them with other information. Replacing real names with fake names and identifiers with codes are typical pseudonymisation techniques. As long as the key to the coded identifiers is stored separately from the data in a secure storage location and only designated people have access to it, pseudonymisation may suffice as a data security method during the project. However, when the project ends, identifiers should be erased or, if storing them is necessary, e.g. to enable contact with the study subjects, the research group members should set a future date for re-evaluating the need to retain the identifiers. This should be documented in the data management plan.
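
The idea of replacing identifiers with codes and keeping the code key apart from the data can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the field name `name`, the code format, and the function name are invented for the example, and the sketch replaces one direct identifier only.

```python
import secrets


def pseudonymise(records, id_field="name", prefix="P"):
    """Replace the identifier in each record with a random code.

    Returns (pseudonymised_records, code_key). The code key maps each code
    back to the original identifier and must be stored separately from
    the data, in a secure location with restricted access.
    """
    code_for = {}        # original identifier -> code
    used_codes = set()
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = prefix + secrets.token_hex(4)
            while code in used_codes:  # very unlikely, but keep codes unique
                code = prefix + secrets.token_hex(4)
            used_codes.add(code)
            code_for[original] = code
        # Copy the record so the input data are left untouched.
        pseudonymised.append({**record, id_field: code_for[original]})
    code_key = {code: original for original, code in code_for.items()}
    return pseudonymised, code_key
```

The same person always receives the same code, so records can still be linked within the dataset. Other fields may still identify a person indirectly, and as long as the code key exists the data remain pseudonymous, not anonymous.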

Anonymisation means that all identifying and/or identifiable information is irreversibly removed from the data, and no code to retrieve the identifiers remains. An individual can no longer be identified from the data, which means the data are no longer personal data. In planning the life cycle of personal data in your research, familiarise yourself with what it takes to make your data anonymous, and whether anonymisation will impair their scientific value. Opting out of anonymisation and data publishing is a viable alternative for research that handles complex sets of personal data. For options, see the guidelines for publishing the metadata.

When are my data pseudonymous and when anonymous?

  • Can any individual be recognised from the audio, images, or video that you have recorded? If yes, they contain personal information.
  • Is the pseudonymisation code key still stored? If yes, the data are still pseudonymous.
  • Even if the code key has been destroyed, is there still information in the data that, when combined, could lead to the identification of an individual (e.g., municipality + school grade + gender + etc.)? If so, the data are likely to still be identifiable, which means that some of the variables should be aggregated or classified to blur the identifiability.
  • Have you collected open-ended answers? If yes, could the respondent be identified from their text? If yes, the identifying bits have to be removed in order to make the data anonymous. 
  • If you collect qualitative writings or notes, could the respondent be identified from their text? If yes, the identifying bits have to be removed in order to make the data anonymous. 
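
The question about combinations of variables can be explored with a simple count of how many records share each combination of indirect identifiers. The Python sketch below is an illustration only: the threshold `k`, the grade bands, and the function names are assumptions, and passing such a check does not by itself show that data are anonymous.

```python
from collections import Counter


def rare_combinations(records, indirect_identifiers, k=3):
    """Return combinations of indirect identifiers shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


def classify_grade(grade):
    """Coarsen an exact school grade into a broader class (illustrative bands)."""
    return "grades 1-6" if grade <= 6 else "grades 7-9"
```

For example, calling `rare_combinations(rows, ["municipality", "grade", "gender"])` before and after replacing each exact grade with `classify_grade(grade)` shows whether classifying the variable reduces the number of small, potentially identifying groups.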

Anonymisation guidelines and anonymisation plan model 

There are various anonymisation methods for qualitative and quantitative data. For detailed instructions for both, see the Finnish Social Science Data Archive Guidelines. The FSD also offers an anonymisation plan model. If you plan to deposit your data in the Finnish Social Science Data Archive, the FSD experts will help you with anonymisation at the data deposition phase. See the FSD guidance for researchers.