Documentation and metadata

At this point in the data management plan, you answer two questions:

1) Why do you document data management and record metadata?

2) How do you document your data and how is it processed? Use examples to provide the metadata of your data to the extent you can. If you have not yet received the data, describe it on a general level.

This section relates to all the FAIR principles, i.e. Findable, Accessible, Interoperable and Reusable.

An essential part of data management is documentation, i.e. keeping track of what you are doing. This can be done, for example, by writing a research diary or by recording the stages of data processing in an Excel spreadsheet. Documentation is thus a process that makes the data understandable and usable.

And what is metadata?

  • Imagine your data as a closed package, the contents of which you do not know. Metadata is like a sticker on top of a package that tells you the contents of the package.

In research, metadata refers to basic descriptive information about data, compiled in a human- and machine-readable format, that makes the data identifiable and findable if it is published online.

In the thesis process, metadata is usually not published, so it can be thought of as a table of contents for the data, which you can create to keep your files and folders organized. For example, you might have two collections of photos. What basic information do you need to list about them in order to differentiate them and be able to work smoothly?

Documentation and metadata help ensure that your work is smooth and organized.


Scientific knowledge production requires that the research process is documented in such detail that it would be possible to repeat the research design afterwards in the same form. In this way, the reliability of the results can be verified.

  • From the very beginning of your research process, make sure that you will be able to describe the process accurately enough in your thesis.
  • The documentation requirement also applies to data – it must be possible to describe the creation, structure and processing of data in a form that is understandable to others.
  • For documentation related to the data, you can use, for example, a formal research diary or an Excel spreadsheet that describes
    • the different parts that make up the data, and
    • key information about the components of the data (e.g. number of interviews, date of interview, specific themes, etc.)

In practice, you keep track of what you did, when, how, why, and with whom. Documentation helps you find, for example, the one interview among all those you conducted that dealt with issue X. Writing down explanations of the variables you use and taking notes on each stage of data editing and analysis are key parts of documentation.
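
As one possible illustration of such a record, the sketch below appends entries to a spreadsheet-style research diary stored as a CSV file. The column names and the function name append_log_entry are assumptions made for this example, not a required format.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column names; adapt them to your own project.
LOG_FIELDS = ["date", "what", "how", "why", "with_whom"]


def append_log_entry(log_path, what, how, why, with_whom="", when=None):
    """Append one row to a CSV research diary.

    The file is created with a header row if it does not exist yet.
    """
    log_path = Path(log_path)
    is_new = not log_path.exists()
    entry = {
        "date": (when or date.today()).isoformat(),
        "what": what,
        "how": how,
        "why": why,
        "with_whom": with_whom,
    }
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
    return entry
```

A call such as append_log_entry("research_diary.csv", what="Pseudonymised interview 3", how="Replaced names with codes", why="Remove personal data before analysis") would add one dated row. The resulting file opens in Excel or any other spreadsheet program.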

This makes your work more thoughtful, systematic and structured, i.e. you do better science. Your own progress is easier when you know what you are doing and how to find what you need in your data at any given time. You can also return to your data later without problems or, if permissions allow, make it available to other researchers in a form they can understand.

Different disciplines may have their own practices related to documentation. Chemistry students can look at an example of filling out a laboratory diary (in Finnish).

Metadata as part of the documentation process

Note that although you probably won't be able to define all the metadata relevant to your research accurately in the early stages, you should plan it as precisely as you can. This way, you can describe your data logically as the work progresses, and files and folders will not get mixed up.

  • Documentation includes recording and updating the metadata of the data, i.e. the basic description data.
  • In practice, you need descriptive and technical information to accompany your research data. This information forms the metadata of your data. In other words, metadata means information about information, in this case descriptive information about your research data.
  • If your data is suitable for publication or archiving after the thesis has been completed, you will need metadata for this.

Depending on the type of dataset, metadata includes, for example, the name given to the dataset, the author, and when, where and how the data was collected. How you name folders and files also matters. For example, the latest version of a file is easy to identify when versions are always named according to the same pattern.
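
The sketch below shows one way to apply a consistent naming pattern and to pick out the latest version of a file. The pattern date_description_vNN.ext and both function names are assumptions chosen for this example. Any pattern works as long as you use it consistently.

```python
import re
from datetime import date


def versioned_name(description, version, ext, when=None):
    """Build a name such as '2024-03-01_interview-notes_v02.txt'."""
    when = when or date.today()
    # Lower-case the description and replace anything else with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{when.isoformat()}_{slug}_v{version:02d}.{ext}"


def latest_version(names):
    """Return the name with the highest '_vNN' number, or None if none match."""
    pattern = re.compile(r"_v(\d+)\.[^.]+$")
    best_number, best_name = -1, None
    for name in names:
        match = pattern.search(name)
        if match and int(match.group(1)) > best_number:
            best_number, best_name = int(match.group(1)), name
    return best_name
```

Comparing the version numbers as integers rather than as text means that v10 is correctly treated as newer than v2.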

Metadata ensures that you or a later user can find everything needed and interpret the data unambiguously, regardless of the moment and context of use. If you had only the data you collected, with no explanatory information, and there was a break in writing your master's thesis, would you remember after a month what you were doing and what was in each file? And if you handed the data over to a research group without explaining the variables and abbreviations you used, would the group be able to understand your data? You can think of documentation and metadata as a kind of reading and user guide (a "readme" file; see, e.g., the guide to writing "readme" style metadata).
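
As a rough illustration of such a "readme" file, the sketch below assembles a short plain-text description of a dataset. The headings, the function name build_readme and its parameters are assumptions made for this example. Your discipline or archive may expect a different layout.

```python
def build_readme(title, authors, collected, description, abbreviations):
    """Return the text of a plain-text readme for a dataset folder."""
    lines = [
        f"Dataset: {title}",
        f"Author(s): {', '.join(authors)}",
        f"Collected: {collected}",
        "",
        "Description",
        description,
        "",
        "Variables and abbreviations",
    ]
    # List abbreviations alphabetically so they are easy to look up.
    for short, meaning in sorted(abbreviations.items()):
        lines.append(f"  {short}: {meaning}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved as readme.txt in the same folder as the data, so that the explanation travels with the files.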

What if you use ready-made data? For archived ready-made data, the archive has taken care of the descriptions, and you can find them in the archive's data catalogues. Please note, however, that you must describe any tabulations and other material that you produce yourself from the ready-made data. In your own dataset, you can store both the research data and the files containing its descriptive data in the same place.

Remember to refine your plan throughout the research process, i.e. also after the course!

Examples of metadata

  • There are many types of metadata, and different metadata may be required in different situations.
  • If some information has already been disclosed in the research plan, it does not need to be disclosed again in the data management plan.
  • Keep in mind that although you probably won't be able to define all the metadata relevant to your research accurately in the early stages, you should plan it as precisely as you can.

In the data management plan, describe or plan the following metadata, for example, where applicable:

1. General information

  • Name of the entire dataset – descriptive of the content
  • Names of researcher(s)
  • Possible other participants in the research process and their roles, such as transcriber
  • When the data was collected (or time interval)
  • Where the data was collected, e.g. "samples were taken at place x" if essential information
  • Keywords, e.g. social workers, parliamentary elections, online services
  • How do you document the collection and processing of data? For example, with a formal research diary or an Excel spreadsheet.
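
The general information listed above can also be kept in a machine-readable form. The sketch below stores it as a small record and exports it as JSON. The class name, field names and JSON layout are assumptions made for this example, not a metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """General information about a dataset (hypothetical field names)."""
    title: str
    researchers: list
    collected: str  # a date or a time interval, written as text
    location: str = ""
    other_participants: dict = field(default_factory=dict)  # name -> role
    keywords: list = field(default_factory=list)
    documentation_method: str = ""


def to_json(metadata):
    """Serialise the record as indented JSON, keeping non-ASCII characters."""
    return json.dumps(asdict(metadata), indent=2, ensure_ascii=False)
```

For example, DatasetMetadata(title="Interviews with social workers", researchers=["N. N."], collected="spring 2024", keywords=["social workers", "online services"]) describes a dataset whose details are invented here purely for illustration.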

2. Files and folders

For the following items, write down in your response how and where you store the data. Describe, for example, the logic you use to name folders and files (folder structure, identifying the latest file).

  • Name files and folders consistently, in a way that describes the content
    • A short description of the contents of a folder (for example, a .txt text document that is stored in the folder)
    • If necessary, a description of how files and folders are named (for example, a .txt text document that is stored in the folder)
  • Create a consistent folder structure and write down where the dataset is
    • Folder path, e.g. U: > Documents > Studying > Master's thesis > Data, etc.
    • If the data or part of it is located away from the computer, write down where the data is located
    • If the data is very extensive, you may need an index that makes it easier to find the information you need at any given time.
  • When folders and files were created (or time interval)
  • When different versions were created (or time interval)
    • When making significant changes or different versions, describe the changes, e.g. in the file name, separate text document, or research journal
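
For an extensive dataset, an index like the one mentioned above can be generated rather than typed by hand. The sketch below lists every file under a folder with its relative path, size and last-modified date. It records the modification date because a creation date is not reliably available on all operating systems. The function name build_index is an assumption made for this example.

```python
from datetime import datetime
from pathlib import Path


def build_index(root):
    """List every file under root with its path, size and last-modified date."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            rows.append({
                # Path relative to the dataset folder, with forward slashes.
                "path": path.relative_to(root).as_posix(),
                "bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime).date().isoformat(),
            })
    return rows
```

The rows can be written to a text or spreadsheet file and stored next to the data, and the index can be regenerated whenever the folder contents change.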

3. How the data is collected and how it is interpreted

  • How was the data collected? Tell us the methodology used.
    • You can attach or link forms, etc. (there is no need to include them in the answer)
  • How will the data be used in your research?
  • Record the software or hardware that has been used to collect the data or that is needed to view the data
    • For example, when conducting interviews, use a dedicated voice recorder rather than a phone, because phones often save content to the cloud automatically.
    • Record equipment calibrations.
  • Variables and their definitions.
  • Explanation of codes, symbols and abbreviations.
  • If there are missing/incomplete parts of the data, how do you mark them?
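
Variable definitions, code explanations and the marker for missing values can be collected in a small codebook. The sketch below is one hypothetical way to do this. The variable names, the code values and the missing-value marker -9 are all invented for the example.

```python
# Hypothetical codebook: variable name -> definition and allowed codes.
CODEBOOK = {
    "age": {"definition": "Age of respondent in years", "codes": None},
    "q1": {
        "definition": "Satisfaction with online services",
        "codes": {1: "dissatisfied", 2: "neutral", 3: "satisfied"},
    },
}

MISSING = -9  # assumed marker for a missing or incomplete answer


def describe(variable, value):
    """Translate a stored value into a readable label using the codebook."""
    if value == MISSING:
        return "missing"
    codes = CODEBOOK[variable]["codes"]
    return codes[value] if codes else str(value)


def undocumented(columns):
    """Return the column names that have no entry in the codebook."""
    return [c for c in columns if c not in CODEBOOK]
```

Running undocumented over the column names of a data file is a quick way to notice variables that still lack an explanation.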

4. Information related to sharing and access

  • If you use data produced by someone else, are there any licences attached to the data, such as CC licences?
  • Licences can be used to define who has the right to use the data and what can be done with it. CC (Creative Commons) licences enable open use and can be used to grant various permissions.
  • What are the restrictions on the use and distribution of the ready-made data you use and/or the data you produce yourself? For example, personal data usually restricts use and distribution.
  • Has the data already been published somewhere? How can it be accessed?

More comprehensive information on metadata can be found in the Data Management Handbook.