Quality and coherence of data

Our web page has moved to a new address. This web page is no longer updated and it may contain outdated information.

In the data management plan, you will answer the following questions at this point:

1) Tell us briefly about your research data: what kind of data do you collect or produce, or what kind of existing data do you use?

2) How do you ensure the consistency and quality of your data? In other words, how do you ensure that the content of the data does not change during processing?

This section relates to the FAIR principles Findable, Accessible and Reusable.

Start your data management plan by

  • describing your data
  • reflecting on good practices to ensure the quality and coherence of the data.

Describe your data

What kind of data do you produce or use?

  • Do you do interviews, observations or surveys?
  • Do you produce measurement results or code, do you analyze works of art?
  • Do you use previously collected data? If so, tell us where it comes from and how you get it.

Raw data is often used to produce new data, e.g. transcriptions, spreadsheets, charts, visualizations, classifications or databases.

If necessary, also consider how much space is required to store the data (in gigabytes or as physical space; an estimate is sufficient). Usually, the space already available on the U drive is sufficient.
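For digital data that already exists, the estimate can be read directly from the folder. The following is a minimal sketch in Python, assuming the data sits in a single folder on your computer; the function name `folder_size_gb` is made up for this illustration.

```python
from pathlib import Path


def folder_size_gb(folder):
    """Return the total size of all files under `folder` in gigabytes (10**9 bytes)."""
    # rglob("*") walks the folder and all of its subfolders.
    total_bytes = sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())
    return total_bytes / 10**9
```

Calling `folder_size_gb("my_project_data")` (a hypothetical folder name) gives a figure you can round up and enter in the plan. For data that has not been collected yet, a rough estimate based on a few sample files is enough.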

Consider whether you need special software or tools for collecting, analyzing or handling the data. Reserve enough time to familiarize yourself with the necessary software or tools.

Ensuring the quality and coherence of data

How do you ensure the coherence of your data? Consider the following:

  • Are there risks that could jeopardize the reliability and quality of the content of the data?
  • How do you ensure that the data does not inadvertently change and that the data remains error-free throughout its life cycle?

Most importantly, back up your data regularly so you can revert to a previous version if something goes wrong.
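If you do not have a backup tool at hand, even a small script can produce dated copies. The sketch below is one possible way to do it in Python, not a recommended or official procedure; the function name `backup_with_timestamp` and the folder layout are assumptions made for this example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_with_timestamp(source, backup_dir):
    """Copy the file `source` into `backup_dir` under a name that includes the date and time."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also keeps the file's timestamps where possible
    return target
```

Each call leaves the original file untouched and adds a new dated copy, so earlier versions stay available. Note that two backups of the same file made within the same second would get the same name, and the later copy would overwrite the earlier one.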

To ensure the quality and coherence of the data, it is essential to consider what may go wrong when processing the data and how these risks could be avoided. For example, when an interview recording is transcribed, the transcriber may accidentally skip a passage, so that part of the interview is missing from the transcript.

Tips for ensuring quality and consistency

  • Make a copy of the raw data or of the initial state, and work with the copy if possible.
  • Check that the original data content is preserved when data is exported from one system, format or location to another.
    • E.g. from a survey program to the U drive, or from an interview recording to a transcript.
  • Keep backups of different versions.
    • This lets you revert to a previous version if something has gone wrong.
  • Have the transcriptions of recorded and/or filmed data checked, for example by someone else working on the same project.
    • Note: if the data contains personal data (such as a speaking voice), it cannot be checked by any third party. In master's theses written as pair work, reviewing the transcriptions together is a good practice.
  • Ensure that the interview framework and questions are as similar as possible for all interviewees.
  • Check the calibrations of the measuring instruments.
  • Use checksums if the software offers them.
  • Ensure that the digitised data corresponds sufficiently accurately to the original physical or analogue data.
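If your software does not offer checksums, they can also be computed separately. The sketch below shows one way to check in Python that a copied or transferred file is identical to the original. It uses the SHA-256 algorithm from Python's standard `hashlib` module; the function names `sha256_of` and `files_match` are made up for this illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of the file at `path` as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read the file in chunks so that large recordings do not fill the memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original, copy):
    """Return True if both files have the same checksum, i.e. identical content."""
    return sha256_of(original) == sha256_of(copy)
```

A checksum only shows that two files are identical byte for byte. When the data is converted from one format to another (for example from a survey program to a spreadsheet), the checksums will differ even if nothing was lost, so the content has to be compared in another way, such as by counting the rows or responses.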

Source

Hint! See examples of DMP Tuuli's public data management plans. Please note that public plans do not follow the same framework and that there may be omissions or inaccuracies in their content.