Quantitative longitudinal study, case

This case example is based on the documentation of a longitudinal research project carried out at the Department of Educational Sciences, University of Jyväskylä. Courtesy of Senior Lecturer Kenneth Eklund.

1. Directory tree

Create a folder directory in the project's Nextcloud folder or in the project folder on the S: network drive, structured according to the research setting, the scales used, and the measurement tasks. Create a separate subfolder for detailed contextual notes and for administrative documents such as data privacy papers, consent forms, and contracts.
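As a rough illustration of such a directory frame, the following Python sketch creates one folder per task plus an administration folder. The task names, subfolder names, and overall layout are hypothetical placeholders, not the structure used in the original project.

```python
from pathlib import Path

# Hypothetical task names and subfolders; replace them with the project's own.
TASKS = ["reading_fluency", "vocabulary"]
TASK_SUBFOLDERS = ["data", "description", "forms", "syntax"]
ADMIN_SUBFOLDERS = ["consent_forms", "contracts", "data_privacy", "contextual_notes"]


def create_skeleton(root: Path) -> list[Path]:
    """Create one folder per task plus an administration folder; return all paths."""
    folders = []
    for task in TASKS:
        for sub in TASK_SUBFOLDERS:
            folders.append(root / "tasks" / task / sub)
    for sub in ADMIN_SUBFOLDERS:
        folders.append(root / "administration" / sub)
    for folder in folders:
        # exist_ok=True makes the script safe to re-run when new tasks are added.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling create_skeleton(Path("project_X")) would build the tree under a folder named project_X; running it again after adding a task only creates the missing folders.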

  • Name and organize folders, files, and variables using a consistent logic.
  • Record the naming conventions and the logic for organising the files in a separate Readme guide file (see 3.).
  • At this point, decide in which entities your data will be stored. In this example, each task or measurement is stored in its own file. This makes it easy to 1) link the task descriptions and task forms (see below) to the data and 2) compile data files for each researcher according to their needs.
  • Write a free-text method description (.docx) in English for each scale. Write the description with the same precision as in your future research article. This will allow you to extract the description for the article with only light modifications. At the same time, you avoid the risk of different researchers writing their own, differing descriptions of the same task.
  • Include in the description the reference of the used test, measurement details, and procedures. The purpose of the description is to make the measurement reproducible. The description should also include precise information on how the variable used in the analyses has been formed, for example, with SPSS syntax. In this way, the procedures of the SPSS analysis are also documented outside the SPSS software, and the research becomes more transparent. The description is stored in a clearly named documentation file in the data directory.
  • In addition to the method description, record a so-called task form, if one is used in the study. The task form lets you quickly check the instructions given to the study subject and the content of the individual sections.
  • SPSS tip: When composing SPSS syntax, use the software's commenting tool to simultaneously create embedded descriptions of how the average variables are calculated. Importantly, save the calculation syntax for the mean variable in the variable label. Otherwise, valuable information about the formation of variables remains undocumented! This results in cumbersome duplicated work, and incomplete labeling reduces the re-use value of the data.
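The idea of storing the calculation rule together with the mean variable can be sketched outside SPSS as well. The Python example below is only an illustration: the item names, the responses, and the rule of requiring at least three valid items are assumptions made for the sketch, not details of the original study.

```python
from statistics import mean

# Hypothetical item names and responses for one participant; None marks a missing answer.
ITEMS = ["rf_01", "rf_02", "rf_03", "rf_04"]
responses = {"rf_01": 3, "rf_02": 4, "rf_03": None, "rf_04": 5}
MIN_VALID = 3  # assumed rule: at least 3 answered items are required


def composite_mean(row, items, min_valid):
    """Return the mean of the answered items, or None if too few were answered."""
    valid = [row[item] for item in items if row[item] is not None]
    return mean(valid) if len(valid) >= min_valid else None


# Keep the formula next to the variable, in the same spirit as an SPSS variable label.
variable_labels = {
    "rf_mean": f"Mean of {', '.join(ITEMS)}; requires at least {MIN_VALID} valid items"
}
score = composite_mean(responses, ITEMS, MIN_VALID)
```

Because the label is generated from the same constants as the calculation, the documentation cannot drift away from the rule that was actually applied.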

2. User interface

"I'm using the data file X in this longitudinal dataset. Where can I quickly find a description of that data so I can use it in my article?"

A separate tabular user interface to the data makes researchers' work streamlined and efficient, and it makes the data far more accessible. When researchers approach the data from the perspective of their own research topic, they quickly find the scales they need and open them with a single click. A researcher who relies on the directory alone has to understand the structure of the entire folder and file tree, whereas in the Excel interface one glance suffices.

Checklist for creating a directory and interface

  • Once the research plan has been approved and the main objectives and scales are clear, design a preliminary project-specific directory framework and the tabs of the tabular interface, including its first two columns (scales and dates/forms), for the data and documents to be collected. At this initial stage, mere titles of the planned tasks will suffice. Links to scale descriptions and forms are added as they are finalised.
  • The directory framework helps to ensure that all the necessary information is collected and that all the documentation generated in the project is gathered in a coherent order in one place.
  • Record the details of the scales accurately. Link the finished documents to the Excel interface as soon as the details are known. This avoids the risk of forgetting important details and ensures the transparency and reproducibility of the study.
  • The next three columns of the Excel interface (data, variables, and distributions and reliability) relate directly to the collected data. They are filled in as the data start to accumulate. However, do not wait until the entire data set is complete: the structure of the data files and the naming logic of the variables should be agreed on between the researchers after the first measurements. This ensures that the variables are named in a uniform way. If each researcher uses their own logic, cumbersome corrections have to be made afterwards. Function- and scale-specific abbreviations for variable names (see Figure 2), for example, can be agreed on before data collection begins.
  • The pseudonymised base file used for creating other SPSS data files should be checked carefully before it is made available to the study group members. When the data are kept in one place and researchers do not download them to their own devices, data security improves, work becomes more efficient, and researchers avoid the errors that easily occur when entering data.
  • Importantly, name at an early stage preferably one responsible person (a data manager) and a back-up person who will manage the compilation of the data and documentation as agreed by the research team. Only these people should have editing rights to all folders and files. Read-only rights often suffice for many members of the research team. In addition, make sure to agree on the principles for making data available to external partners. It is recommended that external partners are not given rights to the data, but that the data manager compiles the data files they need for them. This maximizes the data protection of the study subjects and minimizes the risk of data clutter, as no one can mistakenly save and share an incorrect version of a data file.
  • When individual researchers or research assistants save files during data collection, it is best practice for the data manager to move them to their final location in the data directory at the end of the data collection. At the same time, the data manager can check each file, for example, for duplicates and invalid values, and then link it to the Excel interface for everyone to use.
  • In addition, agree within your research group on clear rules for, e.g., data licensing, publication, and transfer, and on the obligation of researchers to hand over the syntax of the sum variables they use to the data manager for later use by other researchers.
  • The UK Data Service maintains a list of recommended FAIR-friendly file formats.
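The data manager's check for duplicates and invalid values mentioned in the checklist could look roughly like the following Python sketch. The records, the variable name rf_01, and its valid range of 1 to 5 are invented for illustration; a real check would read the collected file and use the ranges documented in the scale descriptions.

```python
from collections import Counter

# Hypothetical records; in practice these would be read from the collected data file.
records = [
    {"id": "P001", "rf_01": 3},
    {"id": "P002", "rf_01": 7},  # outside the assumed 1-5 range
    {"id": "P001", "rf_01": 4},  # duplicate participant ID
]
VALID_RANGES = {"rf_01": (1, 5)}  # assumed valid range per variable


def find_duplicates(rows, key="id"):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def find_invalid(rows, ranges, key="id"):
    """Return (id, variable, value) for every value outside its valid range."""
    problems = []
    for row in rows:
        for var, (low, high) in ranges.items():
            value = row.get(var)
            if value is not None and not low <= value <= high:
                problems.append((row[key], var, value))
    return problems
```

Running both functions before a file is linked to the interface gives the data manager a short list of cases to resolve with the person who entered the data.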

3. A Readme guide

Create a separate Readme.docx document at the root of the network drive that describes and explains the documentation described above, and update it as the study progresses. The Readme serves as a start-here guide for making sense of the organisation principles and contents of the data directory. When a new researcher joins the group, for example, they get a quick overview of the data from the Readme. Title the description clearly, e.g., "Principles used in structuring and organizing data in the X project". The document contains a description of the structure and elements of the data tree and the user interface:

1. Basic description of the material
2. Terms of use and permissions
3. Location of the data and description of the organization of the directory
4. Description of naming conventions for files and variables
5. Excel user interface description and explanations.
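Once the naming conventions are written down in the Readme, they can also be checked automatically. The Python sketch below assumes a purely illustrative convention (scale abbreviation, two-digit assessment number, item or score name, e.g. rf_01_mean); the pattern is hypothetical and would need to be replaced with the rule the research team actually agrees on.

```python
import re

# Assumed convention: <scale abbreviation>_<two-digit assessment number>_<item or score name>,
# for example "rf_01_mean". The pattern is illustrative, not the project's actual rule.
VARIABLE_PATTERN = re.compile(r"[a-z]{2,4}_\d{2}_[a-z0-9]+")


def nonconforming(names):
    """Return the variable names that do not follow the agreed pattern."""
    return [name for name in names if not VARIABLE_PATTERN.fullmatch(name)]
```

For example, nonconforming(["rf_01_mean", "ReadFlu1"]) flags only the second name, which the data manager can then rename before the file is shared.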

Example: Explanations for the columns of the Excel user interface in tabular form

| Column no. | Content | Link |
|---|---|---|
| 1 | Name of the task | Detailed description of the task that can be used in the method sections of scientific articles |
| 2 | Assessment time | Forms of the task, including the task instructions and the specific items used in the task |
| 3 | Name of the SPSS data file | Data files in SPSS format (.sav) |
| 4 | Names of the composite scores that are recommended for use when analysing the data | SPSS syntax files (.sps) showing how the composite scores were calculated from the raw variables |
| 5 | Descriptions of the distributions and reliabilities of the composite scores | SPSS output files (.spv) including the calculation of the composite scores, their reliabilities, and distributions |
| 6 | Presentations and publications | If the task has been used in a presentation or publication, it is linked here |
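The six columns above can also be kept as a simple machine-readable index alongside the Excel interface. The following Python sketch is one possible way to do this; the column keys, the example entry, and the file path are hypothetical, and the original project used Excel with hyperlinks rather than a CSV file.

```python
import csv
import io

COLUMNS = ["task", "assessment_time", "data_file", "composite_scores",
           "distributions_reliability", "publications"]

# Hypothetical entry; empty cells are filled in as documents are finalised.
rows = [{
    "task": "Reading fluency",
    "assessment_time": "Grade 1, spring",
    "data_file": "tasks/reading_fluency/data/rf_g1.sav",
    "composite_scores": "rf_mean",
    "distributions_reliability": "",
    "publications": "",
}]


def write_index(entries, stream):
    """Write the interface index as CSV with one row per task."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(entries)


def missing_links(entries):
    """List (task, column) pairs that still lack an entry."""
    return [(e["task"], col) for e in entries for col in COLUMNS if not e[col]]
```

The missing_links helper gives a quick overview of which cells of the interface are still waiting for a document, which is useful while the data are still accumulating.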