The PMData Dataset

A lifelogging dataset of 12 persons during 3 months using Fitbit, Google Forms and PMSYS.
Also available as an OSF repository with file browsing and as an OSF preprint.

 2020
 587MB
 18 categories/ person

The PMData dataset aims to combine traditional lifelogging with sports activity logging. Such a dataset enables the development of several interesting analysis applications, e.g., where additional sports data can be used to predict and analyze everyday developments like a person's weight and sleep patterns, and where traditional lifelog data can be used in a sports context to predict an athlete's performance. For the data collection, we used the Fitbit Versa 2 smartwatch, the PMSYS sports logging app, and Google Forms. PMData contains logging data from 12 persons over a period of 3 months.

Citation

@misc{thambawita2020pmdata, title={PMData: A sports logging dataset}, url={osf.io/k2apb}, DOI={10.31219/osf.io/k2apb}, publisher={OSF Preprints}, author={Thambawita, Vajira and Hicks, Steven and Borgli, Hanna and Pettersen, Svein A and Johansen, Dag and Johansen, Håvard and Kupka, Tomas and Stensland, Håkon K and Jha, Debesh and Grønli, Tor-Morten and et al.}, year={2020}, month={Feb}}

Fitbit

The data from the Fitbit Versa 2 smartwatch has been extracted into CSV and JSON files, and the fitbit directory contains the following files:
calories-YYYY-MM-DD.json shows how many calories the person has burned in the last minute.
distance-YYYY-MM-DD.json gives the distance moved per minute. Distance is given in centimeters.
exercise-xxx.json describes each activity in more detail. It contains the date with start and stop time, time in different activity levels, type of activity, and various performance metrics depending on the type of exercise; for running, for example, it contains distance, time, steps, calories, speed, and pace. The xxx in the filename contains the number of activities logged. Each file contains a maximum of 100 activities.
heart_rate-YYYY-MM-DD.json shows the number of heartbeats per minute (bpm) at a given time.
sedentary_minutes-YYYY-MM-DD.json sums up the number of sedentary minutes per day.
lightly_active_minutes-YYYY-MM-DD.json sums up the number of lightly active minutes per day.
moderately_active_minutes-YYYY-MM-DD.json sums up the number of moderately active minutes per day.
very_active_minutes-YYYY-MM-DD.json sums up the number of very active minutes per day.
resting_heart_rate-YYYY-MM-DD.json gives the resting heart rate per day.
sleep_score.csv helps understand the sleep each night so you can see trends in the sleep patterns. It contains an overall 0-100 score made up of composition, revitalization and duration scores, the number of deep sleep minutes, the resting heart rate and a restlessness score.
sleep-YYYY-MM-DD.json gives a per-sleep breakdown into periods of light, deep, and REM sleep and time awake.
steps-YYYY-MM-DD.json displays the number of steps per minute.
time_in_heart_rate_zones-YYYY-MM-DD.json gives the number of minutes in different heart rate zones. Using the common formula of 220 minus your age, Fitbit calculates your maximum heart rate and then creates three target heart rate zones: fat burn (50 to 69 percent of your max heart rate), cardio (70 to 84 percent of your max heart rate), and peak (85 to 100 percent of your max heart rate).
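
The zone rule described for time_in_heart_rate_zones can be sketched in a few lines of Python. This is only an illustration of the 220-minus-age formula and the three percentage bands quoted above, not Fitbit's actual implementation; the function names and the "out_of_zone" label for readings below 50 percent are assumptions made for this example.

```python
def max_heart_rate(age: int) -> int:
    """Estimated maximum heart rate using the common 220-minus-age formula."""
    return 220 - age


def heart_rate_zone(bpm: int, age: int) -> str:
    """Classify a heart rate reading into one of the three target zones.

    Thresholds are the lower bounds quoted in the text (50, 70 and 85
    percent of the maximum heart rate). Integer arithmetic avoids
    floating-point rounding at the zone boundaries.
    "out_of_zone" is a hypothetical label for readings below 50 percent.
    """
    limit = max_heart_rate(age)
    if bpm * 100 >= 85 * limit:
        return "peak"
    if bpm * 100 >= 70 * limit:
        return "cardio"
    if bpm * 100 >= 50 * limit:
        return "fat_burn"
    return "out_of_zone"
```

For a 30-year-old, the estimated maximum is 190 bpm, so a reading of 140 bpm (about 74 percent) falls in the cardio zone.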

The YYYY-MM-DD in the file names means that there is a time component in the name as well, including the year (YYYY), month (MM) and potentially day (DD). As can be observed, a wide range of parameters is included. For example, in total, there are 1,484 activity sessions (manual and 15-min-auto reports), 11,425,966 heart rate measurements and 1,028 days of sleep scores. How accurate data from a smartwatch can be is, of course, open to discussion. For example, our observations indicate that the Versa step counter is influenced by activities other than walking or running and that the estimated distances are slightly inaccurate. For the heart rates, the watch seemed surprisingly accurate in small comparisons we performed using several devices at the same time. Overall, the Fitbit Versa 2 is not the best watch on the market, and the absolute values might be off. However, the collected data should give reasonable indications of activities, and the relative differences between logs at least show whether there have been positive or negative changes.
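
The file-name convention described above can be handled with a small parser. The following Python sketch assumes names of the form metric-YYYY-MM-DD.json, with the day part optional as the text suggests; the class and function names are hypothetical, and the concrete file names in the usage note are made-up examples rather than files confirmed to exist in the dataset.

```python
import re
from typing import NamedTuple, Optional


class FitbitFileName(NamedTuple):
    metric: str          # e.g. "heart_rate" or "steps"
    year: int
    month: int
    day: Optional[int]   # None when the name only carries year and month


# Metric name, then YYYY-MM, then an optional -DD, then the .json suffix.
_DATED_JSON = re.compile(
    r"^(?P<metric>[a-z_]+)-(?P<year>\d{4})-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2}))?\.json$"
)


def parse_fitbit_file_name(name: str) -> Optional[FitbitFileName]:
    """Split a dated file name into its metric and date parts.

    Returns None for names that do not follow the dated pattern,
    such as exercise-xxx.json or sleep_score.csv.
    """
    match = _DATED_JSON.match(name)
    if match is None:
        return None
    day = match.group("day")
    return FitbitFileName(
        metric=match.group("metric"),
        year=int(match.group("year")),
        month=int(match.group("month")),
        day=int(day) if day is not None else None,
    )
```

With this sketch, a hypothetical name such as heart_rate-2019-11-01.json would be split into the metric heart_rate and the date parts 2019, 11 and 1, while names without a date component are skipped by returning None.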

PMSYS

In terms of subjective reporting, there are three CSV files with corresponding info files that explain the various fields:
srpe.csv contains a training session’s end time, type of activity, the perceived exertion, and the duration in minutes. This is, for example, used to calculate the session’s training load or sRPE (RPE×duration).
wellness.csv includes parameters like time and date, fatigue, mood, readiness, sleep duration (number of hours), sleep quality, soreness (and soreness area), and stress. Fatigue, sleep quality, soreness, stress, and mood all have a 1-5 scale. Score 3 is normal, 1-2 are scores below normal, and 4-5 are scores above normal. Sleep duration is simply the length of the sleep in hours, and readiness (scale 0-10) is an overall subjective measure of how ready you are to exercise, i.e., 0 means not ready at all, and 10 indicates that you cannot feel any better and are ready for anything!
injury.csv shows injuries with a time and date, the corresponding injury locations, and a severity of minor or major.
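
The session training load mentioned for srpe.csv (sRPE = RPE×duration) can be sketched as follows. The column names perceived_exertion and duration_min are assumptions made for this example and should be checked against the info file that accompanies srpe.csv; the two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import Iterable, List


def session_load(perceived_exertion: float, duration_min: float) -> float:
    """Session training load (sRPE): perceived exertion times duration in minutes."""
    return perceived_exertion * duration_min


def loads_from_csv(
    handle: Iterable[str],
    rpe_column: str = "perceived_exertion",   # assumed column name
    duration_column: str = "duration_min",    # assumed column name
) -> List[float]:
    """Compute one sRPE value per row of an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    return [
        session_load(float(row[rpe_column]), float(row[duration_column]))
        for row in reader
    ]


# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO("perceived_exertion,duration_min\n5,60\n7,45\n")
sample_loads = loads_from_csv(sample)
```

For real data, the same function can be given an open file handle instead of the in-memory sample, with the column names adjusted if the info file lists different ones.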

The accuracy of subjective reports is discussed in many fora, as one is completely dependent on the truthfulness of the reporter. However, sport is not only a physical activity, and an athlete’s psychological "state-of-mind" may greatly influence performance. Thus, if reported correctly, the subjective information may be of huge value, and there may be important information to be found and predicted. In total, 167 training sessions, 1,090 wellness reports, and 488 injury reports were submitted.

Google Forms

The googledocs directory contains two files: reporting.csv and info_reporting.txt, which describes the columns in reporting.csv. The reporting.csv file contains one line per report, including a timestamp of the report submission time, an unused index, the date reported for, the meals eaten (breakfast, lunch, dinner and evening meal), the participant's weight that day, the number of glasses drunk, and whether one has consumed alcohol.

In total, there are 975 reports, varying from 44 to 112 per participant. As with the PMSYS data, these reports are subjective, and some are missing. Nevertheless, the submitted data gives good indications of consumed food and drinks, which in turn can give important insight into calorie intake. Together with the activity data, this can give indications of weight loss or gain.

Food Images

Participants 1, 3, and 5 took pictures of everything they ate, except water, for 1 month (February). There are 278 images included in the food-images.zip file, and information about day and time is given in the image header. The participants used their mobile cameras to collect the images (iPhone 6s, iPhone X and iPhone XS). The standard export function of the macOS Photos software with full quality was used to export the images.

Terms of use

PMData is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that all documents and papers that use or refer to the PMData dataset, or report experimental results based on the dataset, need to include a reference to the related article: PREPRINT: https://osf.io/k2apb/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third-party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Ethics approval

Before collection, each participant signed a form allowing us to collect and publish the data related to this project.

Contact

Email michael (_at_) simula (_dot_) no if you have any questions about the dataset and our research activities. We always welcome collaboration and joint research!