A longitudinal dataset of multi-platform video game digital trace data and psychological assessments

Authors and Affiliations

Nick Ballou, University of Oxford
Tamás Andrei Földes, University of Oxford
Matti Vuorre, Tilburg University and University of Oxford
Thomas Hakman, University of Oxford
Kristoffer Magnusson, Karolinska Institute and University of Oxford
Andrew K Przybylski, University of Oxford

Abstract

A major limitation to understanding digital technology use, and its potential psychological consequences, is the lack of sufficiently detailed, multidimensional, and accurate data. We present a dataset of 2.0K individuals’ video game play telemetry data from Nintendo Switch, Steam, and Xbox, paired with psychological assessments across multiple dimensions of mental health, motivations, well-being, and cognitive ability. Data were collected over a 12-week period that included up to 30 daily assessments, six biweekly surveys, three cognitive tests, and play telemetry for X months. The data include X hours of video game play across X titles, X responses to X survey instruments, and X attention ability measures, facilitating the examination of longitudinal associations between play behaviors and psychological functioning. Data and codebook are available under a CC0 license at https://example.com.

Keywords

dataset, video games, well-being, mental health, trace data, attention

Introduction

Digital trace data (behavioral logs automatically collected by digital devices and online platforms) are necessary for better understanding technology use and its psychological and health effects (Burgess et al., 2024; Freelon, 2014; Griffioen et al., 2020). Self-reports of technology use do not accurately reflect objective technology use (Parry et al., 2021; Sewall & Parry, 2021), and their limited accuracy and temporal resolution make them unsuitable for examining many phenomena of interest, such as seasonal patterns over long temporal horizons; high-frequency behavioral analysis at the level of hours, minutes, or even seconds; and historical content analysis. Moreover, when relations between technology use and psychological survey instruments are of interest, digital trace data remove the possibility of common method bias.

To address these issues, digital trace data are increasingly used in studies of smartphone and social media use (e.g., Sewall et al., 2022), but less so in studies of video game play, one of the world’s foremost leisure activities (Entertainment Software Association, 2024; Ofcom, 2023). Trace data in video games primarily fall under the field of game analytics or games data science, but (1) the majority of these studies use proprietary data that are not publicly shared, and (2) the field tends to focus on player behavior metrics with clear industry value rather than on academic inquiry for the common good.

Where trace data have been used in video games research, they have largely been limited to a few games (Johannes et al., 2021; Larrieu et al., 2023; Perry et al., 2018; Vuorre et al., 2022) or to a single platform (Ballou, Sewall, et al., 2024; Ballou et al., 2025). However, players commonly play many games across multiple platforms. Moreover, the scope of psychological attributes in these studies tends to be limited, for example to a small set of well-being outcomes.

Current dataset

Here, we present a longitudinal dataset of digital trace data across multiple gaming platforms, comprising 2.0K participants, 18.8K daily surveys, 7.0K biweekly surveys, and a total of 1.3M hours of gameplay across 1.4M sessions. Digital trace data were sourced from five platforms (Xbox, Nintendo, Steam, iOS, and Android) through separate pipelines detailed below. The data collection methods were preregistered as part of a Stage 1 registered report (https://osf.io/pb5nu).

The dataset is openly available under a CC0 license at https://example.com for unrestricted reuse. Table 1 shows a high-level overview of the dataset.

Table 1: Overview of the dataset and sources.
Data type         Participants  Events  Hours   Avg events / participant  Avg active days
Intake survey     34,168        34K     NA      1.0                       1.0
Daily surveys     1,297         19K     NA      14.5                      14.5
Biweekly surveys  1,973         7K      NA      3.6                       3.6
Steam             1,514         203K    168.9K  134.4                     33.6
Nintendo          568           168K    93.5K   296.2                     101.8
Xbox              176           387K    480.3K  2.2K                      603.1
iOS               95            4K      3.7K    38.2                      33.8
Android           77            3K      3.2K    33.5                      29.2
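
To orient reusers, the sketch below shows how Table 1-style summaries might be reproduced from the released session records; all file and column names are hypothetical placeholders, and the codebook documents the actual layout.

    import pandas as pd

    # Hypothetical file and column names; consult the codebook for the
    # actual layout of the released data.
    sessions = pd.read_csv("telemetry_sessions.csv",
                           parse_dates=["start_time", "end_time"])

    # Session duration in hours (Xbox/Nintendo session records).
    sessions["hours"] = (
        sessions["end_time"] - sessions["start_time"]
    ).dt.total_seconds() / 3600

    # Per-platform summary in the spirit of Table 1.
    summary = sessions.groupby("platform").agg(
        participants=("participant_id", "nunique"),
        events=("participant_id", "size"),
        hours=("hours", "sum"),
    )
    print(summary)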

Method

Design

Figure 1: Participant flow during study intake.

The study consisted of four stages (Figure 1).

Stage 1: Screening

In the first stage, we screened participants to find people aged 18-40 who (1) self-reported playing video games, (2) self-reported that at least 50% of their total video game play takes place on the platforms included in the study, and (3) were willing to link their gaming accounts to provide digital trace data. We screened participants from two panel sources: PureProfile and Prolific.

Participants were recruited under an initial set of ethnicity-based quotas designed to mirror the general population's demographic composition. After we reached approximately 50% of our target sample under quota constraints and found that further quota-eligible recruits were scarce, we suspended the quotas for the remainder of data collection; all subsequent participants were enrolled on a first-come, first-served basis. Final sample characteristics reflect both the quota-driven and open-enrollment phases (see below).

Stage 2: Account Linking

Participants who were deemed eligible during screening proceeded directly to an account linking survey, wherein they provided details of the gaming platforms they actively use. For UK participants, these include Nintendo Switch, Steam, Android, and iOS; for US participants, the same four platforms alongside Xbox. Details of how participants linked each type of account are shown in Table 2.

Table 2: Platform Details
Platform Data Source Account Linking Process Type of Data Collected
a See https://accounts.nintendo.com/qrcode.
b In previous research, Nintendo-published games accounted for 65% of Switch playtime (Ballou et al., 2025).
c See https://support.xbox.com/en-US/help/account-profile/manage-account/guide-to-insider-program.
Nintendo Data-sharing agreements with Nintendo of America (US) and Nintendo of Europe (UK) Participants share an identifier contained within a QR code on the Nintendo web interface. Nintendo of America/Europe uses this identifier to retrieve gameplay data and shares it with the research team.a Session records (what game was played, at what time, for how long) for first-party games (games published in whole or in part by Nintendo, as opposed to third-party publishers such as Electronic Arts).b
Xbox (US only) Data-sharing agreement with Microsoft Participants consent to data sharing by opting in to the study on Xbox Insiders with their Xbox account. Microsoft retrieves gameplay data for all consented accounts and shares it with the research team in pseudonymized form.c Session records (what game was played, at what time, for how long). The name of the game is replaced with a random persistent identifier for all third-party games (i.e., those not published by Xbox Game Studios), but genre(s) and age ratings are shared.
Steam Custom web app (Gameplay.Science) Participants sign up for Gameplay.Science (https://gameplay.science), an open-source platform for tracking Steam gameplay. Participants consent to have their gameplay data monitored for the duration of the study. Their Steam ID is authenticated using the official Steam authentication API (OpenID). Incremental playtime per game (every hour, the total time spent playing during the previous hour)
iOS iOS Screen Time Screenshots In each biweekly survey, participants submitted screenshots from the built-in iOS Screen Time app. These show details of the previous 3 weeks of gaming app use (what games were played and for how long). Data were extracted using OCR. Total weekly playtime per game (e.g., 2 hours on game X, 5 hours on game Y)
Android Digital Wellbeing Screenshots In each biweekly survey, participants submitted screenshots from the Digital Wellbeing app, if available on their Android OS. These show details of the previous 3 weeks of phone use (what app categories are used and for how long). Data were extracted using OCR. Total weekly playtime per game (e.g., 2 hours on game X, 5 hours on game Y)
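
As Table 2 notes, Steam telemetry arrives as hourly playtime increments rather than discrete sessions. The following minimal sketch, assuming a long format with one row per participant, game, and hour (file and column names hypothetical), rolls these increments up into daily per-game totals.

    import pandas as pd

    # Hypothetical layout: one row per (participant, game, hour), with the
    # minutes played during the previous hour, per the Steam pipeline.
    steam = pd.read_csv("steam_hourly.csv", parse_dates=["hour_start"])

    daily = (
        steam.assign(date=steam["hour_start"].dt.date)
             .groupby(["participant_id", "game", "date"], as_index=False)
             ["minutes_played"].sum()
    )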

Stage 3: Account Validation

After players completed the account linking process, we checked each account for evidence of valid gaming—specifically, records of active gameplay sessions on one or more of Steam, Xbox, and Nintendo within the 2 weeks before survey completion. Participants who did not have recent, valid telemetry on any console platform were excluded from the rest of the study.

Stage 4: Surveys

Figure 2: Survey administration schedule across the 12-week study period. Participants completed biweekly surveys (orange) every two weeks, with US participants additionally completing daily surveys (blue) for the first 30 days. Cognitive tests (green) were administered during biweekly surveys at weeks 1, 5, and 9. Gray circles indicate days with no scheduled surveys. Retention percentages show the proportion of baseline participants (N=1,980) who were still active at each measurement week (defined as having completed either a daily diary or biweekly survey at any time after that week).

Eligible participants were invited to complete six waves of biweekly surveys, one every two weeks (Figure 2). US participants were additionally invited to complete daily surveys for 30 days, concurrent with the first biweekly survey waves. During waves 1, 3, and 5, a cognitive task was also administered within the biweekly survey.

Daily survey links were sent every day at 2pm in the participant’s local time and remained available until 3am. Biweekly survey links were sent every second week, beginning on the first day of the study, at 12pm and remained available for 96 hours.

Participants

Table 3: Sample demographics for qualified participants alongside general-population benchmarks (where available). See supplementary materials for sources of general population estimates.
Variable Level US Sample % US Gen Pop % UK Sample % UK Gen Pop %
Gender Man 64 49 71 49.6
Gender Woman 28.8 48.9 24 49.5
Gender Other gender identity 7.2 2.1 5 0.9
Ethnicity White 63.5 75.5 84.8 81.7
Ethnicity Asian 8.8 6.3 7.9 9.3
Ethnicity Two or More Races 13.9 3 4.5 2.9
Ethnicity Black 9.2 13.6 1.9 4
Ethnicity Other 3.8 0 0.8 2.1
Ethnicity American Indian and Alaska Native 0.6 1.3 NA 0
Ethnicity Native Hawaiian and Other Pacific Islander 0.2 0.3 NA 0
Education Completed secondary or less 59.8 61 48.7 60.1
Education Bachelor's degree 32.8 24 35.3 25.5
Education Postgraduate 7.4 15 16 14.4

Our final sample consists of 2,102 qualified participants, selected from a pool of 34,295 screened participants. Of the 2,102 with recent telemetry, 1,973 also completed at least one survey. Due to errors in screening, four participants over the age of 40 were included.

Table 3 shows demographic characteristics of the final sample alongside general population benchmarks. Our participants are more likely to be male, non-binary, and bi- or multiracial than the general population. Although the demographics of the population of people who play video games are less well understood, our sample’s demographics are broadly consistent with previously reported data (e.g., Entertainment Software Association, 2024).

Besides those appearing in Table 3, we collected various other demographic variables from eligible participants at intake, including employment status, height and weight, self-identified and diagnosed neurodivergence (e.g., ASD, ADHD, dyslexia), political party affiliation, marital status, caretaking responsibilities, and postal geography (general area only; first three digits of the five-digit US ZIP Code; UK outward code). Further details of these are available in the online codebook.

Ethics and Compensation

This study received ethical approval from the Social Sciences and Humanities Inter-Divisional Research Ethics Committee at the University of Oxford (OII_CIA_23_107). All participants provided informed consent at the start of the study, including consent to their data being shared openly for reanalysis.

Prolific participants were paid at a rate of £12/hour for all study components, which equates to: £0.20 for a 1-minute screening, £2 for the 10-minute intake survey (plus £5 for linking at least one account with recent data), £0.80 for each 4-minute daily survey, and £2 for each 10-minute biweekly survey. Participants received £10 bonus payments for completing at least 24 out of 30 daily surveys and/or 5 out of 6 biweekly surveys.

Dataset

Digital Trace Data

As described above, we collected video game play data from five platforms: Xbox, Nintendo Switch, Steam, iOS, and Android (full details in Table 2). To recap: on Xbox and Nintendo, we have session-level data characterized by the following fields: a game ID (Xbox) or title (Nintendo), a start and end time, and genre(s). On Steam, we have hourly aggregates: for every hour, the time spent playing each game during that hour. On iOS and Android, we have daily aggregates: for every day, the time spent playing each game. For concision, we do not repeat the details of Table 2 here, but direct readers to that table and to our supplementary materials for the exact variables in each platform’s trace data.

All telemetry timestamps are stored as UTC, but can be converted to the participant’s local time using the local_timezone variable.
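
For example, a minimal sketch of this conversion, assuming local_timezone holds IANA zone names (e.g., "Europe/London") and a hypothetical sessions table:

    import pandas as pd

    sessions = pd.read_csv("telemetry_sessions.csv", parse_dates=["start_time"])

    # Interpret the stored timestamps as UTC, then convert each row to the
    # participant's own zone. Conversion is per-row because timezones vary
    # across participants.
    utc_start = sessions["start_time"].dt.tz_localize("UTC")
    sessions["start_local"] = [
        ts.tz_convert(tz) for ts, tz in zip(utc_start, sessions["local_timezone"])
    ]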

In Figure 3, we visualize the distribution of play across days and times of day. As expected, the likelihood of play peaks on weekend evenings between 8pm and 11pm and is lowest in the early morning.

Figure 3: Diurnal play across Xbox, Nintendo, Steam.
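
A rough sketch of how such a diurnal profile can be computed from the session records follows; for brevity it bins by UTC session start, whereas a faithful reproduction of Figure 3 would first convert timestamps to local time as described above.

    import pandas as pd

    sessions = pd.read_csv("telemetry_sessions.csv", parse_dates=["start_time"])

    # For each weekday-by-hour cell, count distinct participants with a
    # session starting in that cell; dividing by the total number of
    # participants gives a rough likelihood-of-play surface.
    starts = sessions["start_time"]
    counts = (
        sessions.assign(weekday=starts.dt.day_name(), hour=starts.dt.hour)
                .groupby(["weekday", "hour"])["participant_id"]
                .nunique()
    )
    likelihood = counts / sessions["participant_id"].nunique()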

Self-reported Gaming

We also collected self-reported gaming data in each biweekly survey. Participants estimated the time they spent playing games on platforms they had linked during the study (i.e., excluding other platforms such as PlayStation) in each of the following periods: last 24 hours, last 7 days, and last 14 days. In addition, participants reported details of at least 1 and up to 3 of their most recent gaming sessions (game, date, and start/end time).

Figure 4 compares the average self-reported distribution of play across platforms to the distribution captured in our digital trace data. It is vital to note that the self-report data should not be treated as ground truth: there is good evidence that people’s self-reports of media use are inaccurate (Kahn et al., 2014; Parry et al., 2021), with some previous work finding systematic overestimation of video game play (Johannes et al., 2021). At the same time, some portion of players’ true gaming is likely to be systematically uncaptured in the telemetry, due to factors such as missing titles (e.g., Nintendo third-party games) or player privacy settings (e.g., playing in invisible mode, or setting certain games to private on Steam). The figure therefore provides a useful overview of the relative coverage of different platforms in our telemetry data.

Figure 4: Average weekly playtime (minutes) by platform, self-reported vs telemetry (among users with telemetry). Lines connect estimates for the same platform.
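
A comparison in the spirit of Figure 4 might be sketched as follows, assuming hypothetical tables of self-reported weekly minutes and weekly telemetry totals per platform.

    import pandas as pd

    # Hypothetical inputs: per-participant self-reported minutes over the
    # last 7 days, and per-participant weekly telemetry minutes.
    self_report = pd.read_csv("self_reported_playtime.csv")
    telemetry = pd.read_csv("telemetry_weekly.csv")

    comparison = pd.DataFrame({
        "self_report": self_report.groupby("platform")["minutes_last_7_days"].mean(),
        "telemetry": telemetry.groupby("platform")["minutes"].mean(),
    })
    # Ratios above 1 indicate platforms where self-reports exceed telemetry.
    comparison["ratio"] = comparison["self_report"] / comparison["telemetry"]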

Survey measures

We collected a variety of self-report measures at different time scales. We briefly describe which constructs we collected here; for further details of the specific measures and example items, see Table 6.

Trait / trait-like (baseline). We assessed chronotype, Big 5 personality, player trait typology, and gaming disorder symptoms. Gaming disorder symptoms were measured twice, at biweekly waves 1 and 6; the remaining trait measures were administered once at baseline.

Daily. Daily surveys captured: basic psychological need satisfaction and frustration in life in general; basic psychological need satisfaction and frustration in the context of video games; life satisfaction; affective valence; sleep quality; stressors; types of social gaming engaged in; and self-reported displacement.

Biweekly. Every two weeks we assessed: general mental wellbeing; depression symptoms; life satisfaction; basic psychological need satisfaction and frustration – video games; and subjective displacement.

Monthly (alternating biweekly). On alternating biweekly surveys (i.e., monthly), we assessed: sleep quality; daytime sleepiness; and perceived harms and benefits of gaming.

Figure 5 illustrates the temporal dynamics of gaming behavior and mental wellbeing through three representative case studies.

Figure 5: Sample of daily gaming patterns and mental wellbeing for three representative participants. Stacked bars represent total daily playtime across platforms (Nintendo in red, Steam in dark blue, Xbox in green, iOS in grey, Android in light green). Orange line shows biweekly mental wellbeing scores (WEMWBS) measured at six study waves. Participants were selected from those closest to the 25th, 50th, and 75th percentiles of total playtime, prioritizing those with the most varied multi-platform gaming behavior.

Attention Control

In study waves 1, 3, and 5, we measured participants’ attention control using the Simon Squared task of Burgoyne et al. (2023), using modified code from Liceralde & Burgoyne (2023). Although the original Squared tasks comprise the Simon, Stroop, and Flanker Squared, limited participation time led us to use only the Simon Squared task, which had the greatest factor loading on attention control in the original study (Burgoyne et al., 2023).

The Simon Squared task is a short, validated measure of attention control that follows the standard Simon task (Simon & Rudell, 1967) but is completed in about three minutes. Participants see a target arrow pointing either left or right, with response labels “LEFT” and “RIGHT” printed underneath. Participants must then select the response option (e.g., “LEFT”) that matches the arrow’s direction (e.g., ◀︎). However, the arrow and response options can appear on either side of the screen, and participants must ignore this spatial configuration and attend only to the symbols’ meanings.

After reading the instructions, participants practice for 30 seconds with auditory and text feedback on response accuracy. They then see their score from the practice trials, review the instructions again, and are given 90 seconds to earn as many points as possible. Participants gain one point for each correct response and lose one point for each incorrect response; the final task score is thus the number of correct responses minus the number of incorrect responses. For a complete task description, see Burgoyne et al. (2023) and Figure 6 therein.
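
If trial-level task records are available (an assumption; the codebook documents the released format), the score reduces to a simple aggregation.

    import pandas as pd

    # Hypothetical trial-level records: one row per response in the 90 s
    # test block, with correct = 1 for accurate responses and 0 otherwise.
    trials = pd.read_csv("simon_trials.csv")

    # Task score per participant and wave: correct minus incorrect responses.
    scores = (
        trials.groupby(["participant_id", "wave"])["correct"]
              .apply(lambda c: int((c == 1).sum() - (c == 0).sum()))
    )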

Overall, 1,077, 991, and 800 participants completed the Simon task at panel waves 1, 3, and 5, respectively. Average performance was approximately 10 points lower than in the in-person study of Burgoyne et al. (2023) (Figure 6). Participants on mobile devices generally attained lower scores than those on other devices (Figure 6, rows), but performance was generally stable across the three waves (Figure 6, columns).

Figure 6: Histograms of participants’ Simon Squared task scores across study waves. Points and bars indicate means ±1 standard deviation.

Time Use

TODO

Data Quality Checks

We implemented a variety of data quality checks; a code sketch illustrating the first two follows the list.

  1. In each daily and biweekly survey, one item from the BANGS (daily) and BPNSFS (biweekly) was duplicated to assess response consistency (Meade & Craig, 2012); participants whose responses to the two identical items differed by more than one scale point were flagged for potential careless responding.
  2. In the telemetry, we use several heuristics to identify potentially unreliable sessions: sessions beginning or ending in the future (indicative of clock manipulation or other errors), sessions longer than 12 hours, and 3 or more games being played simultaneously. [TODO: other heuristics]
  3. In the time use data, we flag any cases with fewer than 5 distinct activities logged in a day, or that are missing any
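
A minimal sketch of how the survey-consistency and session heuristics might be applied, with hypothetical file and column names (the simultaneous-play and time-use checks are omitted for brevity):

    import pandas as pd

    # Check 1: duplicated survey item differing by more than one scale point.
    daily = pd.read_csv("daily_surveys.csv")
    daily["flag_careless"] = (
        (daily["bangs_item"] - daily["bangs_item_repeat"]).abs() > 1
    )

    # Check 2: unreliable telemetry sessions (overlong, or in the future).
    sessions = pd.read_csv("telemetry_sessions.csv",
                           parse_dates=["start_time", "end_time"])
    hours = (sessions["end_time"] - sessions["start_time"]).dt.total_seconds() / 3600
    now = pd.Timestamp.now(tz="UTC").tz_localize(None)  # timestamps stored as UTC
    sessions["flag_unreliable"] = (
        (hours > 12)
        | (sessions["start_time"] > now)
        | (sessions["end_time"] > now)
    )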

Missingness

As with most longitudinal studies, attrition and missing data present important challenges for data quality and statistical inference. Table 4 presents a comprehensive overview of missingness patterns across all data sources in our study, broken down by region (US and UK) and data type (Survey, Telemetry, and Cognitive Task).

The unit of observation differs across data types. For surveys and cognitive tasks, the unit is a completed survey or task. For telemetry, the unit varies by platform: Xbox and Nintendo use binary account linking, with data coverage considered maximal once linked (data provided directly by platform holders); Steam is measured hourly, with observations representing whether the participant’s profile was publicly visible in each hour; iOS and Android are measured daily, with observations representing whether a valid screenshot was submitted covering that day’s gaming.
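
To illustrate how the columns of Table 4 relate, the sketch below derives expected, observed, missing, median-missing, and completeness figures from a hypothetical long grid with one row per participant and scheduled observation.

    import pandas as pd

    # Hypothetical grid: one row per participant x scheduled observation,
    # with completed = 1 if the observation was collected.
    grid = pd.read_csv("daily_diary_grid.csv")

    expected = len(grid)
    observed = int(grid["completed"].sum())
    missing = expected - observed
    pct_complete = 100 * observed / expected
    median_missing = (
        grid.groupby("participant_id")["completed"]
            .apply(lambda c: int((c == 0).sum()))
            .median()
    )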

Table 4: Missingness patterns across survey, telemetry, and cognitive task data by region. The table shows the number of participants (N), total expected observations, observations actually collected, missing observations, median number of missing observations per participant, and percentage of data completeness for each measure. Retention plots (right columns) visualize the proportion of participants remaining active over time: for surveys and tasks, retention is calculated across study days or waves; for telemetry, across the 84-day study period. Density plots show the distribution of observations per participant.
Region  Data Type  Measure         N      Expected   Observed   Missing  Median Missing  % Complete
US      Survey     Daily Diary     1,351  40,530     18,829     21,701   21              46.5
US      Survey     Biweekly Panel  1,351  8,106      4,211      3,895    4               51.9
US      Telemetry  Xbox            411    411        175        236      NA              42.6
US      Telemetry  Nintendo        518    518        368        150      NA              71.0
US      Telemetry  Steam           998    2,011,968  1,755,391  256,577  0               87.2
US      Telemetry  iOS             417    35,028     2,469      32,559   84              7.0
US      Telemetry  Android         301    25,284     1,712      23,572   84              6.8
US      Task       Simon Task      1,351  4,053      1,721      2,332    2               42.5
UK      Survey     Biweekly Panel  719    4,314      2,734      1,580    2               63.4
UK      Telemetry  Xbox            103    103        0          103      NA              0.0
UK      Telemetry  Nintendo        213    213        196        17       NA              92.0
UK      Telemetry  Steam           659    1,328,544  1,283,904  44,640   0               96.6
UK      Telemetry  iOS             117    9,828      940        8,888    84              9.6
UK      Telemetry  Android         167    14,028     759        13,269   84              5.4
UK      Task       Simon Task      719    2,157      1,147      1,010    1               53.2

Discussion

We believe this dataset has potential to address a wide variety of common research questions in the field.

Some of these questions will be addressed in forthcoming registered reports: specifically, we have plans to test (1) key hypotheses from the Basic Needs in Games model (Ballou & Deterding, 2024) about how gaming relates to basic psychological needs over time, (2) the relationship between late-night gaming and sleep, and (3) the relationship between playtime in different genres and wellbeing.

Nonetheless, the richness of this data means that researchers can explore numerous other questions (or, indeed, conduct and compare alternative analysis approaches to the above questions). To stimulate ideas, we present a few questions we think the data are well-suited to answering.

How do seasons and weather impact playtime? Because we capture time-stamped play sessions alongside participants’ geographic locations, researchers can merge in high‐resolution weather and daylight data to examine how environmental factors causally influence gaming behavior. Causal inference techniques such as inverse probability weighting can enable precise estimates of how, when, and how much people play in response to seasonal and meteorological changes. By quantifying these effects, researchers can better distinguish weather‐related demand from other drivers (like work schedules or weekend routines), improving the precision of studies on gaming’s impact on wellbeing, motivation, and cognition.
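
As a sketch of the required linkage (weather source, file, and column names are hypothetical), daily playtime could be joined to an external weather table on date and the coarse postal geography collected at intake.

    import pandas as pd

    sessions = pd.read_csv("telemetry_sessions.csv", parse_dates=["start_time"])
    participants = pd.read_csv("participants.csv")  # includes coarse postal_area
    weather = pd.read_csv("weather_daily.csv", parse_dates=["date"])  # date x postal_area

    # Daily session counts per participant, then join weather by date and area.
    daily_play = (
        sessions.assign(date=sessions["start_time"].dt.normalize())
                .groupby(["participant_id", "date"], as_index=False).size()
                .rename(columns={"size": "n_sessions"})
    )
    merged = (
        daily_play.merge(participants[["participant_id", "postal_area"]],
                         on="participant_id")
                  .merge(weather, on=["date", "postal_area"], how="left")
    )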

How do neurotypical and neurodiverse players differ in their gaming behavior? Using the neurodivergence data we collected (which includes, for example, 360 participants who identify as having autism and 498 who identify as having ADHD), researchers can examine how play patterns differ between these groups at scale. Neurodiversity in games has regularly been studied in the context of specific games and with qualitative methods; the present data enable complementary large-scale behavioral analyses.

Accuracy of self-reported data - inference from other papers

We encourage researchers from a wide range of disciplines to explore these or other questions using the data we present, which is freely available for reuse under a CC0 license.

Limitations

While this dataset represents a substantial step forward in holistic coverage of video game play, it remains imperfect: we did not capture data on PlayStation (~19% of gaming market) or computer games played outside the Steam platform (~11% of gaming market); on Nintendo, we do not have access to third-party titles (42% of Nintendo play), and our coverage of smartphone play is limited by the difficulties and inconsistencies of screenshot-based donation and OCR retrieval.

We are further unable to identify idle time (when players have a game open but are not actively playing) or account sharing (when players let friends or family use their account); some playtime values may therefore overestimate a person’s true playtime, though we cannot say by how much.

Future Work

The trace data presented here is broad in scope but limited in granularity: we capture all gaming activity on a given platform, but not what happens within individual games. Prior work and theory make clear that in-game behaviors (e.g., what role a player adopts, whether they compete or cooperate, or how they perform in competitive modes) are critical determinants of player experience and thereby wellbeing (see e.g., Elson et al. (2014) for a review of how in-game contexts shape effects). This highlights a fundamental trade-off in digital trace research between breadth—how comprehensively play can be captured across platforms—and depth—the granularity of in-game behaviors and experiences. At present, our dataset emphasizes breadth, but we see strong potential in future study designs that combine platform-level telemetry with targeted in-game behavioral data to provide a more complete picture.

We also see strong potential in combining digital trace data with experimental designs that enable stronger causal inference—for example, randomizing players to single-player games only, or restricting play to certain times of day, to examine effects on social wellbeing or sleep. Previous researchers have noted a dearth of digital trace data-backed field experiments, while highlighting their potential (Stier et al., 2020): Trace data not only captures naturalistic gaming behavior but also allows researchers to assess substitution (what games or platforms participants switch to under intervention) and adherence (how closely they follow assigned play patterns).

Data Availability

All data, materials, and code related to the dataset and this manuscript are available under CC0 at https://example.com.

References

Almeida, D. M., Wethington, E., & Kessler, R. C. (2002). The Daily Inventory of Stressful Events: An Interview-Based Approach for Measuring Daily Stressors. Assessment, 9(1), 41–55. https://doi.org/10.1177/1073191102091006
Ballou, N., Denisova, A., Ryan, R., Rigby, C. S., & Deterding, S. (2024). The basic needs in games scale (BANGS): A new tool for investigating positive and negative video game experiences. International Journal of Human-Computer Studies, 188, 103289. https://doi.org/10.1016/j.ijhcs.2024.103289
Ballou, N., & Deterding, S. (2024). The Basic Needs in Games Model of Video Game Play and Mental Health. Interacting with Computers, iwae042. https://doi.org/10.1093/iwc/iwae042
Ballou, N., Sewall, C. J. R., Ratcliffe, J., Zendle, D., Tokarchuk, L., & Deterding, S. (2024). Registered report evidence suggests no relationship between objectively-tracked video game playtime and wellbeing over 3 months. Technology, Mind, and Behavior, 5(1), 1–15. https://doi.org/10.1037/tmb0000124
Ballou, N., Vuorre, M., Hakman, T., & Przybylski, A. K. (2025). Perceived value of video games, but not hours played, predicts mental well-being in casual adult nintendo players. Royal Society Open Science, 12, 241174. https://doi.org/10.1098/rsos.241174
Burgess, R., Dolan, E., Poon, N., Jenneson, V., Pontin, F., Sivill, T., Morris, M., & Skatova, A. (2024). The Potential of Digital Footprint Data for Health & Wellbeing Research. Pre-published. https://doi.org/10.31234/osf.io/9jgn2
Burgoyne, A. P., Tsukahara, J. S., Mashburn, C. A., Pak, R., & Engle, R. W. (2023). Nature and measurement of attention control. Journal of Experimental Psychology: General, 152(8), 2369–2402. https://doi.org/10.1037/xge0001408
Buysse, D. J., Reynolds, C. F., Monk, T. H., Berman, S. R., & Kupfer, D. J. (1989). The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Research, 28(2), 193–213. https://doi.org/10.1016/0165-1781(89)90047-4
Cantril, H. (1965). The pattern of human concerns. Rutgers University Press.
Carney, C. E., Buysse, D. J., Ancoli-Israel, S., Edinger, J. D., Krystal, A. D., Lichstein, K. L., & Morin, C. M. (2012). The consensus sleep diary: Standardizing prospective sleep self-monitoring. Sleep, 35(2), 287–302. https://doi.org/10.5665/sleep.1642
Chen, B., Vansteenkiste, M., Beyers, W., Boone, L., Deci, E. L., Van der Kaap-Deeder, J., Duriez, B., Lens, W., Matos, L., Mouratidis, A., Ryan, R. M., Sheldon, K. M., Soenens, B., Van Petegem, S., & Verstuyf, J. (2015). Basic psychological need satisfaction, need frustration, and need strength across four cultures. Motivation and Emotion, 39(2), 216–236. https://doi.org/10.1007/s11031-014-9450-1
Elson, M., Breuer, J., Ivory, J. D., & Quandt, T. (2014). More Than Stories With Buttons: Narrative, Mechanics, and Context as Determinants of Player Experience in Digital Games: Narrative, Mechanics, and Context in Digital Games. Journal of Communication, 64(3), 521–542. https://doi.org/10.1111/jcom.12096
Entertainment Software Association. (2024). 2024 essential facts about the U.S. video game industry. https://www.theesa.com/wp-content/uploads/2024/05/Essential-Facts-2024-FINAL.pdf
Freelon, D. (2014). On the Interpretation of Digital Trace Data in Communication and Social Computing Research. Journal of Broadcasting & Electronic Media, 58(1), 59–75. https://doi.org/10.1080/08838151.2013.875018
Griffioen, N., Rooij, M. van, Lichtwarck-Aschoff, A., & Granic, I. (2020). Toward improved methods in social media research. Technology, Mind, and Behavior, 1(1). https://doi.org/10.1037/tmb0000005
Johannes, N., Vuorre, M., & Przybylski, A. K. (2021). Video game play is positively correlated with well-being. Royal Society Open Science, 8(2), rsos.202049, 202049. https://doi.org/10.1098/rsos.202049
Johns, M. W. (1991). A New Method for Measuring Daytime Sleepiness: The Epworth Sleepiness Scale. Sleep, 14(6), 540–545. https://doi.org/10.1093/sleep/14.6.540
Kahn, A. S., Ratan, R., & Williams, D. (2014). Why We Distort in Self-Report: Predictors of Self-Report Errors in Video Game Play. Journal of Computer-Mediated Communication, 19(4), 1010–1023. https://doi.org/10.1111/jcc4.12056
Kahn, A. S., Shen, C., Lu, L., Ratan, R. A., Coary, S., Hou, J., Meng, J., Osborn, J., & Williams, D. (2015). The Trojan Player Typology: A cross-genre, cross-cultural, behaviorally validated scale of video game play motivations. Computers in Human Behavior, 49, 354–361. https://doi.org/10.1016/j.chb.2015.03.018
Larrieu, M., Fombouchet, Y., Billieux, J., & Decamps, G. (2023). How gaming motives affect the reciprocal relationships between video game use and quality of life: A prospective study using objective playtime indicators. Computers in Human Behavior, 147, 107824. https://doi.org/10.1016/j.chb.2023.107824
Liceralde, V. R. T., & Burgoyne, A. P. (2023). Squared tasks of attention control for jsPsych. https://doi.org/10.5281/zenodo.8313315
Martela, F., & Ryan, R. M. (2024). Assessing Autonomy, Competence, and Relatedness Briefly: Validating Single-Item Scales for Basic Psychological Need Satisfaction. European Journal of Psychological Assessment, 1015–5759/a000846. https://doi.org/10.1027/1015-5759/a000846
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085
Ofcom. (2023). Online Nation 2023 Report (p. 106). https://www.ofcom.org.uk/siteassets/resources/documents/research-and-data/online-research/online-nation/2023/online-nation-2023-report.pdf?v=368355
Parry, D. A., Davidson, B. I., Sewall, C. J. R., Fisher, J. T., Mieczkowski, H., & Quintana, D. S. (2021). A systematic review and meta-analysis of discrepancies between logged and self-reported digital media use. Nature Human Behaviour, 5, 1535–1547. https://doi.org/10.1038/s41562-021-01117-5
Perry, R., Drachen, A., Kearney, A., Kriglstein, S., Nacke, L. E., Sifa, R., Wallner, G., & Johnson, D. (2018). Online-only friends, real-life friends or strangers? Differential associations with passion and social capital in video game play. Computers in Human Behavior, 79, 202–210. https://doi.org/10.1016/j.chb.2017.10.032
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & PROMIS Cooperative Group. (2011). Item Banks for Measuring Emotional Distress From the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, Anxiety, and Anger. Assessment, 18(3), 263–283. https://doi.org/10.1177/1073191111411667
Pontes, H. M., Schivinski, B., Sindermann, C., Li, M., Becker, B., Zhou, M., & Montag, C. (2019). Measurement and Conceptualization of Gaming Disorder According to the World Health Organization Framework: the Development of the Gaming Disorder Test. International Journal of Mental Health and Addiction, 19, 508–528. https://doi.org/10.1007/s11469-019-00088-z
Roenneberg, T., Wirz-Justice, A., & Merrow, M. (2003). Life between Clocks: Daily Temporal Patterns of Human Chronotypes. Journal of Biological Rhythms, 18(1), 80–90. https://doi.org/10.1177/0748730402239679
Sewall, C. J. R., Goldstein, T. R., Wright, A. G. C., & Rosen, D. (2022). Does Objectively Measured Social-Media or Smartphone Use Predict Depression, Anxiety, or Social Isolation Among Young Adults? Clinical Psychological Science, 10(5), 997–1014. https://doi.org/10.1177/21677026221078309
Sewall, C. J. R., & Parry, D. A. (2021). The Role of Depression in the Discrepancy Between Estimated and Actual Smartphone Use: A Cubic Response Surface Analysis. Technology, Mind, and Behavior, 2(2). https://doi.org/10.1037/tmb0000036
Simon, J. R., & Rudell, A. P. (1967). Auditory S-R compatibility: The effect of an irrelevant cue on information processing. Journal of Applied Psychology, 51(3), 300–304. https://doi.org/10.1037/h0020586
Soto, C. J., & John, O. P. (2017). Short and extra-short forms of the Big Five Inventory-2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69–81. https://doi.org/10.1016/j.jrp.2017.02.004
Stier, S., Breuer, J., Siegers, P., & Thorson, K. (2020). Integrating Survey Data and Digital Trace Data: Key Issues in Developing an Emerging Field. Social Science Computer Review, 38(5), 503–516. https://doi.org/10.1177/0894439319843669
Tennant, R., Hiller, L., Fishwick, R., Platt, S., Joseph, S., Weich, S., Parkinson, J., Secker, J., & Stewart-Brown, S. (2007). The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): development and UK validation. Health and Quality of Life Outcomes, 5(1), 63. https://doi.org/10.1186/1477-7525-5-63
Vuorre, M., Johannes, N., Magnusson, K., & Przybylski, A. K. (2022). Time spent playing video games is unlikely to impact well-being. Royal Society Open Science, 9(7), 220411. https://doi.org/10.1098/rsos.220411

Appendix

Deviations from Preregistration

We made several deviations from our preregistration to ensure we could recruit enough high-quality participants to meet our sample size goals. In our view, none is severe enough to threaten the validity of the study. Deviations are summarised in Table 5.

Table 5: Summary of deviations from preregistration
Preregistered Actual Justification for Deviation
All participants sourced from PureProfile Participants sourced from both PureProfile and Prolific Exhausted PureProfile participant pool before reaching required sample size
Screening sample would be nationally representative by ethnicity and gender Approximately 50% of screening was done using quotas for national representativeness by ethnicity and gender; all subsequent sampling used convenience sampling with no quotas Exhausted participant pools of smaller demographic categories on both Prolific and PureProfile before reaching required sample size
Sample consists of participants aged 18–30 in the US and 18–75 in the UK Sample consists of participants aged 18–40 in both regions (1) Unable to recruit enough participants in the US aged 18–30; (2) near-zero qualification rates among UK adults over 50; (3) desire for results from both regions to be more easily comparable
To qualify, ≥75% of a participant's total gaming must take place on platforms included in the study (Xbox, Steam, Nintendo Switch) To qualify, ≥50% of a participant's total gaming must take place on platforms included in the study (Xbox, Steam, Nintendo Switch) Low rates of study qualification at the 75% threshold, in large part due to substantial uncaptured PlayStation play
Qualification contingent upon valid telemetry within last 7 days Qualification contingent upon valid telemetry within last 14 days Feedback from participants indicating that play during a 7-day period was subject to too many fluctuations (e.g., a busy workweek)
Daily and biweekly surveys sent at 7pm local time Daily and biweekly surveys sent at 2pm local time Feedback from participants indicating that evening plans often interfered with survey completion and thus adversely affected response rate
Session-level Android data captured via the ActivityWatch app Daily-level Android data captured using screenshots of the Digital Wellbeing interface Restrictions in PureProfile's privacy policy preventing installation of third-party apps; technical challenges in supporting users with the installation and data export
Table 6: Summary of survey measures used in the study
(a) All self-report measures and their assessment frequency
Construct Measure Example Item Response format Frequency
Big 5 Personality BFI-2-XS (Soto & John, 2017) I am someone who…is compassionate, has a soft heart. 5-pt Likert scale from 1 (Disagree strongly) to 5 (Agree strongly) Once (baseline)
Chronotype Munich Chronotype Questionnaire (Roenneberg et al., 2003) I go to bed at… Times and numbers of minutes Once (baseline)
Player Trait Typology Trojan Player Typology (Kahn et al., 2015) It’s important to me to play with a tightly knit group. 5-pt Likert scale from 1 (Strongly disagree) to 5 (Strongly agree) Once (baseline)
Gaming Disorder Symptomsa Gaming Disorder Test (Pontes et al., 2019) In the past 3 months…I have had difficulties controlling my gaming activity. 5-pt Likert scale from 1 (Never) to 5 (Very often) Twice (biweekly waves 1 & 6)
Affective valence ad hoc How are you feeling right now? Visual Analogue Scale from 1 (very bad) to 100 (very good) Daily
Basic psychological need satisfaction and frustration - life in general Basic Psychological Need Satisfaction and Frustration Scale (Chen et al., 2015), brief version (Martela & Ryan, 2024) In the last 24 hours…I was able to do things I really want and value in life. 7-pt Likert scale from 1 (very strongly disagree) to 7 (very strongly agree) Daily
Basic psychological need satisfaction and frustration - video games Basic Needs in Games scale (Ballou, Denisova, et al., 2024), brief session-level version In my most recent session of X…I felt disappointed with my performance. 7-pt Likert scale from 1 (very strongly disagree) to 7 (very strongly agree) Daily
Life satisfaction Cantril Self-anchoring Scale (Cantril, 1965), daily version I was satisfied with my life today. Visual Analogue Scale from 1 (Strongly disagree) to 100 (Strongly agree) Daily
Self-reported displacement ad hoc Think back to your most recent gaming session. If you hadn’t played a game, what would you most likely have done instead? Open response Daily
Sleep Quality Sleep quality item (Item 9) from Consensus Sleep Diary (Carney et al., 2012) How do you rate the quality of your sleep? 5-pt Likert scale from 1 (very poor) to 5 (very good) Daily
Stressors Daily Inventory of Stressful Events (Almeida et al., 2002), modified for digital delivery [In the last 24 hours], what kinds of stressful event(s) occurred? [Participant selects among 7 options, including e.g. argument or disagreement] Yes/No, followed by a 4-pt Likert scale from 1 (Not at all stressful) to 4 (Very stressful) Daily
Social context of play Types of social play engaged during the last 24 hours (single-player games only, multiplayer with real-world friends, multiplayer with online-only friends, multiplayer with strangers). Participants could select more than one option. NA Multiple selection from listed options Daily
Basic psychological need satisfaction and frustration - video games Basic Needs in Games scale (Ballou, Denisova, et al., 2024), gaming in general version When playing video games during the last 2 weeks…I could play in the way I wanted. 7-pt Likert scale from 1 (very strongly disagree) to 7 (very strongly agree) Biweekly (every 2 weeks)
Depression symptoms PROMIS Short Form 8a Adult Depression Scale (Pilkonis et al., 2011) In the past 7 days…I felt that I had nothing to look forward to. 5-pt Likert scale from 1 (Never) to 5 (Always) Biweekly (every 2 weeks)
General Mental Wellbeing Warwick-Edinburgh Mental Wellbeing Scale (Tennant et al., 2007) I’ve been feeling optimistic about the future 5-pt Likert scale from 1 (none of the time) to 5 (all of the time) Biweekly (every 2 weeks)
Life satisfaction Cantril Self-anchoring Scale (Cantril, 1965) On which step of [a ladder from 0 to 10 representing the best possible life] would you say you personally feel you stood over the past two weeks? 10-pt unlabeled scale from 0 to 10 Biweekly (every 2 weeks)
Subjective displacement ad hoc Over the last two weeks, to what extent has the time you spend playing video games influenced the following areas of your life? […] Work/school performance 7-pt Likert scale from 1 (greatly interfered) to 7 (greatly supported) Biweekly (every 2 weeks)
Self-reported Playtime Time spent playing games on platforms they had linked during the study (i.e., excluding other platforms such as PlayStation) in each of the following periods: last 24 hours, last 7 days, and last 14 days. NA Time estimates for last 24 hours, last 7 days, last 14 days Biweekly (every 2 weeks)
Self-reported recent sessions Details of at least 1 and up to 3 of their most recent gaming sessions (game, date, and start/end time). NA Game title, date, and start/end time for 1–3 sessions Biweekly (every 2 weeks)
Daytime sleepiness Epworth Sleepiness Scale (Johns, 1991) How likely are you to doze off or fall asleep in the following situations, in comparison to feeling just tired? […] Watching TV 4-pt Likert scale from 1 (No chance of dozing) to 4 (High chance of dozing) Monthly (alternating biweekly surveys)
Harms and benefits of gaming 2 free text questions Do you feel that gaming is sometimes a problem for you? Please describe. Open text Monthly (alternating biweekly surveys)
Sleep quality Pittsburgh Sleep Quality Index (Buysse et al., 1989) During the past month, what time have you usually gotten up in the morning? Various Monthly (alternating biweekly surveys)
a Measured twice, at biweekly waves 1 and 6.