Changes to video game play during the COVID-19 pandemic

Matti Vuorre https://vuorre.netlify.com (University of Oxford)https://www.oii.ox.ac.uk/people/matti-vuorre/
2021-02-10

Preface

We study changes in gaming behavior during COVID-19 using data from the Steam gaming platform. The data were scraped from Steam Database in January 2021.

The data, code, and analysis documents are available at

The rest of this document describes the dataset, and how we cleaned and analysed it. We used the following R packages:

Data processing

The data are in individual files for each of 500 titles time-series, and a separate file for the manual coding (e.g. whether a title was multiplayer or not). We loaded the time series data files into one R object, and then joined the timeseries with the manually coded table of features

Show code
# From files in this directory
paths <- list.files("data-raw/Top 500 Games Jan 2021/", full.names = TRUE)
# Read these columns in these formats
COLS <- cols_only(
  DateTime = "T", 
  Players = "d" 
)
dat <- tibble(appid = paths) %>% 
  mutate(data = map(appid, ~read_csv(., col_types = COLS))) %>% 
  mutate(appid = str_remove(basename(paths), ".csv")) %>% 
  unnest(data) %>% 
  # There is no time info in DateTime
  rename(Date = DateTime) %>% 
  mutate(Date = as.Date(Date))
# Limit to 2019 and 2020
dat <- dat %>% 
  filter(Date >= as.Date("2019-01-01"), Date <= as.Date("2020-12-31"))

# Join player counts with metadata (coding)
titles <- read_excel(
  "data-raw/500 Steam Games Coding FINAL.xlsx",
  col_types = "text"
)
# Variable names and values (labels) needed some cleaning
names(titles) <- c("appid", "name", "LB", "MP", "COOP")
titles$LB <- ifelse(titles$LB == "N0", "NN", titles$LB)
titles$LB <- ifelse(titles$LB == "Y0", "YY", titles$LB)
titles <- titles %>% 
  mutate(across(LB:COOP, ~factor(ifelse(.=="NN", "No", "Yes"))))
# We focused on multiplayer vs single player
titles <- select(titles, appid, name, MP)
titles <- mutate(
  titles, 
  MP = factor(
    MP, 
    levels = c("Yes", "No"), 
    labels = c("Multiplayer", "Single player")
  )
)
# Join tables
dat <- left_join(titles, dat)

# Summarise to a sum per day
dat <- dat %>% 
  group_by(Date, MP) %>% 
  summarise(Players = sum(Players, na.rm = TRUE)) %>% 
  ungroup() %>% 
  mutate(Players = Players / 1e6)

# Add useful time indicators
dat <- dat %>% 
  mutate(
    year = factor(year(Date)),
    yweek = week(Date)-1,  # Week number
    yday = yday(Date)-1,  # Day number
    # Weekday number starting from Monday
    wday = wday(Date, week_start = 1, label = FALSE)-1
  )

# COVID data for top 10 Steam countries
dat_covid <- covid19(c("USA", "Russia", "Brazil", "Germany", "Canada", "France", "United Kingdom", "Poland", "Turkey"))
# unique(dat_covid$id)
dat_covid <- dat_covid %>% 
  mutate(
    Country = factor(
      id, levels = c("USA", "RUS", "BRA", "DEU", "CAN", "FRA", "GBR", "POL", "TUR"), 
      labels = c("United States", "Russia", "Brazil", "Germany", "Canada", "France", "United Kingdom", "Poland", "Turkey"))
    )

Data description

The data were collapsed to a sum count per category (multiplayer [345 titles] vs single player [155 titles]) for each day. (The Players variable was rescaled to indicate millions; this sometimes prevents problems with model convergence.)

Show code
kable(head(dat), caption = "First six rows of data", digits = 2)
Table 1: First six rows of data
Date MP Players year yweek yday wday
2019-01-01 Multiplayer 4.17 2019 0 0 1
2019-01-01 Single player 0.44 2019 0 0 1
2019-01-02 Multiplayer 4.13 2019 0 1 2
2019-01-02 Single player 0.40 2019 0 1 2
2019-01-03 Multiplayer 4.07 2019 0 2 3
2019-01-03 Single player 0.39 2019 0 2 3
Show code
# Date of WHO announcement
cd <- as.Date("2020-03-11")
# Function to format Y axis neatly
label_fun <- function(x, n=1) str_glue("{format(round(x, n), nsmall=n)}M")
# Use these colors throughout
colorpal <- function(x) 
  scale_color_brewer(
    "Year", palette = "Dark2",
    aesthetics = c("color", "fill")
  )
dat %>% 
  ggplot(aes(Date, Players, col = year)) +
  colorpal() +
  geom_line(size = 1/3) +
  geom_vline(xintercept = cd, size = 1/4) +
  scale_y_continuous(
    breaks = pretty_breaks(), 
    labels = label_fun
  ) +
  facet_wrap("MP", nrow = 2, scales = "free_y")
Daily players in 2019 and 2020. Vertical line indicates the March 11th, 2020, WHO announcement of a global pandemic.

Figure 1: Daily players in 2019 and 2020. Vertical line indicates the March 11th, 2020, WHO announcement of a global pandemic.

Modelling

We modelled the daily multi- and singleplayer players from 2019 Jan 1st to end of 2020. We fit a generalized additive model of the daily player count on smooth functions of week number and weekday, and their interaction, using tensor product interactions as implemented in the R package mgcv. We also modelled different variances by year and game type because heteroskedasticity was plausible (in fact there must be vast differences between single and multiplayer). The model included separate smooths for each of the four “cells” (year: 2019 and 2020; type: multi- and single player) to allow comparisons between years and game types.

Show code
# We will use grouping factor G that includes the four year by game type "conditions". This allows us to do separate smooths for each cell and thus all comparisons we are interested in within the one model.
dat <- dat %>% 
  mutate(M = ifelse(MP=="Multiplayer", "MP", "SP")) %>% 
  mutate(G = interaction(year, M))
if (!file.exists("models/mgcv.rda")) {
  fit1 <- gam(
    list(
      Players ~ G + ti(yweek, k = 52, by = G) + 
        ti(wday, k = 7, by = G) + 
        ti(yweek, wday, k = c(26, 7), by = G), 
      ~ G
    ),
    family = gaulss(), method = "REML",
    data = dat
  )
  fit2 <- gam(
    list(
      Players ~ G + ti(yweek, k = 52, by = G) + 
        ti(wday, k = 7, by = G) + 
        ti(yweek, wday, k = c(26, 7), by = G), 
      ~ G + s(yweek, k = 6, by = G)
    ),
    family = gaulss(), method = "REML",
    data = dat
  )
  save(fit1, fit2, file = "models/mgcv.rda", compress = FALSE)
} else {load("models/mgcv.rda")}

Summary

The model summaries themselves are somewhat difficult to interpret and not necessarily of direct interest, but are printed below. Possibly of interest are the ti() terms that indicate how wiggly the changes in the number of players were over time (ti(yweek)) over a week (ti(wday)), and how wiggly the interaction between those was (ti(yweek, wday); i.e. the weekday by week interaction effect). G2019.MP = the effect for 2019 multiplayer. This “wiggliness” is reflected in the edf (estimated degrees of freedom) value, that indicates roughly how many parameters (we used penalized smooths) were needed to describe the smooth. Greater numbers indicate greater wiggliness.

Show code
kable(
  tidy(fit1, parametric = FALSE), digits = c(0,1,1,2,3),
  caption = "Smooth terms of generalized additive model."
)
Table 2: Smooth terms of generalized additive model.
term edf ref.df statistic p.value
ti(yweek):G2019.MP 34.8 41.1 5106.79 0.000
ti(yweek):G2020.MP 39.2 45.0 11884.33 0.000
ti(yweek):G2019.SP 16.0 19.9 483.74 0.000
ti(yweek):G2020.SP 42.0 47.1 5603.50 0.000
ti(wday):G2019.MP 5.9 6.0 3376.65 0.000
ti(wday):G2020.MP 5.8 6.0 2263.00 0.000
ti(wday):G2019.SP 4.4 5.2 259.28 0.000
ti(wday):G2020.SP 5.8 6.0 2122.09 0.000
ti(yweek,wday):G2019.MP 27.7 38.8 215.07 0.000
ti(yweek,wday):G2020.MP 36.9 51.7 294.08 0.000
ti(yweek,wday):G2019.SP 8.3 12.1 21.61 0.044
ti(yweek,wday):G2020.SP 29.8 42.7 217.23 0.000

Player figure

We then drew fitted lines and uncertainty intervals of the timeseries, and differences between years 2020 and 2019 over time. This allowed us to visualise the observed differences between years (hypothesized “excess play” in 2020), and model-implied uncertainties of those differences.

Show code
# Create table of data and predictions
predictions <- predict(fit2, se.fit = TRUE) %>% 
  as.data.frame() %>% 
  as_tibble()
predictions <- bind_cols(dat, predictions)

# Draw figure of data and predictions
p_players <- predictions %>% 
  # Dates from 2020 for plotting
  group_by(yday) %>% 
  mutate(Date = last(Date)) %>% 
  ggplot(aes(Date, Players, col = year, group = year, fill = year)) +
  colorpal() +
  geom_vline(xintercept = cd, size = .1) +
  scale_y_continuous(
    breaks = pretty_breaks(), labels = label_fun
  ) +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", date_labels = "%b"
  ) +
  geom_ribbon(
    aes(ymin = fit.1-se.fit.1*1.96, ymax = fit.1+se.fit.1*1.96), 
    alpha = .3, col = NA
  ) +
  geom_point(shape = 1, size = 0.85) +
  geom_line(aes(y = fit.1), size = .15, alpha = .9) +
  facet_wrap("MP", scales = "free", nrow = 1) +
  theme(
    axis.title.x = element_blank(),
    legend.title = element_blank(),
    plot.margin = unit(c(1, 1, 1, 1), "pt"),
    legend.position = c(.4, .85)
  )

# "Excess play" figure
# Observed difference between years from data
tmp <- predictions %>% 
  select(MP, year, yweek, Players) %>% 
  pivot_wider(names_from = year, values_from = Players, values_fn = mean) %>% 
  mutate(Difference = `2020`-`2019`)

# Model-implied differences between years using emmeans
# the documentation for average contrasts in ?emmeans is confusing (and possibly wrong for GAMs), so we must ensure here that results are averaged over weekdays, so that we get predictions (and differences) of weekly averages and not e.g. wednesdays
emm <- emmeans(
  fit2, "G", by = "yweek", at = list(yweek = 0:52),
  cov.reduce = FALSE
)
# emm  # Ensure that message says averaged over wday
# Get pairwise comparisons between cells at each week
emmp <- pairs(emm, reverse = TRUE) %>% 
  as.data.frame() %>% 
  as_tibble()
# Contrast years within MP/SP
emmp <- emmp %>% 
  filter(contrast %in% c("2020.MP - 2019.MP", "2020.SP - 2019.SP")) %>% 
  mutate(
    MP = ifelse(str_detect(contrast, "MP"), "Multiplayer", "Single player")
  ) 

# Draw figure of observed differences and model contrasts
p_excess <- tmp %>% 
  left_join(emmp) %>% 
  mutate(year = "Difference (2020 - 2019)") %>% 
  # Convert week number to approx 2020 date for plotting
  mutate(Date = ymd("2020-01-01") + weeks(yweek)) %>% 
  ggplot(aes(Date, estimate)) +
  geom_vline(xintercept = cd, size = .1) +
  scale_y_continuous(
    "Difference (2020-2019)",
    labels = label_fun
  ) +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", 
    date_labels = "%b", position = "top"
  ) +
  geom_hline(yintercept = 0, size = .1, lty = 2) +
  geom_point(aes(y=Difference), shape = 1, size = 1) +
  geom_ribbon(
    aes(ymin = (estimate-SE*2), ymax = (estimate+SE*2)),
    alpha = .2
  ) +
  geom_line(size = .3) +
  facet_wrap("MP", scales = "free") +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    strip.background.x = element_blank(),
    plot.margin = unit(c(1, 1, 1, 1), "pt"),
    strip.text.x = element_blank()
  )

# Change the layout of the panels a bit
(p_players / p_excess) + plot_layout(heights = c(7, 3))
Top: Figure of observed (empty points) and model-implied (lines and 95%CI as shades) player counts over time for 2020 (dark) and 2019 (light gray), separately for multiplayer games (left) and single player games (right). The vertical line indicates the 2020 WHO announcement. Bottom: Weekly differences between 2020 and 2019 player counts (2020 - 2019).

Figure 2: Top: Figure of observed (empty points) and model-implied (lines and 95%CI as shades) player counts over time for 2020 (dark) and 2019 (light gray), separately for multiplayer games (left) and single player games (right). The vertical line indicates the 2020 WHO announcement. Bottom: Weekly differences between 2020 and 2019 player counts (2020 - 2019).

Numeric display of excess players in 2020 April

Show code
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month == "Apr") %>%
  select(year, month, yweek, MP, Players) %>% 
  pivot_wider(names_from = year, values_from = Players, values_fn = mean) %>% 
  mutate(Difference = `2020`-`2019`) %>% 
  group_by(MP) %>% 
  filter(Difference == max(Difference, na.rm = TRUE)) %>% 
  kable(
    caption = "Peak weekly player counts (and difference) in April",
    digits = 2
  )
Table 3: Peak weekly player counts (and difference) in April
month yweek MP 2019 2020 Difference
Apr 13 Single player 0.28 0.47 0.19
Apr 16 Multiplayer 3.54 4.68 1.14

Game type figure

Figure of differences between multi- and single player games in 2019 and 2020.

Show code
tmp <- dat %>% 
  group_by(year, yweek, MP) %>% 
  summarise(Players = mean(Players)) %>% 
  pivot_wider(names_from = MP, values_from = Players) %>% 
  mutate(d = Multiplayer - `Single player`)
# Difference in multi vs single player
emm <- emmeans(
  fit2, "G", by = "yweek", at = list(yweek = 0:52), 
  cov.reduce = FALSE
)
emmc <- contrast(
  emm, 
  method = list(`2019` = c(1, 0, -1, 0), `2020` = c(0, 1, 0, -1))
) %>% 
  as.data.frame() %>% 
  as_tibble() %>% 
  rename(year = contrast) %>% 
  left_join(tmp) %>% 
  # Convert week number to approx 2020 date for plotting
  mutate(Date = ymd("2020-01-01") + weeks(yweek-1))

p_gametype <- emmc %>% 
  ggplot(aes(Date, d, col = year, group = year, fill = year)) +
  colorpal() +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", date_labels = "%b"
  ) +
  geom_vline(xintercept = cd, size = .1) +
  geom_point(shape = 1) +
  geom_ribbon(
    aes(ymin = (estimate-SE*2), ymax = (estimate+SE*2)),
    alpha = .2, col = NA
  ) +
  geom_line(aes(y = estimate), size = .3) +
  geom_vline(xintercept = floor(yday(cd) / 7), size = 1/4) +
  scale_y_continuous(
    "Difference in players\n(multi - single player)",
    breaks = pretty_breaks(),
    labels = label_fun
  ) +
  theme(
    axis.title.x = element_blank(),
    legend.title = element_blank(), 
    legend.background = element_blank(),
    legend.position = c(.85, .85),
    plot.margin = unit(c(1, 1, 1, 1), "pt")
  )

# Figure of COVID trends
p_covid <- dat_covid %>% 
  filter(year(date)==2020, date <= max(emmc$Date)) %>% 
  ggplot(aes(date, stringency_index, color = Country)) +
  # scale_color_discrete() +
  # Make sure the time span is same as above
  geom_blank(data = emmc, aes(Date, estimate, color = NA)) +
  scale_y_continuous(
    "COVID-19\nresponse index", breaks = pretty_breaks(),
    limits = c(0, 100)
  ) +
  geom_vline(xintercept = cd, size = .1) +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", 
    date_labels = "%b", position = "top"
  ) +
  geom_line() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    legend.title = element_blank(),
    legend.position = "bottom", 
    plot.margin = unit(c(0, 1, 1, 1), "pt")
  )

(p_gametype / p_covid) + plot_layout(heights = c(6.5, 3.5))
Weekly difference between multiplayer and single player games.

Figure 3: Weekly difference between multiplayer and single player games.

Numeric display of differences between multi- and single player counts

Show code
# Difference in February and max difference in 2020 and 2019
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month %in% c("Feb", "Apr")) %>%
  select(year, month, MP, Players) %>% 
  pivot_wider(names_from = MP, values_from = Players, values_fn = mean) %>% 
  mutate(Difference = Multiplayer-`Single player`) %>% 
  kable(
    caption = "Multi- and single player counts and difference at select time points.",
    digits = 2
  )
Table 4: Multi- and single player counts and difference at select time points.
year month Multiplayer Single player Difference
2019 Feb 3.79 0.34 3.45
2019 Apr 3.74 0.28 3.46
2020 Feb 3.87 0.35 3.52
2020 Apr 4.80 0.42 4.38

Weekend figure

We then focused on the weekend effect. How did weekdays differ from one another, and how did that difference change over time? From this we can observe that in 2020, the weekend effect was smaller after roughly week 10 than it was in 2019.

Show code
# Create table of data and predictions
predictions <- predict(fit2, se.fit = TRUE) %>% 
  as.data.frame() %>% 
  as_tibble()
predictions <- bind_cols(dat, predictions)

# Draw observed and model-implied player count for select weeks
p_wend_1 <- left_join(dat, predictions) %>% 
  filter(yweek %in% seq(5, 45, by = 10)) %>% 
  ggplot(aes(wday, Players, col = year, group = year, fill = year)) +
  colorpal() +
  scale_y_continuous(
    breaks = pretty_breaks(), labels = function(x) label_fun(x, 1)
  ) +
  scale_x_continuous(
    "Day of week", breaks = 0:6,
    labels = c("M", "T", "W", "T", "F", "S", "S")
  ) +
  geom_point(size = 1, shape = 21) +
  geom_ribbon(
    aes(ymin = fit.1 - se.fit.1*1.96, ymax = fit.1 + se.fit.1*1.96),
    alpha = .2, col = NA
  ) +
  geom_line(aes(y = fit.1)) +
  facet_grid(MP~yweek, scales = "free") +
  theme(
    legend.position = "none",
    axis.title.x = element_blank()
  )

# Observed weekday/end player count and difference from data
tmp <- dat %>% 
  group_by(year, yweek, MP) %>%
  summarise(
    wday_m = mean(Players[wday %in% 0:4]),
    wend_m = mean(Players[wday %in% 5:6]),
    w_eff = wend_m - wday_m
  ) %>% 
  ungroup() %>% 
  drop_na(w_eff)  # Doesn't exist for week 53

# Model-implied weekday player numbers
# Note there is no weekend effect for the last week in data
emm <- emmeans(
  fit2, "wday", by = c("G", "yweek"), 
  at = list(wday = 0:6, yweek = 0:52)
)
# Calculate model-implied average differences between weekdays and ends
emmz <- contrast(
  emm, infer = TRUE,
  method = list(c(0,0,0,0,0,1,1)/2 - c(1,1,1,1,1,0,0)/5)
) %>% 
  as_tibble() %>% 
  separate(G, c("year", "MP")) %>% 
  mutate(
    MP = ifelse(str_detect(MP, "MP"), "Multiplayer", "Single player")
  )

# Draw observed and model-implied difference between weekend and weekday
p_wend_2 <- left_join(tmp, emmz) %>% 
    # Convert week number to approx 2020 date for plotting
  mutate(Date = ymd("2020-01-01") + weeks(yweek-1)) %>% 
  ggplot(aes(Date, estimate, col = year, group = year, fill = year)) +
  scale_color_brewer(
    "Year", palette = "Dark2",
    aesthetics = c("color", "fill")
  ) +
  scale_y_continuous(
    "Weekend effect",
    breaks = pretty_breaks(), labels = function(x) label_fun(x, 2)
  ) +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", date_labels = "%b"
  ) +
  geom_vline(xintercept = cd, size = .1) +
  geom_ribbon(
    aes(ymin = lower.CL, ymax = upper.CL),
    alpha = .2, col = NA
  ) +
  geom_line() +
  geom_point(aes(y = w_eff), shape = 1) +
  facet_wrap("MP", nrow = 1, strip.position = "top", scales = "free") +
  theme(
    legend.position = c(.9, .75), 
    legend.title = element_blank(),
    axis.title.x = element_blank(),
    plot.margin = unit(c(1, 1, 1, 1), "pt")
    )

# Difference in weekend effect 2019-2020
emm2 <- emmeans(
  fit2, c("wday", "G"), by = "yweek",
  at = list(wday = 0:6, G = unique(dat$G), yweek = 0:52)
  )

# This beast calculates the difference in weekend effect 2020-2019 for MP and SP
emm2z <- contrast(
  emm2, infer = TRUE,
  method = list(
    MP = c(
      c(0,0,0,0,0,0,0,0,0,0,0,0,1,1, rep(0,14))/2 - 
        c(0,0,0,0,0,0,0,1,1,1,1,1,0,0, rep(0, 14))/5
    ) - c(
      c(0,0,0,0,0,1,1,0,0,0,0,0,0,0, rep(0,14))/2 - 
      c(1,1,1,1,1,0,0,0,0,0,0,0,0,0, rep(0, 14))/5
    ),
    SP = c(
      c(rep(0,14), 0,0,0,0,0,0,0,0,0,0,0,0,1,1)/2 - 
        c(rep(0,14), 0,0,0,0,0,0,0,1,1,1,1,1,0,0)/5
    ) - c(
      c(rep(0,14), 0,0,0,0,0,1,1,0,0,0,0,0,0,0)/2 - 
      c(rep(0,14), 1,1,1,1,1,0,0,0,0,0,0,0,0,0)/5
    )
  )
)
emm2z <- as.data.frame(emm2z) %>% 
  as_tibble() %>% 
  mutate(
    MP = ifelse(str_detect(contrast, "MP"), "Multiplayer", "Single player")
  )
p_wend_3 <- select(tmp, year, yweek, MP, w_eff) %>% 
  pivot_wider(names_from = year, values_from = w_eff) %>% 
  mutate(Difference = `2020`-`2019`) %>% 
  left_join(emm2z) %>% 
  # Convert week number to approx 2020 date for plotting
  mutate(Date = ymd("2020-01-01") + weeks(yweek-1)) %>% 
  ggplot(aes(Date, Difference)) +
  colorpal() +
  scale_x_date(
    expand = expansion(0.01), date_breaks = "1 month", 
    date_labels = "%b", position = "top"
  ) +
  geom_vline(xintercept = cd, size = .1) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(shape = 1) +
  geom_ribbon(
    aes(ymin = lower.CL, ymax = upper.CL),
    alpha = .2, col = NA
  ) +
  geom_line(aes(y = estimate), size = .3) +
  scale_y_continuous(
    "Difference in weekend effect",
    breaks = pretty_breaks(),
    labels = label_fun
  ) +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    strip.background = element_blank(),
    plot.margin = unit(c(1, 1, 1, 1), "pt"),
    strip.text = element_blank()
  ) +
  facet_wrap("MP", nrow = 1, scales = "free")

# Put the plots together
(p_wend_1 / p_wend_2 / p_wend_3) + 
  plot_layout(heights = c(1/3, 1/3, 1/3))
The weekend effect over time. Top: Observed (empty points) and model implied (lines with 95%CIs as shades) player counts over weekdays for five representative weeks (columns) separately for multi- (top) and single player games (bottom). 2019 is shown in light gray, and 2020 in dark gray. Bottom: Observed and model-implied weekly differences (weekend - weekdays) between weekdays and weekends over time.

Figure 4: The weekend effect over time. Top: Observed (empty points) and model implied (lines with 95%CIs as shades) player counts over weekdays for five representative weeks (columns) separately for multi- (top) and single player games (bottom). 2019 is shown in light gray, and 2020 in dark gray. Bottom: Observed and model-implied weekly differences (weekend - weekdays) between weekdays and weekends over time.

Numeric display of weekend effect

Show code
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month %in% c("Feb", "Apr")) %>%
  group_by(year, month, yweek, MP) %>%
  summarise(
    wday_m = mean(Players[wday %in% 1:5]),
    wend_m = mean(Players[wday %in% 6:7]),
    w_eff = wend_m - wday_m
  ) %>% 
  ungroup() %>% 
  drop_na(w_eff) %>% 
  group_by(year, month, MP) %>% 
  summarise(across(wday_m:w_eff, mean)) %>% 
  kable(
    caption = "Weekday effect",
    digits = 2
  )
Table 5: Weekday effect
year month MP wday_m wend_m w_eff
2019 Feb Multiplayer 3.80 4.21 0.41
2019 Feb Single player 0.37 0.40 0.03
2019 Apr Multiplayer 3.71 4.03 0.32
2019 Apr Single player 0.28 0.34 0.06
2020 Feb Multiplayer 3.83 4.31 0.48
2020 Feb Single player 0.34 0.42 0.08
2020 Apr Multiplayer 4.81 4.98 0.17
2020 Apr Single player 0.42 0.46 0.05

Additional numbers

Show code
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month %in% c("Mar", "Apr", "May", "Jun")) %>% 
  group_by(year, MP, month) %>% 
  summarise(Players = mean(Players)) %>% 
  summarise(min = min(Players), max = max(Players)) %>% 
  kable(digits = 2, caption = "Ranges of players")
Table 6: Ranges of players
year MP min max
2019 Multiplayer 3.70 3.94
2019 Single player 0.28 0.30
2020 Multiplayer 3.85 4.80
2020 Single player 0.33 0.42
Show code
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month %in% c("Apr")) %>% 
  group_by(year, MP, month) %>% 
  summarise(max_Players = max(Players)) %>% 
  kable(digits = 2, caption = "Peak players")
Table 6: Peak players
year MP month max_Players
2019 Multiplayer Apr 4.24
2019 Single player Apr 0.36
2020 Multiplayer Apr 5.78
2020 Single player Apr 0.57
Show code
dat %>% 
  mutate(month = month(Date, label = TRUE)) %>% 
  filter(month %in% c("Feb", "Apr")) %>% 
  group_by(year, MP, month) %>% 
  summarise(Players = mean(Players)) %>% 
  summarise(min = min(Players), max = max(Players)) %>% 
  kable(digits = 2, caption = "Ranges of players")
Table 6: Ranges of players
year MP min max
2019 Multiplayer 3.74 3.79
2019 Single player 0.28 0.34
2020 Multiplayer 3.87 4.80
2020 Single player 0.35 0.42

Additional model checking

Just checking. Apart from outliers (look at 2019 single player, quite noisy!) doing pretty well.

Show code
par(mfrow = c(2,2))
gam.check(fit2)


Method: REML   Optimizer: outer newton
full convergence after 9 iterations.
Gradient range [-1.426917e-06,1.69075e-05]
(score -2349.344 & scale 1).
Hessian positive definite, eigenvalue range [0.4937978,17.89726].
Model rank =  856 / 856 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                            k'    edf k-index p-value    
ti(yweek):G2019.MP       51.00  33.94    0.74  <2e-16 ***
ti(yweek):G2020.MP       51.00  39.16    0.74  <2e-16 ***
ti(yweek):G2019.SP       51.00  27.86    0.74  <2e-16 ***
ti(yweek):G2020.SP       51.00  42.18    0.74  <2e-16 ***
ti(wday):G2019.MP         6.00   5.88    0.80  <2e-16 ***
ti(wday):G2020.MP         6.00   5.83    0.80  <2e-16 ***
ti(wday):G2019.SP         6.00   5.70    0.80  <2e-16 ***
ti(wday):G2020.SP         6.00   5.83    0.80  <2e-16 ***
ti(yweek,wday):G2019.MP 150.00  30.48    0.81  <2e-16 ***
ti(yweek,wday):G2020.MP 150.00  36.24    0.81  <2e-16 ***
ti(yweek,wday):G2019.SP 150.00  10.10    0.81  <2e-16 ***
ti(yweek,wday):G2020.SP 150.00  20.33    0.81  <2e-16 ***
s.1(yweek):G2019.MP       5.00   4.46    0.74  <2e-16 ***
s.1(yweek):G2020.MP       5.00   4.21    0.74  <2e-16 ***
s.1(yweek):G2019.SP       5.00   4.87    0.74  <2e-16 ***
s.1(yweek):G2020.SP       5.00   2.61    0.74  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
par(mfrow = c(1,1))

System info

Show code
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.3 (2020-10-10)
 os       macOS Big Sur 10.16         
 system   x86_64, darwin17.0          
 ui       X11                         
 language (EN)                        
 collate  en_GB.UTF-8                 
 ctype    en_GB.UTF-8                 
 tz       Europe/London               
 date     2021-02-10                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version date       lib source        
 assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
 backports      1.2.1   2020-12-09 [1] CRAN (R 4.0.2)
 broom        * 0.7.3   2020-12-16 [1] CRAN (R 4.0.2)
 cellranger     1.1.0   2016-07-27 [1] CRAN (R 4.0.0)
 cli            2.2.0   2020-11-20 [1] CRAN (R 4.0.2)
 coda           0.19-4  2020-09-30 [1] CRAN (R 4.0.2)
 codetools      0.2-18  2020-11-04 [1] CRAN (R 4.0.2)
 colorspace     2.0-0   2020-11-11 [1] CRAN (R 4.0.2)
 COVID19      * 2.3.2   2021-01-06 [1] CRAN (R 4.0.2)
 crayon         1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
 DBI            1.1.1   2021-01-15 [1] CRAN (R 4.0.2)
 dbplyr         2.0.0   2020-11-03 [1] CRAN (R 4.0.2)
 digest         0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
 distill        1.2     2021-01-13 [1] CRAN (R 4.0.2)
 downlit        0.2.1   2020-11-04 [1] CRAN (R 4.0.2)
 dplyr        * 1.0.3   2021-01-15 [1] CRAN (R 4.0.2)
 ellipsis       0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
 emmeans      * 1.5.3   2020-12-09 [1] CRAN (R 4.0.2)
 estimability   1.3     2018-02-11 [1] CRAN (R 4.0.0)
 evaluate       0.14    2019-05-28 [1] CRAN (R 4.0.0)
 fansi          0.4.2   2021-01-15 [1] CRAN (R 4.0.2)
 farver         2.0.3   2020-01-16 [1] CRAN (R 4.0.0)
 forcats      * 0.5.1   2021-01-27 [1] CRAN (R 4.0.2)
 fs             1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
 generics       0.1.0   2020-10-31 [1] CRAN (R 4.0.2)
 ggplot2      * 3.3.3   2020-12-30 [1] CRAN (R 4.0.2)
 glue           1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
 gtable         0.3.0   2019-03-25 [1] CRAN (R 4.0.0)
 haven          2.3.1   2020-06-01 [1] CRAN (R 4.0.0)
 highr          0.8     2019-03-20 [1] CRAN (R 4.0.0)
 hms            1.0.0   2021-01-13 [1] CRAN (R 4.0.2)
 htmltools      0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
 httr           1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
 jsonlite       1.7.2   2020-12-09 [1] CRAN (R 4.0.2)
 knitr        * 1.31    2021-01-27 [1] CRAN (R 4.0.2)
 labeling       0.4.2   2020-10-20 [1] CRAN (R 4.0.2)
 lattice        0.20-41 2020-04-02 [1] CRAN (R 4.0.3)
 lifecycle      0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
 lubridate    * 1.7.9.2 2020-11-13 [1] CRAN (R 4.0.2)
 magrittr       2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
 MASS           7.3-53  2020-09-09 [1] CRAN (R 4.0.3)
 Matrix         1.3-2   2021-01-06 [1] CRAN (R 4.0.2)
 mgcv         * 1.8-33  2020-08-27 [1] CRAN (R 4.0.3)
 modelr         0.1.8   2020-05-19 [1] CRAN (R 4.0.0)
 multcomp       1.4-15  2020-11-14 [1] CRAN (R 4.0.2)
 munsell        0.5.0   2018-06-12 [1] CRAN (R 4.0.0)
 mvtnorm        1.1-1   2020-06-09 [1] CRAN (R 4.0.0)
 nlme         * 3.1-151 2020-12-10 [1] CRAN (R 4.0.2)
 patchwork    * 1.1.1   2020-12-17 [1] CRAN (R 4.0.2)
 pillar         1.4.7   2020-11-20 [1] CRAN (R 4.0.2)
 pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
 plyr           1.8.6   2020-03-03 [1] CRAN (R 4.0.0)
 purrr        * 0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
 R6             2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
 RColorBrewer   1.1-2   2014-12-07 [1] CRAN (R 4.0.0)
 Rcpp           1.0.6   2021-01-15 [1] CRAN (R 4.0.2)
 readr        * 1.4.0   2020-10-05 [1] CRAN (R 4.0.2)
 readxl       * 1.3.1   2019-03-13 [1] CRAN (R 4.0.0)
 reprex         1.0.0   2021-01-27 [1] CRAN (R 4.0.2)
 rlang          0.4.10  2020-12-30 [1] CRAN (R 4.0.2)
 rmarkdown      2.6     2020-12-14 [1] CRAN (R 4.0.2)
 rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.0.2)
 rvest          0.3.6   2020-07-25 [1] CRAN (R 4.0.2)
 sandwich       3.0-0   2020-10-02 [1] CRAN (R 4.0.2)
 scales       * 1.1.1   2020-05-11 [1] CRAN (R 4.0.0)
 sessioninfo  * 1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
 stringi        1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
 stringr      * 1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
 survival       3.2-7   2020-09-28 [1] CRAN (R 4.0.3)
 TH.data        1.0-10  2019-01-21 [1] CRAN (R 4.0.0)
 tibble       * 3.0.6   2021-01-29 [1] CRAN (R 4.0.3)
 tidyr        * 1.1.2   2020-08-27 [1] CRAN (R 4.0.2)
 tidyselect     1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
 tidyverse    * 1.3.0   2019-11-21 [1] CRAN (R 4.0.0)
 vctrs          0.3.6   2020-12-17 [1] CRAN (R 4.0.2)
 withr          2.4.1   2021-01-26 [1] CRAN (R 4.0.2)
 xfun           0.20    2021-01-06 [1] CRAN (R 4.0.2)
 xml2           1.3.2   2020-04-23 [1] CRAN (R 4.0.0)
 xtable         1.8-4   2019-04-21 [1] CRAN (R 4.0.0)
 yaml           2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
 zoo            1.8-8   2020-05-02 [1] CRAN (R 4.0.0)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/digital-wellbeing/steam-lockdown, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".