The Global Human Day Analysis

Visualizing how different countries spend their days.
tidytuesday
eda
visualization
Author

Sharleen Weatherley

Published

September 21, 2023

Overview

Using the Global Human Day data from TidyTuesday, I will find the two countries whose average days differ the most and visualize how each one spends its time.

Setup

First I will load the necessary packages and read in the data from the tidytuesday GitHub repository.

# Load packages
library(dplyr)
library(janitor)
library(tidyr)
library(ggplot2)
library(forcats)
library(knitr)

# Reading the data in
all_countries <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-12/all_countries.csv') |> 
  clean_names()

country_regions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-12/country_regions.csv') |> 
  clean_names()

# Set graph theme
theme_set(theme_minimal())

Initial thoughts

Looking at the two data frames, my first thought is to join them so that all_countries gains a country_name column.

all_countries |> 
  head(5) |>
  kable()

| category | subcategory | country_iso3 | region_code | population | hours_per_day_combined | uncertainty_combined |
|:---|:---|:---|:---|---:|---:|---:|
| Food provision | Food preparation | ABW | AM_C | 101665 | 1.47 | 0.2376303 |
| Food provision | Food growth & collection | ABW | AM_C | 101665 | 0.17 | 0.0268798 |
| Food provision | Food processing | ABW | AM_C | 101665 | 0.00 | 0.0075265 |
| Nonfood provision | Materials | ABW | AM_C | 101665 | 0.03 | 0.0020716 |
| Nonfood provision | Energy | ABW | AM_C | 101665 | 0.05 | 0.0035968 |

country_regions |> 
  head(5) |> 
  kable()

| region_code | region_name | country_name | m49_code | country_iso2 | country_iso3 | alt_country_name | alt_country_name1 | alt_country_name2 | alt_country_name3 | alt_country_name4 | alt_country_name5 | alt_country_name6 | other_code1 | other_code2 |
|:---|:---|:---|---:|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| AF_N | Northern Africa | Algeria | 12 | DZ | DZA | Algeria | People’s Democratic Republic of Algeria | NA | NA | NA | NA | NA | NA | NA |
| AF_N | Northern Africa | Egypt | 818 | EG | EGY | Egypt | Arab Republic of Egypt | NA | NA | NA | NA | NA | NA | NA |
| AF_N | Northern Africa | Libya | 434 | LY | LBY | Libya | State of Libya | NA | NA | NA | NA | NA | NA | NA |
| AF_N | Northern Africa | Morocco | 504 | MA | MAR | Morocco | Kingdom of Morocco | NA | NA | NA | NA | NA | NA | NA |
| AF_N | Northern Africa | Sudan | 729 | SD | SDN | Sudan | Republic of the Sudan | The Sudan | NA | NA | NA | NA | NA | NA |

I also want to peek at the different options for categories and subcategories:

all_countries |> 
  distinct(category, subcategory) |> 
  arrange_all() |> 
  kable()

| category | subcategory |
|:---|:---|
| Deliberate neural restructuring | Religious practice |
| Deliberate neural restructuring | Schooling & research |
| Experience oriented | Active recreation |
| Experience oriented | Interactive |
| Experience oriented | Meals |
| Experience oriented | Passive |
| Experience oriented | Social |
| Food provision | Food growth & collection |
| Food provision | Food preparation |
| Food provision | Food processing |
| Maintenance of surroundings | Inhabited environment |
| Maintenance of surroundings | Waste management |
| Nonfood provision | Energy |
| Nonfood provision | Materials |
| Organization | Allocation |
| Organization | Human transportation |
| Organization | Material transportation |
| Somatic maintenance | Health care |
| Somatic maintenance | Hygiene & grooming |
| Somatic maintenance | Physical child care |
| Somatic maintenance | Sleep & bedrest |
| Technosphere modification | Artifacts |
| Technosphere modification | Buildings |
| Technosphere modification | Infrastructure |

Joining the two data sets

The left_join below adds country_name to the all_countries data set. I also drop the variables I won’t need and sort the rows.

all_countries <- all_countries |>
  left_join(
    country_regions |>
      select(country_iso3,
             country_name),
    by = "country_iso3") |> 
  select(country_name,
         category,
         subcategory,
         hours_per_day_combined) |> 
  arrange(country_name,
          category,
          subcategory)

all_countries |> 
  head(5) |> 
  kable()

| country_name | category | subcategory | hours_per_day_combined |
|:---|:---|:---|---:|
| Afghanistan | Deliberate neural restructuring | Religious practice | 0.29 |
| Afghanistan | Deliberate neural restructuring | Schooling & research | 1.33 |
| Afghanistan | Experience oriented | Active recreation | 0.23 |
| Afghanistan | Experience oriented | Interactive | 0.62 |
| Afghanistan | Experience oriented | Meals | 1.62 |
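
Before relying on a join like this, it can help to check that every country code finds a match and that the lookup table has one row per code. The sketch below uses small toy tibbles rather than the real data; `activities`, `regions`, and the code "XXX" are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for all_countries and country_regions (illustrative values only)
activities <- tibble(
  country_iso3 = c("ABW", "ABW", "DZA", "XXX"),  # "XXX" is a deliberately unmatched code
  subcategory  = c("Energy", "Materials", "Energy", "Energy")
)
regions <- tibble(
  country_iso3 = c("ABW", "DZA"),
  country_name = c("Aruba", "Algeria")
)

# Codes present in the activity data but missing from the lookup table
unmatched <- activities |>
  anti_join(regions, by = "country_iso3") |>
  distinct(country_iso3)

# Codes that appear more than once in the lookup table (these would duplicate rows)
duplicated_keys <- regions |>
  count(country_iso3) |>
  filter(n > 1)

# The join itself: unmatched codes get NA for country_name
joined <- activities |>
  left_join(regions, by = "country_iso3")
```

If `unmatched` has rows, those countries end up with a missing country_name after the join. If `duplicated_keys` has rows, the join returns more rows than all_countries started with.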

Add an hour variable for the category

The hours_per_day_combined variable is recorded at the subcategory level. I am going to add a category-level total by summing the subcategory hours within each country and category.

all_countries <- all_countries |> 
  rename(subcategory_hours_per_day = hours_per_day_combined) |> 
  group_by(country_name, category) |> 
  mutate(category_hours_per_day = sum(subcategory_hours_per_day),
         .after = subcategory) |> 
  ungroup()

all_countries |> 
  head(10) |> 
  kable()

| country_name | category | subcategory | category_hours_per_day | subcategory_hours_per_day |
|:---|:---|:---|---:|---:|
| Afghanistan | Deliberate neural restructuring | Religious practice | 1.62 | 0.29 |
| Afghanistan | Deliberate neural restructuring | Schooling & research | 1.62 | 1.33 |
| Afghanistan | Experience oriented | Active recreation | 6.32 | 0.23 |
| Afghanistan | Experience oriented | Interactive | 6.32 | 0.62 |
| Afghanistan | Experience oriented | Meals | 6.32 | 1.62 |
| Afghanistan | Experience oriented | Passive | 6.32 | 2.05 |
| Afghanistan | Experience oriented | Social | 6.32 | 1.80 |
| Afghanistan | Food provision | Food growth & collection | 2.12 | 0.88 |
| Afghanistan | Food provision | Food preparation | 2.12 | 1.14 |
| Afghanistan | Food provision | Food processing | 2.12 | 0.10 |
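
To make the arithmetic concrete, here is a small sketch that rebuilds two of Afghanistan's category totals from the subcategory hours shown in the table above. It uses summarise() with `.by` (available in dplyr 1.1.0 and later) instead of group_by() and mutate(), so it returns one row per category; the object names `afghanistan` and `category_totals` are illustrative.

```r
library(dplyr)

# Subcategory hours copied from the Afghanistan rows printed above
afghanistan <- tibble(
  country_name = "Afghanistan",
  category = c(rep("Deliberate neural restructuring", 2),
               rep("Food provision", 3)),
  subcategory = c("Religious practice", "Schooling & research",
                  "Food growth & collection", "Food preparation",
                  "Food processing"),
  subcategory_hours_per_day = c(0.29, 1.33, 0.88, 1.14, 0.10)
)

# One row per country and category, with the subcategory hours summed
category_totals <- afghanistan |>
  summarise(category_hours_per_day = sum(subcategory_hours_per_day),
            .by = c(country_name, category))
```

The two totals come out at about 1.62 (0.29 + 1.33) and 2.12 (0.88 + 1.14 + 0.10), matching the category_hours_per_day column above.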

Visualize the two most different countries

Which two countries have the most different days? And how do they compare?

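Adapting the approach from this Stack Overflow topic, I am going to compute the difference in hours for every pair of countries within each category, then sum the absolute differences across categories to get one total per pair. The pair with the largest total has the most different days, and those are the two countries I will plot.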

country_pair_diff <- all_countries |> 
  distinct(country_name,
           category,
           category_hours_per_day) |> 
  group_by(category) |> 
  reframe(result = combn(seq_along(category_hours_per_day), 2, function(i)
    list(difference = diff(category_hours_per_day[i]),  # Second country's hours minus the first's
         country_pair = paste0(country_name[i], collapse = '-')), # Pair label, e.g. "Afghanistan-Albania"
    simplify = FALSE)) |> 
  unnest_wider(result) |> 
  mutate(abs_difference = abs(difference)) |> 
  group_by(country_pair) |> 
  mutate(total_diff_country_pair = sum(abs_difference)) |> 
  ungroup()

country_pair_diff |> 
  arrange(country_pair) |> 
  head(10) |> 
  kable()

| category | difference | country_pair | abs_difference | total_diff_country_pair |
|:---|---:|:---|---:|---:|
| Deliberate neural restructuring | -0.52 | Afghanistan-Albania | 0.52 | 2.15 |
| Experience oriented | 0.53 | Afghanistan-Albania | 0.53 | 2.15 |
| Food provision | -0.16 | Afghanistan-Albania | 0.16 | 2.15 |
| Maintenance of surroundings | 0.16 | Afghanistan-Albania | 0.16 | 2.15 |
| Nonfood provision | 0.03 | Afghanistan-Albania | 0.03 | 2.15 |
| Organization | 0.09 | Afghanistan-Albania | 0.09 | 2.15 |
| Somatic maintenance | -0.40 | Afghanistan-Albania | 0.40 | 2.15 |
| Technosphere modification | 0.26 | Afghanistan-Albania | 0.26 | 2.15 |
| Deliberate neural restructuring | -0.20 | Afghanistan-Algeria | 0.20 | 2.46 |
| Experience oriented | 1.07 | Afghanistan-Algeria | 1.07 | 2.46 |
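
Summing absolute differences across categories is the Manhattan (L1) distance between two countries' category vectors, so the same totals can be cross-checked with base R's dist(). The sketch below is not the method used in this post, just an alternative route to the same number. The three countries "A", "B", "C" and their hours are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical category totals for three made-up countries
toy <- tibble(
  country_name = rep(c("A", "B", "C"), each = 2),
  category     = rep(c("Sleep", "Work"), times = 3),
  category_hours_per_day = c(9, 3, 8, 5, 10, 1)
)

# Reshape to one row per country and one column per category
wide <- toy |>
  pivot_wider(names_from = category,
              values_from = category_hours_per_day)

# Numeric matrix with country names as row names
m <- as.matrix(wide[, -1])
rownames(m) <- wide$country_name

# Manhattan distance: sum of absolute differences across categories
pair_dist <- as.matrix(dist(m, method = "manhattan"))

# Row and column of the largest distance, i.e. the most different pair
idx <- which(pair_dist == max(pair_dist), arr.ind = TRUE)[1, ]
most_different <- rownames(pair_dist)[idx]
```

In this toy example the largest distance is between B and C (|8 - 10| + |5 - 1| = 6). Working from a matrix also avoids building pair labels with paste0(), which can be ambiguous when a country name itself contains a hyphen.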

Now I can find the two most different countries. They will have the largest total_diff_country_pair.

country_pair_diff |> 
  distinct(country_pair,
           total_diff_country_pair) |> 
  arrange(-total_diff_country_pair) |> 
  head(20) |> 
  kable()

| country_pair | total_diff_country_pair |
|:---|---:|
| Democratic Republic of the Congo-Germany | 7.53 |
| Angola-Germany | 7.42 |
| Cameroon-Germany | 7.42 |
| Central African Republic-Germany | 7.42 |
| Chad-Germany | 7.42 |
| Equatorial Guinea-Germany | 7.42 |
| Gabon-Germany | 7.42 |
| Germany-Sao Tome and Principe | 7.42 |
| Bhutan-Germany | 7.06 |
| Democratic Republic of the Congo-Japan | 6.97 |
| Democratic Republic of the Congo-Switzerland | 6.95 |
| Democratic Republic of the Congo-Sweden | 6.91 |
| Democratic Republic of the Congo-Netherlands | 6.89 |
| Angola-Japan | 6.86 |
| Cameroon-Japan | 6.86 |
| Central African Republic-Japan | 6.86 |
| Chad-Japan | 6.86 |
| Equatorial Guinea-Japan | 6.86 |
| Gabon-Japan | 6.86 |
| Japan-Sao Tome and Principe | 6.86 |
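
Several pairs in the table share the same total, so it is worth being explicit about ties when picking "the" top pair. A minimal sketch with slice_max(), using four rows copied from the table above (the displayed values are rounded, so ties in the full data may behave slightly differently); `top_pairs`, `most_different_pair`, and `runners_up` are illustrative names.

```r
library(dplyr)

# A few rows copied from the table above
top_pairs <- tibble(
  country_pair = c("Democratic Republic of the Congo-Germany",
                   "Angola-Germany",
                   "Cameroon-Germany",
                   "Bhutan-Germany"),
  total_diff_country_pair = c(7.53, 7.42, 7.42, 7.06)
)

# Largest total; with_ties = TRUE keeps every pair tied for first place
most_different_pair <- top_pairs |>
  slice_max(total_diff_country_pair, n = 1, with_ties = TRUE)

# Next-largest total, which here is shared by two pairs
runners_up <- top_pairs |>
  filter(total_diff_country_pair < max(total_diff_country_pair)) |>
  slice_max(total_diff_country_pair, n = 1, with_ties = TRUE)
```

Here `most_different_pair` has a single row, while `runners_up` returns both pairs at 7.42.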

Graphing the two most different countries

Now that we know the Democratic Republic of the Congo (DRC) and Germany have the most different days, let’s plot their category totals side by side to see where they differ:

all_countries |> 
  filter(country_name %in% c("Democratic Republic of the Congo",
                             "Germany")) |> 
  distinct(country_name, category, .keep_all = TRUE) |> 
  mutate(category = category |> 
           as.factor() |> 
           fct_reorder(category_hours_per_day)) |> 
  ggplot() +
  geom_col(aes(x = category,
               y = category_hours_per_day,
               fill = country_name),
           position = "dodge2") +
  labs(x = "Category",
       y = "Hours per day",
       fill = "Country name") +
  coord_flip()

It looks like the DRC spends more time on Deliberate neural restructuring, which covers religious practice and schooling & research, whereas Germany spends more time on Experience oriented activities: active recreation, interactive, meals, passive, and social.
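
A table of per-category gaps can back up what the chart suggests. The sketch below shows one way to compute it with pivot_wider(). The two countries and their hours are placeholders, not values from the data set.

```r
library(dplyr)
library(tidyr)

# Made-up hours for two placeholder countries (not taken from the data)
two_countries <- tibble(
  country_name = rep(c("Country A", "Country B"), each = 3),
  category     = rep(c("Experience oriented",
                       "Food provision",
                       "Somatic maintenance"), times = 2),
  category_hours_per_day = c(5.0, 2.5, 10.0, 7.0, 1.0, 9.5)
)

# One column per country, then the signed gap, largest gaps first
category_gaps <- two_countries |>
  pivot_wider(names_from = country_name,
              values_from = category_hours_per_day) |>
  mutate(gap = `Country A` - `Country B`) |>
  arrange(desc(abs(gap)))
```

A positive gap means Country A spends more time on that category, and a negative gap means Country B does. With the real data, filtering all_countries to the two countries of interest and taking distinct category totals would provide the input.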


Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 11 x64 (build 22000)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       Pacific/Honolulu
 date     2023-09-21
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 bit           4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
 bit64         4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
 cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 codetools     0.2-19  2023-02-01 [2] CRAN (R 4.3.0)
 colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
 curl          5.0.0   2023-01-12 [1] CRAN (R 4.3.0)
 devtools    * 2.4.5   2022-10-11 [1] CRAN (R 4.3.1)
 digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
 dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 evaluate      0.20    2023-01-17 [1] CRAN (R 4.3.0)
 fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
 farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 fs            1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
 here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
 httpuv        1.6.11  2023-05-11 [1] CRAN (R 4.3.1)
 janitor     * 2.2.0   2023-02-02 [1] CRAN (R 4.3.0)
 jsonlite      1.8.4   2022-12-06 [1] CRAN (R 4.3.0)
 knitr       * 1.42    2023-01-25 [1] CRAN (R 4.3.0)
 labeling      0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
 later         1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 lubridate     1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.3.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
 processx      3.8.1   2023-04-18 [1] CRAN (R 4.3.0)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
 promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
 ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
 purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
 readr         2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
 remotes       2.4.2   2021-11-30 [1] CRAN (R 4.3.0)
 rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
 rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
 rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
 scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.1)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 shiny         1.7.4   2022-12-15 [1] CRAN (R 4.3.0)
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 4.3.0)
 stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
 tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
 usethis     * 2.2.2   2023-07-06 [1] CRAN (R 4.3.1)
 utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
 vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
 vroom         1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
 withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
 xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
 yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)

 [1] C:/Users/sharl/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.0/library

──────────────────────────────────────────────────────────────────────────────