Applied Epi Community

How can I "unaggregate" aggregate data into linelist data?

Hello,

I have an aggregate data set that contains the frequency of each combination of certain variables of interest. I would like to convert this aggregate data back to a linelist format, with one row per observation. Is there an easy way to do this with existing functions in the tidyverse packages?

Here is an example of what my data looks like:

library(tidyverse)

tibble::tribble(
    ~sex, ~age_group, ~n,
    "female", "0-19", 0,
    "female", "20-29", 4,
    "male", "0-19", 1
    )
#> # A tibble: 3 × 3
#>   sex    age_group     n
#>   <chr>  <chr>     <dbl>
#> 1 female 0-19          0
#> 2 female 20-29         4
#> 3 male   0-19          1

Created on 2022-04-21 by the reprex package (v2.0.1)

Regards,

John

Hello John,

I was in this exact scenario recently! As you suspected, there is already a function for this: uncount() in the tidyr package. It repeats each row as many times as the value in a weights column (here, n) and then drops that column.

Here is some code demonstrating how it works to “unaggregate” aggregate data into linelist data:

#Loading tidyverse
library(tidyverse)

#Generating aggregate data (the counts are random Poisson draws, so they will differ between runs)
agg_data <- expand_grid(
    sex = c("female", "male"),
    age_group = c("0-19", "20-29", "30-39", "40-49", "50-59", "60+")
) |>
    mutate(n = rpois(n = n(), lambda = 2))

#Examining the aggregate data
agg_data |>
    slice_head(n = 5)
#> # A tibble: 5 × 3
#>   sex    age_group     n
#>   <chr>  <chr>     <int>
#> 1 female 0-19          4
#> 2 female 20-29         1
#> 3 female 30-39         1
#> 4 female 40-49         2
#> 5 female 50-59         0

#Using the uncount function
linelist_data <- agg_data |>
    uncount(weights = n)

#Examining the linelist data
linelist_data |>
    slice_head(n = 5)
#> # A tibble: 5 × 2
#>   sex    age_group
#>   <chr>  <chr>    
#> 1 female 0-19     
#> 2 female 0-19     
#> 3 female 0-19     
#> 4 female 0-19     
#> 5 female 20-29

Created on 2022-04-22 by the reprex package (v2.0.1)
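
If you want to convince yourself that nothing was lost, one possible check is to re-aggregate the linelist with dplyr's count() and compare it with the table you started from. The sketch below uses a small made-up table (the values are hypothetical, not from the example above) and assumes every combination has a count of at least 1, since rows with a count of 0 disappear after uncount() and would not come back.

```r
library(dplyr)
library(tidyr)

# Hypothetical aggregate data, already sorted by sex and age_group
agg_data <- tibble(
    sex = c("female", "female", "male"),
    age_group = c("0-19", "20-29", "0-19"),
    n = c(2L, 3L, 1L)
)

# Expand to one row per observation
linelist_data <- agg_data |>
    uncount(weights = n)

# Re-aggregate: count() returns one row per combination with its frequency in n
round_trip <- linelist_data |>
    count(sex, age_group)

# Compare as plain data frames to keep the check simple
isTRUE(all.equal(as.data.frame(round_trip), as.data.frame(agg_data)))
```

Note that count() sorts its output by the grouping variables, so the comparison only lines up if the original table is in the same order.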

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Toronto
#>  date     2022-04-22
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.0)
#>  broom         0.8.0   2022-04-13 [1] CRAN (R 4.1.3)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
#>  cli           3.2.0   2022-02-14 [1] RSPM (R 4.1.2)
#>  colorspace    2.0-3   2022-02-21 [1] RSPM (R 4.1.2)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.3)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.1.1)
#>  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.1)
#>  dplyr       * 1.0.8   2022-02-08 [1] RSPM (R 4.1.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15    2022-02-18 [1] RSPM (R 4.1.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.1)
#>  generics      0.1.2   2022-01-31 [1] RSPM (R 4.1.2)
#>  ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
#>  glue          1.6.2   2022-02-24 [1] RSPM (R 4.1.2)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
#>  haven         2.5.0   2022-04-15 [1] CRAN (R 4.1.3)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.1.1)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
#>  jsonlite      1.8.0   2022-02-22 [1] RSPM (R 4.1.2)
#>  knitr         1.38    2022-03-25 [1] CRAN (R 4.1.3)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)
#>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
#>  pillar        1.7.0   2022-02-01 [1] RSPM (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
#>  readr       * 2.1.2   2022-01-30 [1] RSPM (R 4.1.2)
#>  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.1.3)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.1.1)
#>  scales        1.2.0   2022-04-13 [1] CRAN (R 4.1.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.1)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.1.2)
#>  tibble      * 3.1.6   2021-11-07 [1] CRAN (R 4.1.1)
#>  tidyr       * 1.2.0   2022-02-01 [1] RSPM (R 4.1.2)
#>  tidyselect    1.1.2   2022-02-21 [1] RSPM (R 4.1.2)
#>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.1.3)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.1.3)
#>  withr         2.5.0   2022-03-03 [1] RSPM (R 4.1.2)
#>  xfun          0.30    2022-03-02 [1] RSPM (R 4.1.2)
#>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.1.1)
#>  yaml          2.3.5   2022-02-21 [1] RSPM (R 4.1.2)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.1/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
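
For reference, here is a minimal sketch of the same idea applied to the three rows from your question. It also uses the .id argument of uncount() to number the rows created from each original row; the column name case_in_group is just a placeholder I picked for illustration.

```r
library(tibble)
library(tidyr)

# Aggregate data copied from the original question
agg_data <- tribble(
    ~sex, ~age_group, ~n,
    "female", "0-19", 0,
    "female", "20-29", 4,
    "male", "0-19", 1
)

# One row per observation; the n column is dropped by default (.remove = TRUE)
# and rows with n = 0 produce no output rows.
# "case_in_group" is a hypothetical column name that numbers the copies
# made from each original row.
linelist_data <- agg_data |>
    uncount(weights = n, .id = "case_in_group")

linelist_data
```

One thing to keep in mind is that the female, 0-19 combination has a count of 0, so it does not appear in the linelist at all.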

All the best,

Tim

This is such a simple solution. Thank you for taking the time to share it with me!

Regards
