Create epi curves for data sets with small numbers and no data for some time periods (e.g. weeks, months or years)

Hi all,

Issue: I am new to R and am trying to create a sitrep for measles data from 2019 onwards. During the COVID-19 years case numbers were smaller, and in one year (2021) there were no cases. Using ggplot to create an epi curve by year (or quarter, month or week) works, but it only plots the time points where cases were reported. I would like help outlining the most streamlined method for doing this going forward, including which packages to use.
Timeline/urgency - Not urgent

The current method we are using is:

  1. Aggregate cases by time period, i.e. count cases by year_diagnosed and state.
  2. Create a longer data set of all possible combinations using the tidyr::expand function.
  3. Merge the two data sets.

# load packages
library(tidyverse)

# count cases by year and state
cases_year_raw <- measles_NNDSS_2019 %>%
  count(year_diagnosed, state) %>%
  arrange(year_diagnosed)

# create a longer dataset of all possible combinations of year and state
# (a placeholder row is added for 2021, which has no cases, so that expand()
# includes it; year_diagnosed is assumed to be stored as character)
cases_year_expanded <- cases_year_raw %>%
  rbind(data.frame(
    year_diagnosed = "2021",
    state = "QLD",
    n = 0
  )) %>%
  tidyr::expand(year_diagnosed, state)

# merge so that every combination of year and state is present,
# with 0 where no cases were reported
cases_year <- cases_year_raw %>%
  right_join(cases_year_expanded, by = join_by(year_diagnosed, state)) %>%
  mutate(n = replace_na(n, 0),
         year_diagnosed = fct_relevel(year_diagnosed, as.character(2019:2024))) %>%
  arrange(year_diagnosed) %>%
  mutate(state = factor(state, levels = unique(state)))

Any assistance or suggestions would be appreciated.

Hello,

Here is how I would approach this problem using some of the functions from tidyverse:

# loading packages
library(tidyverse)

# creating fake data
fake_data <- tibble(
    episode_year = sample(
        x = 2000L:2024L,
        replace = TRUE,
        size = 20
    ),
    geo_unit = sample(
        x = c("A", "B", "C", "D"),
        replace = TRUE,
        size = 20
    )
)

# filling missing counts by episode_year and geo_unit
filled_data <- fake_data |>
    count(episode_year, geo_unit) |>
    complete(
        episode_year = full_seq(episode_year, 1),
        geo_unit = unique(geo_unit),
        fill = list(n = 0)
    )

# viewing the data
filled_data
#> # A tibble: 88 × 3
#>    episode_year geo_unit     n
#>           <dbl> <chr>    <int>
#>  1         2000 A            0
#>  2         2000 B            1
#>  3         2000 C            0
#>  4         2000 D            1
#>  5         2001 A            1
#>  6         2001 B            0
#>  7         2001 C            0
#>  8         2001 D            0
#>  9         2002 A            1
#> 10         2002 B            0
#> # ℹ 78 more rows

Created on 2024-08-13 with reprex v2.1.1
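
Since the original question was about the epi curve itself, here is a minimal sketch of how the completed counts could be passed to ggplot. It is not part of the reprex above. The seed, the 2019 to 2024 range and the `epi_curve` name are assumptions for illustration. One difference from the code above: the years are listed explicitly instead of using `full_seq()`, because `full_seq()` only spans the smallest to the largest year observed, so a reporting period that starts or ends with an empty year would otherwise lose that year.

```r
library(tidyverse)

set.seed(1) # assumed seed, only so the fake data is reproducible

# hypothetical line list: one row per case
fake_data <- tibble(
    episode_year = sample(2019L:2024L, size = 20, replace = TRUE),
    geo_unit = sample(c("A", "B", "C", "D"), size = 20, replace = TRUE)
)

# count cases, then add a zero row for every year/geo_unit combination
# that has no cases; the full range is given explicitly
filled_data <- fake_data |>
    count(episode_year, geo_unit) |>
    complete(
        episode_year = 2019L:2024L,
        geo_unit = c("A", "B", "C", "D"),
        fill = list(n = 0L)
    )

# stacked bars by year; treating the year as a factor gives every year
# its own position on the x axis, including years where all counts are 0
epi_curve <- ggplot(
    filled_data,
    aes(x = factor(episode_year), y = n, fill = geo_unit)
) +
    geom_col() +
    labs(x = "Year", y = "Cases", fill = "Geo unit")

epi_curve
```

Replacing `fill = geo_unit` with `facet_wrap(vars(geo_unit))` would give one panel per unit, which may be easier to read when the counts are small.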

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.5
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2024-08-13
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [1] RSPM (R 4.4.0)
#>  colorspace    2.1-0   2023-01-23 [1] RSPM (R 4.4.0)
#>  digest        0.6.36  2024-06-23 [1] RSPM (R 4.4.0)
#>  dplyr       * 1.1.4   2023-11-17 [1] RSPM (R 4.4.0)
#>  evaluate      0.24.0  2024-06-10 [1] RSPM (R 4.4.0)
#>  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [1] RSPM (R 4.4.0)
#>  forcats     * 1.0.0   2023-01-29 [1] RSPM (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
#>  generics      0.1.3   2022-07-05 [1] RSPM (R 4.4.0)
#>  ggplot2     * 3.5.1   2024-04-23 [1] RSPM (R 4.4.0)
#>  glue          1.7.0   2024-01-09 [1] RSPM (R 4.4.0)
#>  gtable        0.3.5   2024-04-22 [1] RSPM (R 4.4.0)
#>  hms           1.1.3   2023-03-21 [1] RSPM (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#>  knitr         1.48    2024-07-07 [1] RSPM (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
#>  lubridate   * 1.9.3   2023-09-27 [1] RSPM (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
#>  munsell       0.5.1   2024-04-01 [1] RSPM (R 4.4.0)
#>  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
#>  purrr       * 1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [1] RSPM (R 4.4.0)
#>  readr       * 2.1.5   2024-01-10 [1] RSPM (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] RSPM (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] RSPM (R 4.4.0)
#>  rmarkdown     2.27    2024-05-17 [1] RSPM (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [1] RSPM (R 4.4.0)
#>  scales        1.3.0   2023-11-28 [1] RSPM (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
#>  stringi       1.8.4   2024-05-06 [1] RSPM (R 4.4.0)
#>  stringr     * 1.5.1   2023-11-14 [1] RSPM (R 4.4.0)
#>  tibble      * 3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
#>  tidyr       * 1.3.1   2024-01-24 [1] RSPM (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] RSPM (R 4.4.0)
#>  tidyverse   * 2.0.0   2023-02-22 [1] RSPM (R 4.4.0)
#>  timechange    0.3.0   2024-01-18 [1] RSPM (R 4.4.0)
#>  tzdb          0.4.0   2023-05-12 [1] RSPM (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
#>  withr         3.0.0   2024-01-16 [1] RSPM (R 4.4.0)
#>  xfun          0.45    2024-06-16 [1] RSPM (R 4.4.0)
#>  yaml          2.3.9   2024-07-05 [1] RSPM (R 4.4.0)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
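
The same idea should carry over to months or weeks if the data include a diagnosis date. The sketch below assumes a hypothetical `date_diagnosed` column and made-up state values and date range; none of these come from the original data. Each date is rounded down to the first day of its month with `lubridate::floor_date()`, and `complete()` is then given the full sequence of months.

```r
library(tidyverse)

set.seed(1) # assumed seed, only so the fake data is reproducible

# hypothetical line list with one diagnosis date per case
fake_cases <- tibble(
    date_diagnosed = sample(
        seq(as.Date("2019-01-01"), as.Date("2024-12-31"), by = "day"),
        size = 30,
        replace = TRUE
    ),
    state = sample(c("NSW", "QLD", "VIC"), size = 30, replace = TRUE)
)

# every month in the reporting period, as the first day of the month
all_months <- seq(as.Date("2019-01-01"), as.Date("2024-12-01"), by = "month")

monthly_counts <- fake_cases |>
    # round each date down to the first day of its month
    mutate(month_diagnosed = lubridate::floor_date(date_diagnosed, unit = "month")) |>
    count(month_diagnosed, state) |>
    # add zero rows for month/state combinations with no cases
    complete(
        month_diagnosed = all_months,
        state = c("NSW", "QLD", "VIC"),
        fill = list(n = 0L)
    )

monthly_counts
```

For weekly counts, `unit = "week"` in `floor_date()` together with `by = "week"` in `seq()` should work in the same way, as long as the sequence starts on the same weekday that `floor_date()` rounds to.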

All the best,

Tim

@cushlamcoffey Did the answer provided help you, or were you able to solve your issue another way?

Our volunteers spend their time writing answers, and others in the forum will learn from the discussion, so we would appreciate the feedback.

Thank you!
