Creating an epi curve based on a unique date range and adding trend lines

Describe your issue

Hello everyone! I’m trying to create an epidemic curve for our fireworks injury surveillance with the following features:

  1. Cases are counted based on a specific range:
    Surveillance day 1: date of injury on 2022-12-21 0600H to 2022-12-22 0559H
    Surveillance day 2: date of injury on 2022-12-22 0600H to 2022-12-23 0559H
    and so on until surveillance day 16: date of injury on 2023-01-05 0600H to 2023-01-06 0559H of the following year

Line-list data are available for #1.

  2. Aside from the histogram of cases, I’d like to add 2 trend lines:
    Line 1: cases from the previous surveillance year (date of injury on 2021-12-21 0600H to 2022-01-06 0559H)
    Line 2: five-year average cases (2017-2021)

Daily aggregate case counts (by surveillance day, as in #1) are available for 2017-2021 in long format.

What steps have you already taken to find an answer?

For #1, I think I need to create a new surveillance_day variable (ranging from 1 to 16 depending on the date of injury) and then use it as the reference for plotting the epi curve. Initially I thought about using mutate() with case_when() to list the date/time range for each surveillance day, but that would mean writing out every date/time and changing the year each time a new surveillance period starts. Is there a quicker way to do this?
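One way to avoid writing out all 16 windows: shift each timestamp back six hours so that 06:00 becomes midnight. The calendar date of the shifted time then identifies the surveillance day, and its distance in days from the season's first date gives the day number. A minimal sketch with lubridate, assuming the season always opens on 21 December (as in the dates above) and that times are on a 24-hour clock; `surveillance_day()`, `season_start` and `n_days` are hypothetical names.

```r
library(lubridate)

# Hypothetical helper: surveillance day number for each date-time.
# Day 1 runs from 06:00 on `season_start` to 05:59:59 the next morning.
surveillance_day <- function(datetime, season_start, n_days = 16L) {
  # Moving every time back 6 hours turns each 06:00-05:59 window
  # into a single calendar date
  shifted_date <- as_date(datetime - hours(6))
  day <- as.integer(shifted_date - season_start) + 1L
  # Times before day 1 or after the last day get NA instead of 0, 17, ...
  ifelse(day >= 1L & day <= n_days, day, NA_integer_)
}

# Only this line changes from one season to the next
season_start <- make_date(2022, 12, 21)

report_time <- ymd_hms(paste(
  c("2022-12-21", "2022-12-22", "2022-12-22", "2023-01-06"),
  c("23:50:00", "05:59:59", "06:00:00", "05:59:00")
))
surveillance_day(report_time, season_start)
# gives 1 1 2 16
```

Inside mutate(), something like `surveillance_day(ymd_hms(paste(date_report, time_report)), season_start)` on the raw text columns of demo_data would assign the ten demo reports to days 1, 3 and 4.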

For #2, I looked into the incidence2 and i2extras packages for rolling/moving averages, but they seem to require line-list data, whereas I only have aggregate data for the previous years.
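Because the earlier years are already aggregated, the 5-year average does not need incidence2; a grouped mean is enough. A minimal sketch, assuming a long-format data frame with one row per date in hypothetical columns `date` (Date) and `n` (the real column names may differ), and assuming every season opens on 21 December. Each historical date is turned into a day-of-season index, averaged across seasons, and mapped back onto the current season's dates so it can share an x axis with the current bars. `five_year_average()` is a hypothetical helper.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: average past seasons day by day and place the
# averages on the current season's calendar.
# Assumes `hist_long` has one row per date with columns `date` (Date) and `n`.
five_year_average <- function(hist_long, current_start) {
  hist_long |>
    mutate(
      # December dates open a season; January dates belong to the
      # season that started the previous December
      season_year = year(date) - (month(date) != 12),
      season_start = make_date(season_year, 12, 21),
      day_of_season = as.integer(date - season_start)
    ) |>
    group_by(day_of_season) |>
    summarise(
      n_avg = mean(n),
      n_years = n_distinct(season_year),  # seasons contributing to each day
      .groups = "drop"
    ) |>
    # Day 0 is 21 December of the current season
    mutate(date = current_start + day_of_season)
}
```

Zero-count days that are simply absent from the long data would be skipped by mean() rather than counted as 0, so filling them in first (for example with tidyr::complete()) keeps the average from being inflated. Filtering the data to a single season before calling the function gives the previous-year line with its dates already moved onto the current calendar.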

Looking forward to some ideas how to approach this.

Happy holidays!


Hello,

Would you be able to share some fake data so we can explore some potential solutions to your problem? Your idea to use a case_when() statement may be justifiable for short spans of dates; however, if you need it to generate rolling averages and standard deviations for a threshold calculation, it is not feasible.

All the best,

Tim

Hi Tim, thanks for the response. I’m attaching a de-identified df as well as the reference dfs (5-year and 2021 cases)

For #1, I’m still unsure how to approach the creation of a surveillance_day variable based on the date/time period I specified earlier.

For #2, I managed to create an epicurve using the incidence2 package and adding a geom_line() for each of the two references (5-year daily average cases and 2021 daily cases). To do this, I had to shift the dates for the 5-year average and 2021 onto the same dates as the 2022 surveillance period. However, I’m unable to create a legend that also includes the color of the histogram. So far, adding scale_color_manual() only seems to "get" the two geom_lines.

Thank you for your help!

# load package

pacman::p_load(lubridate, tidyverse, incidence2, scales, reprex)

# generate demo data

demo_data <- data.frame(
  pno = c(3L,8L,10L,11L,18L,19L,20L,21L,23L,24L),
  date_report = c("2022-12-21","2022-12-23","2022-12-23","2022-12-24","2022-12-25",
                "2022-12-25","2022-12-25","2022-12-25", "2022-12-25","2022-12-25"),
  time_report = c("23:50:00","16:15:28","21:25:02","02:57:25","01:51:04","02:55:10",
                "04:14:59","04:37:04","05:49:53","05:51:52"),
  date_inj = c("2022-12-21","2022-12-23","2022-12-23","2022-12-23","2022-12-25",
             "2022-12-25","2022-12-25","2022-12-25","2022-12-22","2022-12-25"),
  time_inj = c("15:00:00","14:40:00","01:00:00","14:00:00","00:30:00","00:00:00",
             "00:00:00","00:20:00","14:00:00","00:30:00"))

hist_2021 <- data.frame(
  date = c("2022-12-21","2022-12-22","2022-12-23","2022-12-24","2022-12-25","2022-12-26",
           "2022-12-27","2022-12-28","2022-12-29","2022-12-30","2022-12-31","2023-01-01",
           "2023-01-02","2023-01-03","2023-01-04","2023-01-05","2023-01-06"),
  n_2021 = c(0L,1L,3L,4L,9L,4L,2L,2L,2L,10L,39L,106L,3L,1L,2L,1L,0L))

hist_5yr <- data.frame(
  date = c("2022-12-21","2022-12-22","2022-12-23","2022-12-24","2022-12-25","2022-12-26",
           "2022-12-27","2022-12-28","2022-12-29","2022-12-30","2022-12-31","2023-01-01",
           "2023-01-02","2023-01-03","2023-01-04","2023-01-05","2023-01-06"),
  n_5yr = c(3L,2L,3L,8L,11L,6L,8L,4L,4L,10L,74L,167L,9L,5L,3L,1L,0L))


# change class to dates

demo_data <- demo_data %>% 
  mutate(date_report = ymd(date_report),
         date_inj = ymd(date_inj))

hist_2021 <- hist_2021 %>% 
  mutate(date = ymd(date))

hist_5yr <- hist_5yr %>% 
  mutate(date = ymd(date))

# create the incidence object, aggregating cases by day
epicurve <- incidence(        # create incidence object
  x = demo_data,             # dataset
  date_index = date_inj,     # date column
  interval = "day"           # date grouping interval
)

# plot
plot(epicurve,
     fill = "navyblue",
     date_format = "%d",
     xlab = "Date of injury",
     ylab = "No. of cases",
     legend = "right") +
  
  # add a line for 5-year historical daily case average
  geom_line(data = hist_5yr, mapping = aes(x = date, y = n_5yr, color = "red")) +
  
  # add a line for 2021 daily cases
  geom_line(data = hist_2021, mapping = aes(x = date, y = n_2021, color = "black")) +
  
  # theme to remove gridlines
  theme_classic() +
  
  # add color labels (however this currently only includes the lines, not the histogram)
  scale_color_manual(
    values = c("black", "red", "navyblue"),
    labels = c("2021", "5-year average", "2022")) +
  
  # adjust x axis labels
  scale_x_date(date_breaks = "day",
               labels = label_date_short()) # minimal x-axis information
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.

Created on 2023-01-10 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Ventura 13.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Asia/Manila
#>  date     2023-01-10
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  backports       1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  broom           1.0.1   2022-08-29 [1] CRAN (R 4.2.0)
#>  cellranger      1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
#>  cli             3.4.1   2022-09-23 [1] CRAN (R 4.2.0)
#>  colorspace      2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.0)
#>  curl            4.3.3   2022-10-06 [1] CRAN (R 4.2.0)
#>  data.table      1.14.4  2022-10-17 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr          2.2.1   2022-06-27 [1] CRAN (R 4.2.0)
#>  digest          0.6.30  2022-10-18 [1] CRAN (R 4.2.0)
#>  dplyr         * 1.0.10  2022-09-01 [1] CRAN (R 4.2.0)
#>  ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.18    2022-11-07 [1] CRAN (R 4.2.0)
#>  fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  farver          2.1.1   2022-07-06 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats       * 0.5.2   2022-08-19 [1] CRAN (R 4.2.0)
#>  fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  gargle          1.2.1   2022-09-08 [1] CRAN (R 4.2.0)
#>  generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       * 3.4.0   2022-11-04 [1] CRAN (R 4.2.0)
#>  glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  googledrive     2.0.0   2021-07-08 [1] CRAN (R 4.2.0)
#>  googlesheets4   1.0.1   2022-08-13 [1] CRAN (R 4.2.0)
#>  gtable          0.3.1   2022-09-01 [1] CRAN (R 4.2.0)
#>  haven           2.5.1   2022-08-22 [1] CRAN (R 4.2.0)
#>  highr           0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  hms             1.1.2   2022-08-19 [1] CRAN (R 4.2.0)
#>  htmltools       0.5.3   2022-07-18 [1] CRAN (R 4.2.0)
#>  httr            1.4.4   2022-08-17 [1] CRAN (R 4.2.0)
#>  incidence2    * 1.2.3   2021-11-07 [1] CRAN (R 4.2.0)
#>  jsonlite        1.8.3   2022-10-21 [1] CRAN (R 4.2.0)
#>  knitr           1.40    2022-08-24 [1] CRAN (R 4.2.0)
#>  labeling        0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
#>  lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  lubridate     * 1.9.0   2022-11-06 [1] CRAN (R 4.2.0)
#>  magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  mime            0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  modelr          0.1.10  2022-11-11 [1] CRAN (R 4.2.0)
#>  munsell         0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pacman          0.5.1   2019-03-11 [1] CRAN (R 4.2.0)
#>  pillar          1.8.1   2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         * 0.3.5   2022-10-06 [1] CRAN (R 4.2.0)
#>  R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.0)
#>  readxl          1.4.1   2022-08-17 [1] CRAN (R 4.2.0)
#>  reprex        * 2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown       2.18    2022-11-09 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  rvest           1.0.3   2022-08-19 [1] CRAN (R 4.2.0)
#>  scales        * 1.2.1   2022-08-20 [1] CRAN (R 4.2.0)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi         1.7.8   2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr       * 1.4.1   2022-08-20 [1] CRAN (R 4.2.0)
#>  tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyr         * 1.2.1   2022-09-08 [1] CRAN (R 4.2.0)
#>  tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.0)
#>  timechange    * 0.1.1   2022-11-04 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs           0.5.0   2022-10-22 [1] CRAN (R 4.2.0)
#>  withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.34    2022-10-18 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.6   2022-10-18 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Hello,

I just wanted to clarify a few items with you to make sure I understand.

For #1, is time recorded on a 12- or 24-hour clock?

For #2, is the data for the trend lines defined with respect to the same surveillance day definition used in #1 or is it based on a normal day? Does this data come already aggregated or do you need to do the aggregation?

All the best,

Tim

Hi Tim, thanks again for responding.

For #1, time is on a 24-hour clock.

For #2, it is based on a normal day. Data for the trend lines are already aggregated.

Thank you!


Hi,

I think it will be difficult to plot these trends on the same plot since #1 and #2 define data with different underlying units of time. #1 uses a definition that spans two calendar days (06:00 to 05:59 the next morning), while #2 uses a traditional day. Is there a particular reason you need to define time in #1 this way, or could you use normal days so the definitions match? I think this would also make the comparison more meaningful.

All the best,

Tim

Hi Tim,

#1 will not actually be used for plotting. It will only be used to count the number of cases reported in a particular period (using the "date_report" and "time_report" variables in the "demo_data" df); for example, in the report we state that "xx cases have been reported from 6:00AM of 1 January 2023 to 5:59AM of 2 January 2023."
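Once each report has a surveillance day, the reporting window in that sentence can be generated from the day number rather than typed by hand. A sketch, where `window_sentence()` is a hypothetical helper whose wording mirrors the example sentence above, the 21 December season start is taken from the question, and the case count 12 is an arbitrary example value. Month names come from format() and therefore follow the session's locale.

```r
library(lubridate)

# Hypothetical helper: build the report sentence for one surveillance day.
window_sentence <- function(n_cases, surv_day, season_start) {
  # Day k opens at 06:00 on season_start + (k - 1) and closes at 05:59 the next day
  start_date <- season_start + (surv_day - 1L)
  end_date <- start_date + 1L
  # e.g. "1 January 2023" (day of month without a leading zero)
  fmt <- function(d) paste(mday(d), format(d, "%B %Y"))
  sprintf(
    "%d cases have been reported from 6:00AM of %s to 5:59AM of %s.",
    n_cases, fmt(start_date), fmt(end_date)
  )
}

window_sentence(12L, 12L, make_date(2022, 12, 21))
# In an English locale:
# "12 cases have been reported from 6:00AM of 1 January 2023 to 5:59AM of 2 January 2023."
```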

For #2 (plotting), the histogram of current cases and the 2 trend lines for historical cases will be plotted by actual date of injury (using the "date_inj" variable in the "demo_data" df and the "date" variable in the "hist_2021" and "hist_5yr" dfs).

I hope that helps clarify my question. Thank you.


Hello,

Thank you for clarifying! For #2, you are trying to show a fill aesthetic (the bars) and a colour aesthetic (the two lines) in one legend, which is what is causing the issue. ggplot2 will generally put these on different legends since they are different aesthetics, and a fill set to a fixed value outside aes() gets no legend entry at all. Using manual scales and some overriding, you can get something close to what you want:

library(tidyverse)
library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# generate demo data

demo_data <- data.frame(
    pno = c(3L,8L,10L,11L,18L,19L,20L,21L,23L,24L),
    date_report = c("2022-12-21","2022-12-23","2022-12-23","2022-12-24","2022-12-25",
                    "2022-12-25","2022-12-25","2022-12-25", "2022-12-25","2022-12-25"),
    time_report = c("23:50:00","16:15:28","21:25:02","02:57:25","01:51:04","02:55:10",
                    "04:14:59","04:37:04","05:49:53","05:51:52"),
    date_inj = c("2022-12-21","2022-12-23","2022-12-23","2022-12-23","2022-12-25",
                 "2022-12-25","2022-12-25","2022-12-25","2022-12-22","2022-12-25"),
    time_inj = c("15:00:00","14:40:00","01:00:00","14:00:00","00:30:00","00:00:00",
                 "00:00:00","00:20:00","14:00:00","00:30:00"))

hist_2021 <- data.frame(
    date = c("2022-12-21","2022-12-22","2022-12-23","2022-12-24","2022-12-25","2022-12-26",
             "2022-12-27","2022-12-28","2022-12-29","2022-12-30","2022-12-31","2023-01-01",
             "2023-01-02","2023-01-03","2023-01-04","2023-01-05","2023-01-06"),
    n_2021 = c(0L,1L,3L,4L,9L,4L,2L,2L,2L,10L,39L,106L,3L,1L,2L,1L,0L))

hist_5yr <- data.frame(
    date = c("2022-12-21","2022-12-22","2022-12-23","2022-12-24","2022-12-25","2022-12-26",
             "2022-12-27","2022-12-28","2022-12-29","2022-12-30","2022-12-31","2023-01-01",
             "2023-01-02","2023-01-03","2023-01-04","2023-01-05","2023-01-06"),
    n_5yr = c(3L,2L,3L,8L,11L,6L,8L,4L,4L,10L,74L,167L,9L,5L,3L,1L,0L))


clean_demo_data <- demo_data |>
    as_tibble() |>
    mutate(
        date_report = ymd(date_report),
        time_report = hms(time_report),
        date_inj = ymd(date_inj),
        time_inj = hms(time_inj)
    )

clean_hist_2021 <-
    hist_2021 |>
    as_tibble() |>
    mutate(date = ymd(date))

clean_hist_5yr <-
    hist_5yr |>
    as_tibble() |>
    mutate(date = ymd(date))
    
clean_demo_data |>
    count(date_inj) |>
    ggplot(mapping = aes(x = date_inj,
                         y = n,
                         colour = "2022")) +
    geom_col(fill = "navyblue") +
    geom_line(data = clean_hist_2021,
              mapping = aes(x = date,
                            y = n_2021,
                            colour = "2021")) +
    geom_line(data = clean_hist_5yr,
              mapping = aes(x = date, y = n_5yr, colour = "5-year average")) +
    scale_x_date(breaks = scales::pretty_breaks(),
                 labels = scales::date_format()) +
    scale_y_continuous(breaks = scales::extended_breaks(),
                       labels = scales::comma_format()) +
    scale_colour_manual(
        # named values tie each colour to its label explicitly
        values = c("2022" = "navyblue", "2021" = "black", "5-year average" = "red"),
        breaks = c("2022", "2021", "5-year average"),
        guide = guide_legend(override.aes = list(fill = NA))
    ) +
    labs(x = "\nDate of Injury",
         y = "Number of Injuries\n",
         colour = NULL) +
    theme_minimal()

Created on 2023-01-11 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2023-01-11
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  backports       1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  broom           1.0.2   2022-12-15 [1] CRAN (R 4.2.0)
#>  cellranger      1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
#>  cli             3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  colorspace      2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.0)
#>  curl            4.3.3   2022-10-06 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr          2.2.1   2022-06-27 [1] CRAN (R 4.2.1)
#>  digest          0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         * 1.0.10  2022-09-01 [1] RSPM (R 4.2.1)
#>  ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.19    2022-12-13 [1] CRAN (R 4.2.2)
#>  fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  farver          2.1.1   2022-07-06 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats       * 0.5.2   2022-08-19 [1] RSPM (R 4.2.1)
#>  fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  gargle          1.2.1   2022-09-08 [1] RSPM (R 4.2.1)
#>  generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       * 3.4.0   2022-11-04 [1] CRAN (R 4.2.0)
#>  glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  googledrive     2.0.0   2021-07-08 [1] CRAN (R 4.2.0)
#>  googlesheets4   1.0.1   2022-08-13 [1] CRAN (R 4.2.1)
#>  gtable          0.3.1   2022-09-01 [1] RSPM (R 4.2.1)
#>  haven           2.5.1   2022-08-22 [1] RSPM (R 4.2.1)
#>  highr           0.10    2022-12-22 [1] CRAN (R 4.2.0)
#>  hms             1.1.2   2022-08-19 [1] RSPM (R 4.2.1)
#>  htmltools       0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
#>  httr            1.4.4   2022-08-17 [1] RSPM (R 4.2.1)
#>  jsonlite        1.8.4   2022-12-06 [1] CRAN (R 4.2.0)
#>  knitr           1.41    2022-11-18 [1] CRAN (R 4.2.0)
#>  labeling        0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
#>  lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
#>  lubridate     * 1.9.0   2022-11-06 [1] CRAN (R 4.2.0)
#>  magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  mime            0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  modelr          0.1.10  2022-11-11 [1] CRAN (R 4.2.0)
#>  munsell         0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar          1.8.1   2022-08-19 [1] RSPM (R 4.2.1)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo            1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils         2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.0)
#>  readxl          1.4.1   2022-08-17 [1] RSPM (R 4.2.1)
#>  reprex          2.0.2   2022-08-17 [1] RSPM (R 4.2.1)
#>  rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown       2.19    2022-12-15 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.14    2022-08-22 [1] RSPM (R 4.2.1)
#>  rvest           1.0.3   2022-08-19 [1] RSPM (R 4.2.1)
#>  scales          1.2.1   2022-08-20 [1] RSPM (R 4.2.1)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi         1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       * 1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
#>  styler          1.8.1   2022-11-07 [1] CRAN (R 4.2.0)
#>  tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
#>  tidyr         * 1.2.1   2022-09-08 [1] RSPM (R 4.2.1)
#>  tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.0)
#>  timechange    * 0.1.1   2022-11-04 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs           0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
#>  withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.36    2022-12-21 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.6   2022-10-18 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
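A different route, offered as an alternative rather than a correction: keep the bars on the fill aesthetic and the lines on colour, map both inside aes() so each gets a legend key, and give both scales no title so the two legends stack and read as one. The 2022 key is then a filled square and the historical series are line keys, with no override.aes needed. The counts below are the demo values from this thread (the 2022 bars tallied from demo_data$date_inj); `counts_2022`, `hist_lines` and `series` are names made up for this sketch.

```r
library(ggplot2)

# Daily 2022 counts, tallied from demo_data$date_inj
counts_2022 <- data.frame(
  date_inj = as.Date(c("2022-12-21", "2022-12-22", "2022-12-23", "2022-12-25")),
  n = c(1L, 1L, 3L, 5L)
)

# Historical series in long form: one row per date and series
season_dates <- seq(as.Date("2022-12-21"), as.Date("2023-01-06"), by = "day")
hist_lines <- rbind(
  data.frame(date = season_dates, series = "2021",
             n = c(0, 1, 3, 4, 9, 4, 2, 2, 2, 10, 39, 106, 3, 1, 2, 1, 0)),
  data.frame(date = season_dates, series = "5-year average",
             n = c(3, 2, 3, 8, 11, 6, 8, 4, 4, 10, 74, 167, 9, 5, 3, 1, 0))
)

p <- ggplot() +
  # Bars: fill is mapped (to a constant label), so it gets a legend key
  geom_col(data = counts_2022, aes(x = date_inj, y = n, fill = "2022")) +
  # Lines: colour is mapped to the series name
  geom_line(data = hist_lines, aes(x = date, y = n, colour = series)) +
  # No title on either scale so the two legends read as one
  scale_fill_manual(values = c("2022" = "navyblue"), name = NULL) +
  scale_colour_manual(values = c("2021" = "black", "5-year average" = "red"),
                      name = NULL) +
  labs(x = "Date of injury", y = "Number of injuries") +
  theme_classic()
p
```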

For #1, I would do something like this. Note that this is a very brute-force method, so it quickly becomes unsustainable if you try to generalize it to a longer period of time:

library(tidyverse)
library(lubridate)
#> Loading required package: timechange
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# generate demo data

demo_data <- data.frame(
    pno = c(3L,8L,10L,11L,18L,19L,20L,21L,23L,24L),
    date_report = c("2022-12-21","2022-12-23","2022-12-23","2022-12-24","2022-12-25",
                    "2022-12-25","2022-12-25","2022-12-25", "2022-12-25","2022-12-25"),
    time_report = c("23:50:00","16:15:28","21:25:02","02:57:25","01:51:04","02:55:10",
                    "04:14:59","04:37:04","05:49:53","05:51:52"),
    date_inj = c("2022-12-21","2022-12-23","2022-12-23","2022-12-23","2022-12-25",
                 "2022-12-25","2022-12-25","2022-12-25","2022-12-22","2022-12-25"),
    time_inj = c("15:00:00","14:40:00","01:00:00","14:00:00","00:30:00","00:00:00",
                 "00:00:00","00:20:00","14:00:00","00:30:00"))

clean_demo_data <- demo_data |>
    as_tibble() |>
    mutate(
        date_report = ymd(date_report),
        time_report = hms(time_report),
        date_inj = ymd(date_inj),
        time_inj = hms(time_inj)
    )

clean_demo_data |>
    mutate(
        datetime_report = parse_date_time(
            x = paste(date_report, time_report, sep = " "),
            orders = "%Y-%m-%d %H:%M:%S"
        ),
        # each window ends at 05:59:59 so no report falls between two windows
        surv_day = case_when(
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-21 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-22 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 1L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-22 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-23 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 2L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-23 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-24 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 3L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-24 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-25 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 4L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-25 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-26 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 5L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-26 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-27 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 6L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-27 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-28 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 7L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-28 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-29 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 8L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-29 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-30 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 9L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-30 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2022-12-31 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 10L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2022-12-31 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-01 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 11L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2023-01-01 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-02 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 12L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2023-01-02 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-03 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 13L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2023-01-03 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-04 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 14L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2023-01-04 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-05 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 15L,
            datetime_report %within% interval(
                start = parse_date_time(x = "2023-01-05 06:00:00", orders = "%Y-%m-%d %H:%M:%S"),
                end = parse_date_time(x = "2023-01-06 05:59:59", orders = "%Y-%m-%d %H:%M:%S")
            ) ~ 16L,
            TRUE ~ NA_integer_
        )
    ) |>
    count(surv_day)
#> # A tibble: 3 × 2
#>   surv_day     n
#>      <int> <int>
#> 1        1     1
#> 2        3     3
#> 3        4     6

Created on 2023-01-11 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2023-01-11
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
#>  backports       1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
#>  broom           1.0.2   2022-12-15 [1] CRAN (R 4.2.0)
#>  cellranger      1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
#>  cli             3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  colorspace      2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr          2.2.1   2022-06-27 [1] CRAN (R 4.2.1)
#>  digest          0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         * 1.0.10  2022-09-01 [1] RSPM (R 4.2.1)
#>  ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.19    2022-12-13 [1] CRAN (R 4.2.2)
#>  fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats       * 0.5.2   2022-08-19 [1] RSPM (R 4.2.1)
#>  fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  gargle          1.2.1   2022-09-08 [1] RSPM (R 4.2.1)
#>  generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       * 3.4.0   2022-11-04 [1] CRAN (R 4.2.0)
#>  glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  googledrive     2.0.0   2021-07-08 [1] CRAN (R 4.2.0)
#>  googlesheets4   1.0.1   2022-08-13 [1] CRAN (R 4.2.1)
#>  gtable          0.3.1   2022-09-01 [1] RSPM (R 4.2.1)
#>  haven           2.5.1   2022-08-22 [1] RSPM (R 4.2.1)
#>  highr           0.10    2022-12-22 [1] CRAN (R 4.2.0)
#>  hms             1.1.2   2022-08-19 [1] RSPM (R 4.2.1)
#>  htmltools       0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
#>  httr            1.4.4   2022-08-17 [1] RSPM (R 4.2.1)
#>  jsonlite        1.8.4   2022-12-06 [1] CRAN (R 4.2.0)
#>  knitr           1.41    2022-11-18 [1] CRAN (R 4.2.0)
#>  lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
#>  lubridate     * 1.9.0   2022-11-06 [1] CRAN (R 4.2.0)
#>  magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  modelr          0.1.10  2022-11-11 [1] CRAN (R 4.2.0)
#>  munsell         0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar          1.8.1   2022-08-19 [1] RSPM (R 4.2.1)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo            1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils         2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.0)
#>  readxl          1.4.1   2022-08-17 [1] RSPM (R 4.2.1)
#>  reprex          2.0.2   2022-08-17 [1] RSPM (R 4.2.1)
#>  rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown       2.19    2022-12-15 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.14    2022-08-22 [1] RSPM (R 4.2.1)
#>  rvest           1.0.3   2022-08-19 [1] RSPM (R 4.2.1)
#>  scales          1.2.1   2022-08-20 [1] RSPM (R 4.2.1)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi         1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       * 1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
#>  styler          1.8.1   2022-11-07 [1] CRAN (R 4.2.0)
#>  tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
#>  tidyr         * 1.2.1   2022-09-08 [1] RSPM (R 4.2.1)
#>  tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.0)
#>  timechange    * 0.1.1   2022-11-04 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs           0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
#>  withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.36    2022-12-21 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.6   2022-10-18 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
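The sixteen case_when() branches above follow one pattern, so the windows can be generated from the season's start date instead of typed. A sketch using base R's findInterval() on a vector of 06:00 breakpoints; `assign_surv_day()` is a hypothetical helper, and the 21 December start and 16-day length are taken from the question. Because consecutive windows share a breakpoint (06:00 belongs to the window it opens, not the one before), there is no gap between one window's end and the next one's start, even for timestamps with seconds.

```r
library(lubridate)

# Hypothetical helper: the same 06:00-to-05:59 windows as the case_when()
# above, generated from a start date instead of written out one by one.
assign_surv_day <- function(datetime, season_start, n_days = 16L) {
  # n_days + 1 breakpoints: 06:00 on each day, in the time zone of `datetime`
  first_break <- force_tz(as_datetime(season_start), tzone = tz(datetime)) + hours(6)
  breaks <- first_break + days(0:n_days)
  # findInterval() gives i when breaks[i] <= datetime < breaks[i + 1]
  day <- findInterval(datetime, breaks)
  # 0 (before day 1) and n_days + 1 (after the last day) become NA
  ifelse(day >= 1L & day <= n_days, day, NA_integer_)
}

# Report date-times from demo_data
report_time <- ymd_hms(paste(
  c("2022-12-21", "2022-12-23", "2022-12-23", "2022-12-24", "2022-12-25",
    "2022-12-25", "2022-12-25", "2022-12-25", "2022-12-25", "2022-12-25"),
  c("23:50:00", "16:15:28", "21:25:02", "02:57:25", "01:51:04", "02:55:10",
    "04:14:59", "04:37:04", "05:49:53", "05:51:52")
))
table(assign_surv_day(report_time, make_date(2022, 12, 21)))
```

Next season, only the `make_date()` call changes.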

All the best,

Tim

Thanks, Tim! A couple of questions on this code:

  1. Is it necessary to use as_tibble() the way you did when cleaning the 3 datasets? What advantage does it offer over a regular data frame?

  2. In this epicurve, I noticed that you went straight to ggplot rather than using the incidence2 package. Does this mean that the incidence2 package cannot be integrated into the subsequent approach of adding manual scales and overriding?


Hi Tim, I encountered the following error when I tried running this code:

Error in .fun():
! Problem while computing surv_day = case_when(...).
Caused by error in h():
! error in evaluating the argument 'b' in selecting a method for function '%within%': unused arguments (start = parse_date_time(x = "2022-12-21 06:00:00", orders = "%Y-%m-%d %H:%M:%S"), end = parse_date_time(x = "2022-12-22 5:59:00", orders = "%Y-%m-%d %H:%M:%S"))
Run rlang::last_error() to see where the error occurred.


Hi,

  1. as_tibble() is not necessary; I just prefer how tibbles print in the RStudio IDE, which is helpful to me during exploratory data analysis (EDA).
  2. You could most likely still use the incidence2 package; I just don’t see the need to, given that you are only plotting daily counts, which count() handles easily.
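One difference worth knowing when using count() for the daily bars: it only returns dates that have at least one case, so days with zero injuries are missing rather than 0. That makes no visible difference to geom_col(), but it matters if the daily counts are tabulated or joined to the 17-day historical series. tidyr's complete() can add the missing days. A sketch using the date_inj values from demo_data and the season dates from hist_2021:

```r
library(dplyr)
library(tidyr)

# Dates of injury from demo_data
inj <- data.frame(date_inj = as.Date(c(
  "2022-12-21", "2022-12-23", "2022-12-23", "2022-12-23", "2022-12-25",
  "2022-12-25", "2022-12-25", "2022-12-25", "2022-12-22", "2022-12-25"
)))

daily <- inj |>
  count(date_inj) |>
  # Add every season date that has no cases, with n = 0
  complete(
    date_inj = seq(as.Date("2022-12-21"), as.Date("2023-01-06"), by = "day"),
    fill = list(n = 0L)
  )
daily
```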

All the best,

Tim

Hi,

It’s hard to know exactly what is causing the issue, but I would guess that you either don’t have the lubridate package installed, didn’t run library(lubridate), or have an older version of the package. Your best bet is to make sure you have an up-to-date version of the package.
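One more possibility worth ruling out: an "unused arguments" error means the interval() that R actually called does not accept `start` and `end`. lubridate's interval() does accept them, so this most likely points to another attached package whose interval() masks lubridate's. Base R's find() lists every attached package that defines the name, and a namespace-qualified call cannot be masked. A small check:

```r
library(lubridate)

# Which attached packages define a function called interval()?
find("interval")

# The lubridate:: prefix always picks lubridate's version, whatever else is attached
win <- lubridate::interval(
  start = ymd_hms("2022-12-21 06:00:00"),
  end = ymd_hms("2022-12-22 05:59:59")
)
ymd_hms("2022-12-21 23:50:00") %within% win
```

If find() reports more than one package, writing lubridate::interval() in the case_when() branches (or loading lubridate last) should avoid the clash.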

All the best,

Tim