Hi all,
Issue: I am new to R and am trying to create a sitrep for measles data from 2019 onwards. During the COVID-19 years case numbers were smaller, and in one year (2021) there were no cases at all. Creating an epi curve with ggplot by year (or by quarter, month or week) works, but it only plots the time points where cases were reported. I would like help outlining the most streamlined way to do this going forward, including which packages to use.
Timeline/urgency: not urgent
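If the year column is already a factor whose levels list every year of interest, ggplot2 can keep the empty years on a discrete axis without changing the data. A minimal sketch, assuming a hypothetical summary table `cases` with made-up counts (2021 has no row because no cases were reported):

```r
library(ggplot2)

# Hypothetical toy counts: no cases were reported in 2021, so that year has no row
cases <- data.frame(
  year = c("2019", "2020", "2022", "2023"),
  n = c(25, 3, 1, 12)
)

# Declare every year of interest as a factor level, including the empty one
cases$year <- factor(cases$year, levels = as.character(2019:2023))

# drop = FALSE keeps unused factor levels (here 2021) on the discrete x axis
epi_curve <- ggplot(cases, aes(x = year, y = n)) +
  geom_col() +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Year diagnosed", y = "Cases")

epi_curve
```

This only changes the axis; filling in zero counts in the data itself is still worthwhile when the same counts feed a summary table or a continuous date axis.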
The current method we are using is:
1. Aggregate cases by time period, i.e. count cases by year_diagnosed and state.
2. Create a longer dataset of all possible combinations using the tidyr::expand() function.
3. Merge the two datasets.
# load packages (dplyr, tidyr and forcats are all part of the tidyverse)
library(tidyverse)

# count cases by year and state
cases_year_raw <- measles_NNDSS_2019 %>%
  count(year_diagnosed, state) %>%
  arrange(year_diagnosed)

# create a longer dataset of all possible year and state combinations;
# a dummy 2021 row is added first because 2021 has no cases in the raw data
cases_year_expanded <- cases_year_raw %>%
  rbind(data.frame(
    year_diagnosed = "2021",
    state = "QLD",
    n = 0
  )) %>%
  tidyr::expand(year_diagnosed, state)

# join so that every year and state combination is kept, filling missing counts with 0
cases_year <- cases_year_raw %>%
  right_join(cases_year_expanded, by = join_by(year_diagnosed, state)) %>%
  mutate(n = replace_na(n, 0),
         year_diagnosed = fct_relevel(year_diagnosed, as.character(2019:2024))) %>%
  arrange(year_diagnosed) %>%
  mutate(state = factor(state, levels = unique(state)))
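When both grouping columns are factors with explicit levels, dplyr::count() can return the zero rows directly through its `.drop = FALSE` argument, which folds steps 2 and 3 (and the dummy 2021 row) into a single call. A minimal sketch, using a hypothetical toy line list `measles_toy` in place of `measles_NNDSS_2019`:

```r
library(dplyr)

# Hypothetical line list: one row per case; no cases anywhere in 2021 or 2024
measles_toy <- tibble(
  year_diagnosed = c("2019", "2019", "2020", "2022", "2023"),
  state = c("NSW", "QLD", "NSW", "QLD", "QLD")
)

cases_year <- measles_toy |>
  mutate(
    # Explicit levels define every year and state that should appear in the output
    year_diagnosed = factor(year_diagnosed, levels = as.character(2019:2024)),
    state = factor(state, levels = c("NSW", "QLD"))
  ) |>
  # .drop = FALSE keeps level combinations with no cases, reported as n = 0
  count(year_diagnosed, state, .drop = FALSE)

cases_year
```

The result has one row for each of the 6 × 2 year and state combinations. Because the levels are stated explicitly rather than taken from the data, empty years at the start or end of the reporting period are kept as well.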
Any assistance or suggestions would be appreciated.
Hello,
Here is how I would approach this problem using some of the functions from the tidyverse:
# loading packages
library(tidyverse)
# creating fake data
fake_data <- tibble(
  episode_year = sample(
    x = 2000L:2024L,
    replace = TRUE,
    size = 20
  ),
  geo_unit = sample(
    x = c("A", "B", "C", "D"),
    replace = TRUE,
    size = 20
  )
)
# filling missing counts by episode_year and geo_unit
filled_data <- fake_data |>
  count(episode_year, geo_unit) |>
  complete(
    episode_year = full_seq(episode_year, 1),
    geo_unit = unique(geo_unit),
    fill = list(n = 0)
  )
# viewing the data
filled_data
#> # A tibble: 88 × 3
#> episode_year geo_unit n
#> <dbl> <chr> <int>
#> 1 2000 A 0
#> 2 2000 B 1
#> 3 2000 C 0
#> 4 2000 D 1
#> 5 2001 A 1
#> 6 2001 B 0
#> 7 2001 C 0
#> 8 2001 D 0
#> 9 2002 A 1
#> 10 2002 B 0
#> # ℹ 78 more rows
Created on 2024-08-13 with reprex v2.1.1
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os macOS Sonoma 14.5
#> system x86_64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Toronto
#> date 2024-08-13
#> pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
#> colorspace 2.1-0 2023-01-23 [1] RSPM (R 4.4.0)
#> digest 0.6.36 2024-06-23 [1] RSPM (R 4.4.0)
#> dplyr * 1.1.4 2023-11-17 [1] RSPM (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [1] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [1] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
#> forcats * 1.0.0 2023-01-29 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.0)
#> ggplot2 * 3.5.1 2024-04-23 [1] RSPM (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] RSPM (R 4.4.0)
#> gtable 0.3.5 2024-04-22 [1] RSPM (R 4.4.0)
#> hms 1.1.3 2023-03-21 [1] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#> knitr 1.48 2024-07-07 [1] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
#> lubridate * 1.9.3 2023-09-27 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [1] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [1] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.0)
#> purrr * 1.0.2 2023-08-10 [1] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] RSPM (R 4.4.0)
#> readr * 2.1.5 2024-01-10 [1] RSPM (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] RSPM (R 4.4.0)
#> rmarkdown 2.27 2024-05-17 [1] RSPM (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] RSPM (R 4.4.0)
#> scales 1.3.0 2023-11-28 [1] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [1] RSPM (R 4.4.0)
#> stringr * 1.5.1 2023-11-14 [1] RSPM (R 4.4.0)
#> tibble * 3.2.1 2023-03-20 [1] RSPM (R 4.4.0)
#> tidyr * 1.3.1 2024-01-24 [1] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] RSPM (R 4.4.0)
#> tidyverse * 2.0.0 2023-02-22 [1] RSPM (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [1] RSPM (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [1] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [1] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
#> withr 3.0.0 2024-01-16 [1] RSPM (R 4.4.0)
#> xfun 0.45 2024-06-16 [1] RSPM (R 4.4.0)
#> yaml 2.3.9 2024-07-05 [1] RSPM (R 4.4.0)
#>
#> [1] /Users/timothychisamore/Library/R/x86_64/4.4/library
#> [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
All the best,
Tim
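The same complete() pattern extends to monthly (or weekly, or quarterly) epi curves if dates are first rounded down with lubridate::floor_date() and the full sequence of periods is supplied explicitly. Note that full_seq() only spans the observed minimum to maximum, so an explicit start and end also keeps empty periods at the edges of the reporting window. A minimal sketch, assuming a hypothetical `cases` tibble with a `date_diagnosed` column and a fixed reporting period of January 2019 to December 2021:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical case dates: nothing reported after March 2020
cases <- tibble(
  date_diagnosed = as.Date(c("2019-02-14", "2019-02-20", "2020-03-02"))
)

# Assumed fixed reporting period, expressed as the first day of each month
all_months <- seq(as.Date("2019-01-01"), as.Date("2021-12-01"), by = "month")

monthly <- cases |>
  # Round each date down to the first day of its month
  mutate(month = floor_date(date_diagnosed, unit = "month")) |>
  count(month) |>
  # Add every month in the reporting period, filling missing counts with 0
  complete(month = all_months, fill = list(n = 0))

monthly
```

For quarters, swap in `unit = "quarter"` and `by = "quarter"`. For weeks, use `unit = "week"` together with a `week_start` value and start the sequence on that same weekday, so the rounded dates line up with the generated sequence.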
neale
September 3, 2024, 12:38am
@cushlamcoffey Did the answer provided help you, or were you able to solve your issue another way?
Our volunteers spend their time writing answers, and others in the forum will learn from the discussion, so we would appreciate the feedback.
Thank you!