Hi everyone, I’m Dheeraj Sudan from the UK. I’m a software developer and also run a business with my wife Meenu Hinduja. I’m working on debugging some R code for outbreak analysis and wanted to ask how you usually approach it. Do you have any preferred methods or tools for spotting issues that you’d recommend?
Regards
Dheeraj Sudan and Meenu Hinduja
Welcome, Dheeraj!
I want to open the discussion here, since there are multiple ways of addressing this.
For outbreak analysis in R, I’d recommend debugging in a structured way:
- Small reproducible example
Use {reprex} and a small sample of data. This helps isolate whether the problem is in the code, the data, or the package behavior. We have a free online tutorial on reprex here!
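As a rough illustration of what such a small example might look like, here is a sketch with an invented six-row linelist. The column names `case_id` and `date_onset` and all values are assumptions made up for the example, and base R's `as.Date()` stands in for whichever step is misbehaving in your code. The `reprex::reprex()` call only runs in an interactive session where the package is installed.

```r
# Invented toy linelist: small enough to paste into a forum post.
linelist <- data.frame(
  case_id    = c("A1", "A2", "A3", "A4", "A5", "A6"),
  date_onset = c("2024-03-01", "2024-03-02", "2024-03-02",
                 "02/03/2024", NA, "2024-03-04"),
  stringsAsFactors = FALSE
)

# The suspect step, reduced to a single line.
parsed <- as.Date(linelist$date_onset, format = "%Y-%m-%d")

# Rows where a non-missing value failed to parse.
failed <- linelist[!is.na(linelist$date_onset) & is.na(parsed), ]
print(failed)

# Render the minimal snippet for sharing (interactive sessions only).
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex({
    x <- c("2024-03-01", "02/03/2024")
    as.Date(x, format = "%Y-%m-%d")
  })
}
```

If the problem disappears on the toy data, that already points towards the real data rather than the code.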
- Check the data early
Most outbreak analysis bugs come from dates, missing values, duplicated IDs, or unexpected categories.
skimr::skim(linelist)
janitor::tabyl(linelist, sex)
summary(linelist$date_onset)
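If you want the same kind of checks without extra packages, a base R sketch could look like the one below. The data, the column names, and the outbreak window are all invented for illustration, so adjust them to your own linelist.

```r
# Invented linelist with one duplicated ID, mixed-case categories,
# a missing value in two columns, and one implausible date.
linelist <- data.frame(
  case_id    = c("A1", "A2", "A2", "A3", "A4"),
  sex        = c("f", "F", "F", "m", NA),
  date_onset = as.Date(c("2024-03-01", "2024-03-02", "2024-03-02",
                         "2042-03-03", NA)),
  stringsAsFactors = FALSE
)

# Missing values per column.
n_missing <- colSums(is.na(linelist))

# IDs that appear more than once.
dup_ids <- unique(linelist$case_id[duplicated(linelist$case_id)])

# Unexpected categories: "f" and "F" show up as separate levels,
# and useNA = "ifany" makes missing values visible.
sex_counts <- table(linelist$sex, useNA = "ifany")

# Onset dates outside an assumed outbreak window.
window <- as.Date(c("2024-01-01", "2024-12-31"))
out_of_range <- linelist[!is.na(linelist$date_onset) &
                           (linelist$date_onset < window[1] |
                              linelist$date_onset > window[2]), ]

print(n_missing)
print(dup_ids)
print(sex_counts)
print(out_of_range)
```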
- Inspect each pipeline step
Break long pipes into smaller objects:
library(dplyr)  # provides %>%, mutate() and count()
step1 <- linelist %>% janitor::clean_names()
step2 <- step1 %>% mutate(date_onset = lubridate::ymd(date_onset))
step3 <- step2 %>% count(date_onset)
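One way to get value out of the intermediate objects is to compare each step with the one before it. The sketch below uses base R and invented data, and `check_step()` is a hypothetical helper written for this post, not a function from any package. It uses `as.Date()` with a fixed format as a stand-in for the date-parsing step, on the assumption that values in an unexpected format turn into `NA`.

```r
# Hypothetical helper: report the row counts and how many NAs a step
# introduced in one column (only comparable when rows are unchanged).
check_step <- function(before, after, col) {
  new_na <- if (nrow(before) == nrow(after)) {
    sum(is.na(after[[col]]) & !is.na(before[[col]]))
  } else {
    NA_integer_
  }
  message(sprintf("rows: %d -> %d, new NA in %s: %d",
                  nrow(before), nrow(after), col, new_na))
  invisible(new_na)
}

# Invented data: one date is in a different format, one is missing.
step1 <- data.frame(
  date_onset = c("2024-03-01", "2024-03-02", "03/03/2024", NA),
  stringsAsFactors = FALSE
)

# Parsing step: the value in the other format becomes NA.
step2 <- step1
step2$date_onset <- as.Date(step1$date_onset, format = "%Y-%m-%d")

new_na <- check_step(step1, step2, "date_onset")
```

A jump in new `NA` values or an unexpected change in row count tells you which step to look at first.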
- Use R's debugging tools (all of these work in the RStudio console)
Useful tools include:
traceback()
rlang::last_trace()
browser()
debugonce(your_function)
?any_function
For tidyverse errors, rlang::last_trace() is especially helpful.
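To show how a couple of these fit together, here is a small sketch. `attack_rate()` is a hypothetical function invented for the example, and `tryCatch()` is added on my side as a way to capture the error message in a script. The `debugonce()` part only runs in an interactive session, where it opens the step-through debugger on the next call.

```r
# Hypothetical function with an input check that can fail.
attack_rate <- function(cases, population) {
  if (!is.numeric(cases) || !is.numeric(population)) {
    stop("cases and population must be numeric")
  }
  cases / population
}

# Capture the error message instead of stopping the script.
result <- tryCatch(
  attack_rate("12", 400),
  error = function(e) conditionMessage(e)
)
print(result)

# In an interactive session, step through the next call line by line.
if (interactive()) {
  debugonce(attack_rate)
  attack_rate(12, 400)
}
```

Without the `tryCatch()`, running the failing call at the console and then calling `traceback()` shows the chain of calls that led to the error.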
- Validate outputs against expectations
For outbreak work, I usually check:
  - Are dates parsed correctly?
  - Are case counts plausible?
  - Are duplicates handled?
  - Are denominators correct?
  - Do totals match source reports?
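These checks can also be written down as assertions so they run every time the analysis is rebuilt. This is only a sketch: the data are invented, and `population` and `reported_total` are assumed numbers standing in for your real denominator and the figure from your source report.

```r
# Invented cleaned case data.
cases <- data.frame(
  case_id    = c("A1", "A2", "A3", "A4", "A5"),
  date_onset = as.Date(c("2024-03-01", "2024-03-02", "2024-03-02",
                         "2024-03-04", "2024-03-04")),
  stringsAsFactors = FALSE
)
population     <- 400  # assumed denominator
reported_total <- 5    # assumed figure from a source report

# Daily counts for an epidemic curve, and a crude attack rate.
daily_counts <- as.data.frame(table(date_onset = cases$date_onset))
attack_rate  <- nrow(cases) / population

# Each line fails loudly if the expectation is not met.
stopifnot(
  !anyNA(cases$date_onset),               # dates parsed
  !anyDuplicated(cases$case_id),          # duplicates handled
  sum(daily_counts$Freq) == nrow(cases),  # curve counts add up
  attack_rate >= 0, attack_rate <= 1,     # denominator plausible
  nrow(cases) == reported_total           # matches source report
)
```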
The Epi R Handbook has very useful chapters on troubleshooting, cleaning data, dates, and epidemic curves.
A good workflow is: make the dataset smaller, clean names, check variable types, run one pipe step at a time, and only then rebuild the full analysis.
Hope this is helpful. The discussion is open!
Best,
Luis