Dheeraj Sudan and Meenu Hinduja-What is the best way to debug R code used in outbreak analysis?

Hi everyone, I’m Dheeraj Sudan from the UK. I’m a software developer and also run a business with my wife Meenu Hinduja. I’m working on debugging some R code for outbreak analysis and wanted to ask how you usually approach it. Do you have any preferred methods or tools for spotting issues that you’d recommend?

Regards

Dheeraj Sudan and Meenu Hinduja

Welcome, Dheeraj!

I'd like to open this up for discussion, since there are several ways to approach it.

For outbreak analysis in R, I’d recommend debugging in a structured way:

  1. Small reproducible example
    Use {reprex} and a small sample of data. This helps isolate whether the problem is in the code, the data, or the package behavior. We have a free online tutorial on reprex here!
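
A minimal sketch of that idea, assuming a toy `linelist` whose column names and values are invented purely for illustration: shrink the data to the few rows that trigger the problem, then use `dput()` so others can paste it straight into their own session.

```r
# Hypothetical toy line list; values are made up for illustration only.
linelist <- data.frame(
  case_id    = c("A1", "A2", "A3", "A4", "A5", "A6"),
  date_onset = c("2024-01-03", "2024-01-04", NA,
                 "2024-01-04", "2024-01-07", "2024-01-09"),
  sex        = c("f", "m", "m", "f", NA, "m"),
  stringsAsFactors = FALSE
)

# Keep only the rows and columns needed to trigger the problem.
small <- head(linelist[, c("case_id", "date_onset")], 3)

# dput() prints R code that recreates the object, ready to paste into a post.
dput(small)

# With {reprex} installed, copy the failing snippet to the clipboard and run
# this interactively; it renders the code together with its output.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex()
}
```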

  2. Check the data early
    Most outbreak analysis bugs come from dates, missing values, duplicated IDs, or unexpected categories.

skimr::skim(linelist)
janitor::tabyl(linelist, sex)
summary(linelist$date_onset)
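
If those packages are not installed, the same checks can be roughed out in base R. The sketch below assumes a toy `linelist`, an expected set of `sex` codes, and a fixed extraction date; all three are invented for illustration.

```r
# Hypothetical toy line list; values are made up for illustration only.
linelist <- data.frame(
  case_id    = c("A1", "A2", "A2", "A4", "A5"),
  date_onset = as.Date(c("2024-01-03", "2024-01-04", "2024-01-04",
                         NA, "2042-01-07")),
  sex        = c("f", "m", "m", "F", NA),
  stringsAsFactors = FALSE
)

# Missing values per column.
n_missing <- colSums(is.na(linelist))

# IDs that appear more than once.
dup_ids <- unique(linelist$case_id[duplicated(linelist$case_id)])

# Categories outside the expected set (the expected codes are an assumption).
observed_sex   <- unique(linelist$sex[!is.na(linelist$sex)])
unexpected_sex <- setdiff(observed_sex, c("f", "m"))

# Onset dates later than an assumed data-extraction date.
extraction_date <- as.Date("2024-02-01")
future_onset    <- which(linelist$date_onset > extraction_date)

print(n_missing)
print(dup_ids)
print(unexpected_sex)
print(future_onset)
```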

  3. Inspect each pipeline step
    Break long pipes into smaller objects:
library(dplyr)  # provides %>%, mutate() and count()
step1 <- linelist %>% janitor::clean_names()
step2 <- step1 %>% mutate(date_onset = lubridate::ymd(date_onset))
step3 <- step2 %>% count(date_onset)
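
Once the steps are separate objects, they can be compared with each other. The sketch below shows one way to find rows where date parsing failed; it uses base R's `as.Date()` instead of lubridate so it runs without extra packages, and the data is invented for illustration.

```r
# Hypothetical raw data; the third date uses a different format on purpose.
step1 <- data.frame(
  case_id    = c("A1", "A2", "A3", "A4"),
  date_onset = c("2024-01-03", "2024-01-04", "05/01/2024", NA),
  stringsAsFactors = FALSE
)

# Parse dates; with an explicit format, unparseable strings become NA.
step2 <- step1
step2$date_onset <- as.Date(step1$date_onset, format = "%Y-%m-%d")

# Rows where parsing turned a non-missing value into NA.
failed <- which(!is.na(step1$date_onset) & is.na(step2$date_onset))
print(step1[failed, ])

# Count cases per onset date, keeping NA visible rather than silently dropped.
step3 <- table(step2$date_onset, useNA = "ifany")
print(step3)

# The counts should add up to the number of rows going in.
stopifnot(sum(step3) == nrow(step1))
```

Comparing the raw and parsed columns side by side usually shows which input format the parser did not expect.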

  4. Use R and RStudio debugging tools
    Useful tools include:
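traceback()                 # call stack of the last uncaught error
rlang::last_trace()         # backtrace of the last rlang/tidyverse error
browser()                   # put inside a function to pause and inspect there
debugonce(your_function)    # step through the next call to your_function
?any_function               # open the help page for a function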

For tidyverse errors, rlang::last_trace() is especially helpful.
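
As a rough illustration of how these fit together, here is a sketch with a hypothetical helper, `attack_rate()`, that is not from any package. `debugonce()` only makes sense in an interactive session, so it is shown commented out; `tryCatch()` is a base R alternative for capturing the error message when running a script.

```r
# Hypothetical helper with a guard that fails on a bad denominator.
attack_rate <- function(cases, population) {
  if (any(population <= 0)) {
    stop("population must be positive")
  }
  cases / population
}

# Interactive use: step through the next call only, line by line.
# debugonce(attack_rate)
# attack_rate(c(5, 3), c(100, 0))

# Non-interactive use: capture the error message instead of stopping the script.
result <- tryCatch(
  attack_rate(c(5, 3), c(100, 0)),
  error = function(e) conditionMessage(e)
)
print(result)
```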

  5. Validate outputs against expectations
    For outbreak work, I usually check:
  • Are dates parsed correctly?
  • Are case counts plausible?
  • Are duplicates handled?
  • Are denominators correct?
  • Do totals match source reports?
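
Checks like these can be written down as assertions so they run every time the analysis does. This is only a sketch: the line list, the reported total, and the population figure are assumed values for illustration, and `stopifnot()` is one of several ways to express such checks.

```r
# Hypothetical cleaned line list and an assumed total from a source report.
linelist <- data.frame(
  case_id    = c("A1", "A2", "A3", "A4", "A5"),
  date_onset = as.Date(c("2024-01-03", "2024-01-04", "2024-01-04",
                         "2024-01-10", "2024-01-11")),
  stringsAsFactors = FALSE
)
reported_total <- 5      # assumed figure, for illustration
population     <- 1200   # assumed denominator, for illustration

# Weekly case counts (weeks start on Monday by default in cut.Date).
week_start <- cut(linelist$date_onset, breaks = "week")
weekly     <- table(week_start)

stopifnot(
  !anyNA(linelist$date_onset),            # dates parsed
  anyDuplicated(linelist$case_id) == 0,   # duplicates handled
  sum(weekly) == nrow(linelist),          # no cases lost when aggregating
  nrow(linelist) == reported_total,       # total matches the source report
  population > 0,                         # denominator is usable
  nrow(linelist) / population <= 1        # attack rate is plausible
)
print(weekly)
```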

The Epi R Handbook has very useful chapters on troubleshooting, cleaning data, dates, and epidemic curves.

A good workflow is: make the dataset smaller, clean names, check variable types, run one pipe step at a time, and only then rebuild the full analysis.

Hope this helps. I'd be glad to hear how others approach it!

Best,

Luis