Clean and check onset dates in R

Hi all, I’m working on cleaning some surveillance data. Here’s a small reproducible example — does this look good for checking date formats and ranges?

# install and load packages
pacman::p_load(
  rio, 
  janitor,
  here,
  lubridate, 
  tidyverse, 
  datapasta,
  reprex
)

# create a small demo dataset
demo_data <- data.frame(
  stringsAsFactors = FALSE,
  case_id = c("694928", "86340d", "92d002", "544bd1", "6056ba"),
  onset_date = c("11/9/2014", "10/30/2014", "8/16/2014", "8/29/2014", "10/20/2014"),
  sex = c("m", "f", "f", "f", "f")
)

# clean the surveillance data: rename the column and parse it as a Date
demo_clean <- demo_data %>% 
  rename(date_onset = onset_date) %>% 
  mutate(date_onset = mdy(date_onset))  # mdy() because the dates are in month/day/year format

# check class and range of date column
class(demo_clean$date_onset)
#> [1] "Date"
range(demo_clean$date_onset)
#> [1] "2014-08-16" "2014-11-09"

thanks all


Hey! Your example looks great. It’s a clean, reproducible way to verify that your onset_date values are parsed correctly and fall within the expected range.

A few quick notes:

  • You correctly used mdy() because your dates are in month/day/year format.

  • class() confirms that date_onset is now a proper Date object.

  • range() gives a quick check for outliers or unexpected years.
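
To show why the choice of parser matters, here is a small sketch using two of the values from your example. The comparison with dmy() is only for illustration and is not something your data needs: an ambiguous value is silently read as a different date, while an impossible one becomes NA.

```r
library(lubridate)

# two of the onset dates from the example, in month/day/year order
x <- c("11/9/2014", "10/30/2014")

# parsed as month/day/year: 9 November 2014 and 30 October 2014
as_mdy <- mdy(x)

# parsed as day/month/year (the wrong assumption for this data):
# the first value silently becomes 11 September 2014,
# the second becomes NA because there is no month 30
as_dmy <- suppressWarnings(dmy(x))

as_mdy
as_dmy
```

This is why checking range() alone may not be enough: a wrong parser can return plausible-looking dates, so it helps to compare a few parsed values against the raw strings.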

You could also add a simple validation step to catch any parsing failures or impossible dates:

sum(is.na(demo_clean$date_onset))  # count any unparsed dates
summary(demo_clean$date_onset)     # min, median, max at a glance
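
If you want to see which rows are the problem rather than just a count, something along these lines might help. It is a rough sketch, not a definitive approach: the raw data frame below is made up (the last two values are deliberately bad), and window_start / window_end are assumed limits that you would replace with whatever is plausible for your surveillance period.

```r
library(dplyr)
library(lubridate)

# hypothetical raw values; the last two are invented to trigger the checks
raw <- data.frame(
  case_id = c("a1", "a2", "a3", "a4"),
  onset_date = c("11/9/2014", "10/30/2014", "13/45/2014", "8/16/2041"),
  stringsAsFactors = FALSE
)

# assumed plausible window for onset dates; adjust to your data
window_start <- as.Date("2014-01-01")
window_end   <- as.Date("2014-12-31")

checked <- raw %>%
  mutate(
    # mdy() returns NA (with a warning) for values it cannot parse
    date_onset   = suppressWarnings(mdy(onset_date)),
    # a raw value was present but could not be parsed
    parse_failed = is.na(date_onset) & !is.na(onset_date),
    # parsed fine but falls outside the assumed window
    out_of_range = !is.na(date_onset) &
      (date_onset < window_start | date_onset > window_end)
  )

# rows that need a manual look
flagged <- checked %>% filter(parse_failed | out_of_range)
flagged
```

Keeping the raw onset_date column next to the parsed one makes it easier to see what the original entry was when you go back to correct it.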

Best,

Luis
