Hello, my problem is converting the onset_date column to Date in R.
I am trying to convert the `onset_date` column in my dataset to Date type using:
# install and load packages
pacman::p_load(janitor, tidyverse, reprex, datapasta, lubridate)
# minimal dataset
surv_raw <- data.frame(
  stringsAsFactors = FALSE,
  onset_date = c("2014-12-01", "2014-12-02", "invalid_date", "32-12-2014", "2014-13-05")
)
# try to convert column to Date
surv_clean <- surv_raw %>%
  clean_names() %>%
  mutate(onset_date = ymd(onset_date))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `onset_date = ymd(onset_date)`.
#> Caused by warning:
#> ! 3 failed to parse.
# check the cleaned date column
class(surv_clean$onset_date)
#> [1] "Date"
range(surv_clean$onset_date)
#> [1] NA NA
Created on 2026-03-11 with reprex v2.1.1
Hello,
In this case, the warning indicates that three onset dates failed to parse. One has the value "invalid_date"; one is "32-12-2014", which is not in year-month-day order and contains an impossible day (32); and one, "2014-13-05", has an invalid month (13) and is presumably in YDM rather than YMD order.
You would need to address these issues prior to calling the ymd function or use a different approach that can handle these aberrant onset dates.
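As a starting point for addressing them, it can help to list exactly which raw values fail to parse. The following is only a minimal sketch: the column names `onset_date_raw` and `parse_failed` and the object names `surv_checked` and `bad_rows` are made up for illustration, and `quiet = TRUE` is used only to silence the parse warning while checking.

```r
library(dplyr)
library(lubridate)

surv_raw <- data.frame(
  stringsAsFactors = FALSE,
  onset_date = c("2014-12-01", "2014-12-02", "invalid_date", "32-12-2014", "2014-13-05")
)

surv_checked <- surv_raw %>%
  mutate(
    # keep the original text so failed values can be inspected later
    onset_date_raw = onset_date,
    # quiet = TRUE suppresses the "failed to parse" warning
    onset_date = ymd(onset_date_raw, quiet = TRUE),
    # TRUE where the text was present but could not be parsed as YMD
    parse_failed = is.na(onset_date) & !is.na(onset_date_raw)
  )

# rows that need manual correction before analysis
bad_rows <- surv_checked %>% filter(parse_failed)
bad_rows$onset_date_raw
```

Keeping the raw text next to the parsed column means the problem values can be corrected at the source rather than silently dropped.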
All the best,
Tim
Adding to Tim’s response: yes, ymd() is working as expected here. It converts the valid dates and turns the values it cannot parse into NA, with a warning.
In your example:
onset_date = c("2014-12-01", "2014-12-02", "invalid_date", "32-12-2014", "2014-13-05")
only the first two are valid YYYY-MM-DD dates.
So after:
mutate(onset_date = ymd(onset_date))
you get a Date column with some NAs.
Why range() returns NA NA
Because range() does not ignore missing values unless you tell it to:
range(surv_clean$onset_date)
# [1] NA NA
Use:
range(surv_clean$onset_date, na.rm = TRUE)
Example
pacman::p_load(dplyr, janitor, lubridate)
surv_raw <- data.frame(
  stringsAsFactors = FALSE,
  onset_date = c("2014-12-01", "2014-12-02", "invalid_date", "32-12-2014", "2014-13-05")
)
surv_clean <- surv_raw %>%
  clean_names() %>%
  mutate(onset_date = ymd(onset_date))
surv_clean
# onset_date
# 1 2014-12-01
# 2 2014-12-02
# 3 <NA>
# 4 <NA>
# 5 <NA>
class(surv_clean$onset_date)
# [1] "Date"
range(surv_clean$onset_date, na.rm = TRUE)
# [1] "2014-12-01" "2014-12-02"
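If some values really were entered as year-day-month, as Tim guessed for "2014-13-05", one possible approach is to try ymd() first and fall back to ydm() only where that fails. This is a sketch under that assumption, not a general fix: whether YDM is the right reading is something only the data owner can confirm, and the name `surv_fixed` is made up for illustration.

```r
library(dplyr)
library(lubridate)

surv_raw <- data.frame(
  stringsAsFactors = FALSE,
  onset_date = c("2014-12-01", "2014-12-02", "invalid_date", "32-12-2014", "2014-13-05")
)

surv_fixed <- surv_raw %>%
  mutate(
    onset_date = coalesce(
      ymd(onset_date, quiet = TRUE),  # preferred reading: year-month-day
      ydm(onset_date, quiet = TRUE)   # ASSUMPTION: fallback reading year-day-month
    )
  )

# values that match neither format stay NA, so na.rm = TRUE is still needed
range(surv_fixed$onset_date, na.rm = TRUE)
```

Because ymd() is tried first, a value that is valid in both orders is always read as year-month-day; the fallback only applies to values such as "2014-13-05" that cannot be YMD.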
Best,
Luis