Converting dates with times (handbook chapter 9)

Hello! I’m going through chapter 9 in the handbook. The instructions in part 9.7 on working with the date-time class say that a “clean” time of admission column, with missing values filled in with the column median, should be created because lubridate won’t operate on missing values. However, when I run the given code, it doesn’t fill in the median of time_admission:

# packages
pacman::p_load(tidyverse, lubridate, stringr)

# time_admission is a column in hours:minutes
linelist <- linelist %>%
  
  # when time of admission is not given, assign the median admission time
  mutate(
    time_admission_clean = ifelse(
      is.na(time_admission),         # if time is missing
      median(time_admission),        # assign the median
      time_admission                 # if not missing keep as is
    )) %>%
  
  # use str_glue() to combine date and time columns to create one character column
  # and then use ymd_hm() to convert it to datetime
  mutate(
    date_time_of_admission = str_glue("{date_hospitalisation} {time_admission_clean}") %>% 
      ymd_hm()
  )

linelist %>% select(date_hospitalisation, time_admission_clean, date_time_of_admission) %>% head(10)

Output:

   date_hospitalisation time_admission_clean date_time_of_admission
1            2014-05-15                 <NA>                   <NA>
2            2014-05-14                09:36    2014-05-14 09:36:00
3            2014-05-18                16:48    2014-05-18 16:48:00
4            2014-05-20                11:22    2014-05-20 11:22:00
5            2014-05-22                12:60                   <NA>
6            2014-05-23                14:13    2014-05-23 14:13:00
7            2014-05-29                14:33    2014-05-29 14:33:00
8            2014-06-03                09:25    2014-06-03 09:25:00
9            2014-06-06                11:16    2014-06-06 11:16:00
10           2014-06-07                10:55    2014-06-07 10:55:00
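
One likely reason row 1 stays `<NA>`: `median()` returns `NA` whenever its input contains a missing value, unless `na.rm = TRUE` is set. Below is a minimal sketch with made-up numeric values (not the linelist) that shows the effect inside `ifelse()`:

```r
# Made-up values for illustration; not taken from the linelist
x <- c(10, NA, 30)

med_default <- median(x)                # NA, because x contains a missing value
med_na_rm   <- median(x, na.rm = TRUE)  # 20, missing values are dropped first

# Without na.rm = TRUE, ifelse() only has NA to fill in
filled_default <- ifelse(is.na(x), median(x), x)                # 10 NA 30
filled_na_rm   <- ifelse(is.na(x), median(x, na.rm = TRUE), x)  # 10 20 30
```

Adding `na.rm = TRUE` may not be enough on its own for a character column such as time_admission, since a median is only reliable for values that can be averaged.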

I think it is because the time_admission column is of class character. I tried converting it to numeric with the following code, but I got all NAs instead:

linelist <- linelist %>%
  mutate(time_admission = as.numeric(time_admission)) %>% 
  
  # when time of admission is not given, assign the median admission time
  mutate(
    time_admission_clean = ifelse(
      is.na(time_admission),         # if time is missing
      median(time_admission),        # assign the median
      time_admission                 # if not missing keep as is
    )) %>%
  
  # use str_glue() to combine date and time columns to create one character column
  # and then use ymd_hm() to convert it to datetime
  mutate(
    date_time_of_admission = str_glue("{date_hospitalisation} {time_admission_clean}") %>%
      ymd_hm()
  )

Output:

   date_hospitalisation time_admission time_admission_clean date_time_of_admission
1            2014-05-15             NA                   NA                   <NA>
2            2014-05-14             NA                   NA                   <NA>
3            2014-05-18             NA                   NA                   <NA>
4            2014-05-20             NA                   NA                   <NA>
5            2014-05-22             NA                   NA                   <NA>
6            2014-05-23             NA                   NA                   <NA>
7            2014-05-29             NA                   NA                   <NA>
8            2014-06-03             NA                   NA                   <NA>
9            2014-06-06             NA                   NA                   <NA>
10           2014-06-07             NA                   NA                   <NA>
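
The all-NA result is consistent with how `as.numeric()` treats strings such as "09:36": the colon means they are not valid numbers, so every value is coerced to NA. One possible workaround, sketched below with made-up times, is to convert each time to minutes since midnight, take the median with `na.rm = TRUE`, and convert it back to text. The helpers `to_minutes()` and `to_hhmm()` are hypothetical names, not from the handbook or lubridate, and the sketch assumes zero-padded "HH:MM" strings.

```r
# Made-up times for illustration; not taken from the linelist
times <- c("09:36", NA, "16:48", "11:22")

# "09:36" is not a number, so coercion gives NA (with a warning)
coerced <- suppressWarnings(as.numeric(times))

# Hypothetical helper: zero-padded "HH:MM" -> minutes since midnight
to_minutes <- function(x) {
  as.numeric(substr(x, 1, 2)) * 60 + as.numeric(substr(x, 4, 5))
}

# Hypothetical helper: minutes since midnight -> "HH:MM"
to_hhmm <- function(m) {
  m <- as.integer(round(m))
  sprintf("%02d:%02d", m %/% 60L, m %% 60L)
}

minutes     <- to_minutes(times)                        # 576 NA 1008 682
median_time <- to_hhmm(median(minutes, na.rm = TRUE))   # "11:22"

# Fill missing times with the median, keep the rest as is
times_clean <- ifelse(is.na(times), median_time, times)
times_clean
```

The result stays a character vector, so it can still be combined with the date using `str_glue()` and parsed with `ymd_hm()`.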

Hi Ian,

Can you provide data so that we can reproduce your error?

All the best,

Tim


Hi Tim,

I’m using the handbook’s COVID-19 linelist here.

Thanks.


@iancgmd I think you may have identified a mistake in the Epi R Handbook. Could you please post this as an issue on the Handbook’s GitHub?

Much appreciated


Hi Ian,

This appears to be an issue with the data, as Neale mentioned above. The admission time in the raw dataset is 12:60, which is not a valid time, and that is why the date-time of admission for that row is NA.

As an example:

# Loading packages
library(tibble)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

# Creating fake data
data <- tribble(
    ~date, ~time,
    "2014-05-22", "12:59",
    "2014-05-22", "12:60",
    "2014-05-22", "13:00"
)

# Deriving datetime
data |>
    mutate(datetime = ymd_hm(str_glue("{date} {time}")))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `datetime = ymd_hm(str_glue("{date} {time}"))`.
#> Caused by warning:
#> !  1 failed to parse.
#> # A tibble: 3 × 3
#>   date       time  datetime           
#>   <chr>      <chr> <dttm>             
#> 1 2014-05-22 12:59 2014-05-22 12:59:00
#> 2 2014-05-22 12:60 NA                 
#> 3 2014-05-22 13:00 2014-05-22 13:00:00

Created on 2023-09-10 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] stringr_1.5.0   dplyr_1.1.2     lubridate_1.9.2 tibble_3.2.1   
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.3       cli_3.6.1         knitr_1.43        rlang_1.1.1      
#>  [5] xfun_0.39         stringi_1.7.12    purrr_1.0.1       styler_1.10.1    
#>  [9] generics_0.1.3    glue_1.6.2        htmltools_0.5.5   fansi_1.0.4      
#> [13] rmarkdown_2.23    R.cache_0.16.0    evaluate_0.21     fastmap_1.1.1    
#> [17] yaml_2.3.7        lifecycle_1.0.3   compiler_4.3.1    fs_1.6.3         
#> [21] timechange_0.2.0  pkgconfig_2.0.3   rstudioapi_0.15.0 R.oo_1.25.0      
#> [25] R.utils_2.12.2    digest_0.6.33     R6_2.5.1          tidyselect_1.2.0 
#> [29] reprex_2.0.2      utf8_1.2.3        pillar_1.9.0      magrittr_2.0.3   
#> [33] R.methodsS3_1.8.2 tools_4.3.1       withr_2.5.0
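
To find values like "12:60" before they turn into NA during parsing, one option is to check each time string against a pattern for valid 24-hour times. This is only a sketch using base R: the pattern is my own assumption about the expected format (zero-padded "HH:MM"), and the data are made up.

```r
# Made-up times, including the invalid "12:60" from the example above
times <- c("12:59", "12:60", "13:00", NA)

# Assumed format: zero-padded 24-hour "HH:MM", from 00:00 to 23:59
valid_pattern <- "^([01][0-9]|2[0-3]):[0-5][0-9]$"

# grepl() returns FALSE for missing values, so NA is handled separately below
is_valid <- grepl(valid_pattern, times)

# Values that are present but do not look like a valid time
invalid_times <- times[!is.na(times) & !is_valid]
invalid_times   # "12:60"
```

Rows flagged this way can then be corrected or set to NA deliberately, rather than being discovered later through a parsing warning.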

All the best,

Tim


Hi Neale,

Sorry for the late response. I have created an issue through the GitHub link provided.

Ian
