Hello Applied Epi community
I am trying to clean my data but have run into a problem that I could not solve with online resources. In my data, one variable holds several responses per row, separated by commas.
I am trying to split these responses into several columns, each holding one response. However, my script places the split values in columns by their position in the string rather than by response.
I need help arranging the newly split values so that each column holds a single response code.
For example, I need all entries of “6”, “7”, and “11” in their own respective columns: the “6” column should contain 6 for each subject who reported it and NA for everyone else, and the same goes for the 7 and 11 columns.
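One way to get code-aligned columns is to reshape instead of splitting by position: put each response on its own row, then widen so that every code gets its own column. This is a minimal sketch, assuming tidyr >= 1.3.0 (for `separate_longer_delim()`) and assuming the code is the leading number of each piece, so "14 (psoas abscess)" becomes "14". The names `df_aligned`, `response`, `value`, and the `symptom_` prefix are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Reprex data from the post
df_test <- tibble::tribble(
  ~study_no, ~table2_major_symptoms,
  1L,  "11",
  2L,  "11",
  5L,  "1,6,11",
  6L,  "6,7,11",
  7L,  "7,8,11",
  15L, "14 (psoas abscess)"
)

df_aligned <- df_test %>%
  # copy the raw string, then split the copy into one row per response
  mutate(response = table2_major_symptoms) %>%
  separate_longer_delim(response, delim = ",") %>%
  # assumption: the code is the leading number of each piece
  mutate(
    response = str_extract(str_trim(response), "^\\d+"),
    value = response
  ) %>%
  # drop repeated codes within a subject so each cell holds one value
  distinct() %>%
  # one column per code; a cell holds the code or NA if not reported
  pivot_wider(
    names_from = response,
    names_prefix = "symptom_",
    values_from = value
  )
```

With the reprex data, `symptom_6` holds "6" for studies 5 and 6 and `NA` for the others, and `symptom_11` is filled for every study except 15.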
Create dummy variables (binary)
I want this layout so that I can create dummy variables from the entries (6, 7, or 11, one per column).
Is it possible to extend the pipe so that the 0/1 (binary/dummy) variables are created within the same script?
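A minimal sketch of doing this in one pipe, under the same assumptions (tidyr >= 1.3.0, code = leading number of each piece): mark every subject-code pair with `1L`, then let `pivot_wider()` fill the missing combinations with `0L` through its `values_fill` argument. `df_dummy`, `present`, and the `symptom_` prefix are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Reprex data from the post
df_test <- tibble::tribble(
  ~study_no, ~table2_major_symptoms,
  1L,  "11",
  2L,  "11",
  5L,  "1,6,11",
  6L,  "6,7,11",
  7L,  "7,8,11",
  15L, "14 (psoas abscess)"
)

df_dummy <- df_test %>%
  # one row per (subject, response), keeping the original string intact
  mutate(response = table2_major_symptoms) %>%
  separate_longer_delim(response, delim = ",") %>%
  # assumption: the code is the leading number; flag each code as present
  mutate(
    response = str_extract(str_trim(response), "^\\d+"),
    present = 1L
  ) %>%
  distinct() %>%
  # one 0/1 column per code; codes a subject did not report become 0
  pivot_wider(
    names_from = response,
    names_prefix = "symptom_",
    values_from = present,
    values_fill = 0L
  )
```

This creates a dummy for every code that appears in the data (1, 6, 7, 8, 11, and 14 in the reprex); a following `select()` can keep only `symptom_6`, `symptom_7`, and `symptom_11` if those are the only ones needed.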
I am sharing a reprex below; I would be grateful if you could take a look.
pacman::p_load(rio, lubridate, datapasta, reprex, tidyverse)
df_test <- tibble::tribble(
  ~study_no, ~table2_major_symptoms,
  1L,  "11",
  2L,  "11",
  5L,  "1,6,11",
  6L,  "6,7,11",
  7L,  "7,8,11",
  15L, "14 (psoas abscess)"
)
# Split variable --------------------------------------------------------
# Pieces fill columns a-d from the left; any pieces beyond four are merged into d
df_test <- df_test %>%
  separate_wider_delim(table2_major_symptoms,
    delim = ",",
    names = c("a", "b", "c", "d"),
    too_few = "align_start",
    too_many = "merge",
    cols_remove = FALSE
  )
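With this call, `separate_wider_delim()` assigns pieces by position: for study 5 ("1,6,11") the 6 lands in `b`, while for study 6 ("6,7,11") it lands in `a`, which is why the columns do not line up by code. If only a fixed set of codes matters, a sketch that skips the positional columns and tests each row for each code directly might look like the following. The code set (6, 7, 11), the `symptom_` column names, and `df_flags` are assumptions for illustration, and the code is again assumed to be the leading number of each piece.

```r
library(dplyr)
library(purrr)
library(stringr)

# Reprex data from the post
df_test <- tibble::tribble(
  ~study_no, ~table2_major_symptoms,
  1L,  "11",
  2L,  "11",
  5L,  "1,6,11",
  6L,  "6,7,11",
  7L,  "7,8,11",
  15L, "14 (psoas abscess)"
)

df_flags <- df_test %>%
  mutate(
    # temporary list-column: the numeric codes found in each row
    codes = map(
      str_split(table2_major_symptoms, ","),
      ~ str_extract(str_trim(.x), "^\\d+")
    ),
    # 0/1 indicator per code of interest; %in% compares whole codes,
    # so "1" does not match inside "11"
    symptom_6  = as.integer(map_lgl(codes, ~ "6" %in% .x)),
    symptom_7  = as.integer(map_lgl(codes, ~ "7" %in% .x)),
    symptom_11 = as.integer(map_lgl(codes, ~ "11" %in% .x))
  ) %>%
  select(-codes)
```

Matching on whole codes rather than with a plain substring search avoids false positives such as "1" matching "11" or "14".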
Created on 2023-10-14 with reprex v2.0.2
Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.utf8
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.utf8
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Norwegian Bokmål_Norway.utf8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1
#> [5] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2
#> [9] tidyverse_2.0.0 reprex_2.0.2 datapasta_3.1.0 lubridate_1.9.2
#> [13] rio_0.5.29
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.3 generics_0.1.3 stringi_1.7.12 hms_1.1.3
#> [5] digest_0.6.33 magrittr_2.0.3 evaluate_0.21 grid_4.3.1
#> [9] timechange_0.2.0 fastmap_1.1.1 cellranger_1.1.0 zip_2.3.0
#> [13] fansi_1.0.4 scales_1.2.1 cli_3.6.1 rlang_1.1.1
#> [17] munsell_0.5.0 withr_2.5.0 yaml_2.3.7 tools_4.3.1
#> [21] tzdb_0.4.0 colorspace_2.1-0 pacman_0.5.1 curl_5.0.1
#> [25] vctrs_0.6.3 R6_2.5.1 lifecycle_1.0.3 fs_1.6.3
#> [29] foreign_0.8-84 pkgconfig_2.0.3 pillar_1.9.0 openxlsx_4.2.5.2
#> [33] gtable_0.3.3 data.table_1.14.8 glue_1.6.2 Rcpp_1.0.11
#> [37] haven_2.5.3 xfun_0.39 tidyselect_1.2.0 rstudioapi_0.15.0
#> [41] knitr_1.43 htmltools_0.5.5 rmarkdown_2.23 compiler_4.3.1
#> [45] readxl_1.4.3