I am working on a dataset on injuries. The dataset has one column for each anatomic location of injury (analoc_eye, analoc_head, analoc_neck, analoc_chest, analoc_back, etc.). The responses are coded as either YES or NO.
I'd like to create a new column (outcome_analoc_multiple) that is coded 1 if a case has YES in 2 or more of the analoc_ columns, and 0 if only 1 analoc_ column is YES. This column will eventually be used as the outcome in a logistic regression (multiple anatomic locations of injury).
I have thought about using ifelse or case_when, but I don't know how to phrase the condition (the LHS in case_when) to say "if 2 or more analoc_ columns are YES."
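You can count the YES responses in each row by combining rowwise() with c_across(). In sum(c_across(A:D) == "YES"), c_across(A:D) collects the current row's values from columns A through D into a character vector, and the comparison == "YES" turns that vector into a logical one ("YES" becomes TRUE, "NO" becomes FALSE). sum() then treats TRUE as 1 and FALSE as 0, so the result is the number of YES answers in that row.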
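To see that coercion in isolation, here is a minimal sketch using a made-up character vector (row_values is hypothetical) standing in for what c_across() returns for a single row:

```r
# Hypothetical values for one row, as c_across() would collect them
row_values <- c("YES", "NO", "YES", "YES")

is_yes <- row_values == "YES"  # logical vector: TRUE FALSE TRUE TRUE
n_yes <- sum(is_yes)           # TRUE counts as 1 and FALSE as 0, giving 3
```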
```r
library(tidyverse)

# Simulated YES/NO data standing in for the analoc_ columns
set.seed(1)
dat <- data.frame(
  A = sample(c("YES", "NO"), 10, replace = TRUE),
  B = sample(c("YES", "NO"), 10, replace = TRUE),
  C = sample(c("YES", "NO"), 10, replace = TRUE),
  D = sample(c("YES", "NO"), 10, replace = TRUE)
)

dat |>
  rowwise() |>
  # total = number of YES answers in the row; case = 1 if 2 or more, else 0
  mutate(total = sum(c_across(A:D) == "YES"),
         case = if_else(total > 1, 1, 0)) |>
  ungroup()
# # A tibble: 10 × 6
#    A     B     C     D     total  case
#    <chr> <chr> <chr> <chr> <int> <dbl>
#  1 YES   YES   YES   NO        3     1
#  2 NO    YES   YES   NO        2     1
#  3 YES   YES   YES   YES       4     1
#  4 YES   YES   YES   NO        3     1
#  5 NO    YES   YES   YES       3     1
#  6 YES   NO    YES   YES       3     1
#  7 YES   NO    NO    NO        1     0
#  8 YES   NO    YES   YES       3     1
#  9 NO    NO    YES   NO        1     0
# 10 NO    YES   NO    NO        1     0
```
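With many analoc_ columns, a vectorized variant may be more convenient than rowwise(). The sketch below is an alternative, not the approach above: it assumes every location column starts with analoc_ and uses made-up data (injuries is a hypothetical data frame). rowSums() counts the TRUE values in the logical matrix produced by comparing the selected columns to "YES", and case_when() spells out the "2 or more" condition from the question. Rows with no YES at all match neither condition and become NA, which is an assumption you may want to change.

```r
library(dplyr)

# Hypothetical example data following the analoc_ naming pattern
injuries <- data.frame(
  analoc_eye   = c("YES", "NO",  "YES", "NO"),
  analoc_head  = c("YES", "YES", "NO",  "NO"),
  analoc_neck  = c("NO",  "NO",  "YES", "NO"),
  analoc_chest = c("NO",  "NO",  "YES", "NO")
)

injuries <- injuries |>
  mutate(
    # Compare every analoc_ column to "YES" and count the TRUEs in each row
    n_yes = rowSums(across(starts_with("analoc_")) == "YES"),
    outcome_analoc_multiple = case_when(
      n_yes >= 2 ~ 1,  # multiple anatomic locations
      n_yes == 1 ~ 0   # single anatomic location
      # rows with no YES match neither condition and get NA
    )
  )
```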
Thank you for this, @zawepi! I noticed that when I call class() on the original YES/NO columns after running the code, they are still character. So does the conversion to a logical vector happen only temporarily, during the c_across() operation?
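One way to check this is a small sketch on made-up data (dat here is a two-column hypothetical, not the earlier example). The comparison == "YES" builds a new logical vector and leaves the column it reads from untouched, and mutate() only adds or replaces the columns named on the left of =, so the YES/NO columns keep their character type.

```r
library(dplyr)

# Hypothetical two-column example
dat <- data.frame(A = c("YES", "NO"), B = c("NO", "NO"))

# The comparison returns a new logical vector; dat$A is not modified
is_yes <- dat$A == "YES"
class(is_yes)  # "logical"
class(dat$A)   # "character"

# Only `total` is created; A and B are carried through unchanged
res <- dat |>
  rowwise() |>
  mutate(total = sum(c_across(A:B) == "YES")) |>
  ungroup()
sapply(res, class)
```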