Applied Epi Community

Creating a new binary variable when 2 or more variables are true

I am working on a dataset on injuries. The dataset has 1 column each for the anatomic location of injury (analoc_eye, analoc_head, analoc_neck, analoc_chest, analoc_back, etc). The responses are coded either as YES or NO.

I’d like to create a new column (outcome_analoc_multiple) which codes as 1 if a case has 2 or more of the analoc_ columns with a YES, and 0 if there is only 1 analoc_ column with a YES. This column will eventually be used for logistic regression as an outcome (multiple anatomic locations of injuries).

I have thought about using ifelse or case_when, but I don’t know how to phrase the if or LHS statement to say “if 2 or more analoc_ columns are YES.”

Any thoughts? Thank you!

You can count the YES values in each row with rowwise() and c_across(). Note that c_across(A:D) == "YES" first turns the values in that row into a logical vector (“YES” becomes TRUE and “NO” becomes FALSE). Then sum() automatically converts TRUE to 1 and FALSE to 0 before adding them up.

library(tidyverse)

set.seed(1)
dat <- data.frame(
  A = sample(c("YES", "NO"), 10, replace = TRUE),
  B = sample(c("YES", "NO"), 10, replace = TRUE),
  C = sample(c("YES", "NO"), 10, replace = TRUE),
  D = sample(c("YES", "NO"), 10, replace = TRUE)
)

dat |> 
  rowwise() |> 
  # count how many of the columns A to D are "YES" in each row
  mutate(total = sum(c_across(A:D) == "YES"),
         # 1 if two or more columns are "YES", otherwise 0
         case = if_else(total > 1, 1, 0)) |> 
  ungroup()

# # A tibble: 10 × 6
#    A     B     C     D     total  case
#    <chr> <chr> <chr> <chr> <int> <dbl>
#  1 YES   YES   YES   NO        3     1
#  2 NO    YES   YES   NO        2     1
#  3 YES   YES   YES   YES       4     1
#  4 YES   YES   YES   NO        3     1
#  5 NO    YES   YES   YES       3     1
#  6 YES   NO    YES   YES       3     1
#  7 YES   NO    NO    NO        1     0
#  8 YES   NO    YES   YES       3     1
#  9 NO    NO    YES   NO        1     0
# 10 NO    YES   NO    NO        1     0
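
The same count can also be done without rowwise(), which may be faster on a large dataset. Below is a minimal base R sketch, not the only way to do it. The data frame injuries, its values, and the helper column n_analoc_yes are made up for illustration; only the analoc_ prefix and the outcome_analoc_multiple name come from the question. It also assumes that a row with no YES at all should be coded 0, and that there are no missing values.

```r
# Hypothetical toy data; the column names follow the analoc_ pattern from the question.
injuries <- data.frame(
  id          = 1:4,
  analoc_eye  = c("YES", "NO",  "YES", "NO"),
  analoc_head = c("YES", "NO",  "NO",  "NO"),
  analoc_neck = c("NO",  "YES", "YES", "NO"),
  stringsAsFactors = FALSE
)

# Flag every column whose name starts with "analoc_".
analoc_cols <- startsWith(names(injuries), "analoc_")

# Comparing the selected columns to "YES" gives a logical matrix;
# rowSums() then counts the TRUE values in each row.
injuries$n_analoc_yes <- rowSums(injuries[, analoc_cols, drop = FALSE] == "YES")

# 1 if two or more locations are YES, otherwise 0 (assumption: zero YES is also 0).
injuries$outcome_analoc_multiple <- as.integer(injuries$n_analoc_yes >= 2)

print(injuries)
```

The helper column is deliberately not named with the analoc_ prefix, so that rerunning the column selection does not pick it up by accident.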

Thank you for this, @zawepi! I noticed that when I call class() on the original YES/NO columns after running the code, they are still of class character. So the conversion to a logical vector happens only temporarily, during the c_across() operation?

Yes, the conversion happens on the fly: the == comparison returns a new logical vector and leaves the original columns unchanged.
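
A small sketch of that behaviour, using a made-up vector x in place of one of the YES/NO columns:

```r
# Hypothetical stand-in for a single YES/NO column.
x <- c("YES", "NO", "YES")

# The comparison builds a new logical vector; x itself is not modified.
is_yes <- x == "YES"

class(is_yes)  # "logical"
sum(is_yes)    # 2, because TRUE counts as 1 and FALSE as 0
class(x)       # "character", the original is untouched
```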

For more, also see: Stack Overflow: Count across columns if value is a certain character in R [duplicate] and other duplicates there.