Hi @myekta, thanks for sharing your questions! We usually recommend providing a reproducible example with a sample of your dataset, but I believe I can help anyway. For future questions, see the guide “How to Post an R Code Question”.
Firstly, I’ve generated a “fake” dataset to illustrate potential solutions:
set.seed(123)  # arbitrary seed so the random sample is reproducible
df <- data.frame(Disease = sample(c("Syphiliz", "Gonorrhea", "Influenza", "Salmonella", "Tuberculosis", "E. coli"), 20, replace = TRUE))
For your first question, I suggest using the case_when function inside mutate (both from dplyr). Consider the following example:
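library(dplyr)  # provides %>%, mutate(), and case_when()

df2 <- df %>%
  mutate(Disease = case_when(
    Disease == "Syphiliz" ~ "Syphilis",          # fix the misspelling
    Disease == "E. coli"  ~ "Escherichia coli",  # expand the abbreviation
    TRUE                  ~ Disease              # keep every other value as is
  ))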
In this example, I am reassigning certain disease names in the ‘Disease’ column, and the final TRUE ~ Disease line ensures that everything else remains unchanged. The case_when function works much like a chain of ifelse calls: it checks the conditions from top to bottom, and for each row the first condition that is TRUE determines the new value.
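To make the comparison with ifelse concrete, here is a minimal sketch that applies the same two corrections both ways. The vector `disease` and the result names are made up for illustration; only case_when and ifelse come from the discussion above.

```r
library(dplyr)

# Small illustrative vector (hypothetical, not your real data)
disease <- c("Syphiliz", "E. coli", "Influenza")

# case_when(): conditions are checked top to bottom; the first TRUE wins
fixed_case_when <- case_when(
  disease == "Syphiliz" ~ "Syphilis",
  disease == "E. coli"  ~ "Escherichia coli",
  TRUE                  ~ disease
)

# The same logic written as nested ifelse() calls
fixed_ifelse <- ifelse(
  disease == "Syphiliz", "Syphilis",
  ifelse(disease == "E. coli", "Escherichia coli", disease)
)

# Both approaches should give the same character vector
print(identical(fixed_case_when, fixed_ifelse))
```

With only two corrections the two versions are about equally readable, but each extra rule adds another level of nesting to the ifelse version, which is why I would lean towards case_when here.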
For your second question, I recommend using the ifelse function inside mutate. For instance:
df3 <- df2 %>%
  mutate(
    stds = ifelse(Disease %in% c("Gonorrhea", "Syphilis"), "yes", "no"),
    foodborne_diseases = ifelse(Disease %in% c("Salmonella", "Escherichia coli"), "yes", "no")
  )
You can repeat this process for other categories, creating one new column per category. In each case, if the value in the Disease column is in the given list, the new column is filled with “yes”; otherwise it is filled with “no”.
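As a rough sketch of repeating the pattern, the example below adds one more yes/no column. The column name `respiratory_diseases` and the choice of diseases in it are my own assumptions for illustration, so adjust them to whatever categories you actually need.

```r
library(dplyr)

# Small illustrative data frame (hypothetical, not your real data)
df_example <- data.frame(
  Disease = c("Gonorrhea", "Influenza", "Salmonella", "Tuberculosis")
)

# Hypothetical extra category, built the same way as the other columns
df_example <- df_example %>%
  mutate(
    respiratory_diseases = ifelse(
      Disease %in% c("Influenza", "Tuberculosis"), "yes", "no"
    )
  )

print(df_example)
```

The %in% operator checks each value of Disease against the whole list, so adding a disease to a category only means adding its name to the corresponding vector.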
I hope this helps. Let me know how it goes.
Lucca