R code for data management

lnielsen · January 10, 2024, 8:59pm

Hi @myekta, thanks for sharing your questions! We usually recommend including a reproducible example with a sample of your dataset, but I think I can help without one. For future questions, please check the guide How to Post an R Code Question.
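For reference, one common way to share a sample of a dataset is base R's dput(), which prints R code that rebuilds the object. A minimal sketch, using the built-in mtcars data purely as a stand-in for your own data frame:

```r
# Stand-in for your own data: the first six rows of three mtcars columns
sample_rows <- head(mtcars[, c("mpg", "cyl", "disp")])

# dput() prints R code that recreates the object;
# paste that output into your post so others can load the same data
dput(sample_rows)
```

Anyone who runs the printed code gets a copy of the same data frame, which makes the problem reproducible.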

First, I’ve generated a “fake” dataset to illustrate possible solutions (“Syphiliz” is misspelled on purpose so it can be corrected below):

set.seed(42)  # makes the random sample reproducible
df <- data.frame(Disease = sample(c("Syphiliz", "Gonorrhea", "Influenza", "Salmonella", "Tuberculosis", "E. coli"), 20, replace = TRUE))

For your first question, I suggest using case_when() inside mutate() (both come from the dplyr package). Consider the following example:

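library(dplyr)  # provides %>%, mutate() and case_when()

df2 <- df %>%
  mutate(Disease = case_when(
    Disease == "Syphiliz" ~ "Syphilis",
    Disease == "E. coli" ~ "Escherichia coli",
    TRUE ~ Disease  # catch-all: keep every other value unchanged
  ))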

In this example, I am recoding specific disease names in the Disease column, and the final TRUE ~ Disease acts as a catch-all that leaves every other value unchanged. case_when() works like a chain of nested ifelse() calls: for each row it checks the conditions in order and uses the value paired with the first condition that is TRUE.
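To make the comparison with ifelse() concrete, here is a small sketch on a hypothetical character vector (not the data from the question). It shows that case_when() gives the same result as nested ifelse() calls, and that when several conditions are TRUE only the first one is applied:

```r
library(dplyr)

# Hypothetical values for illustration only
x <- c("Syphiliz", "E. coli", "Influenza")

# case_when(): for each element, the first TRUE condition decides the value;
# TRUE ~ x is the catch-all that keeps everything else unchanged
via_case_when <- case_when(
  x == "Syphiliz" ~ "Syphilis",
  x == "E. coli"  ~ "Escherichia coli",
  TRUE            ~ x
)
# via_case_when is c("Syphilis", "Escherichia coli", "Influenza")

# The same recoding written as nested ifelse() calls
via_ifelse <- ifelse(x == "Syphiliz", "Syphilis",
                     ifelse(x == "E. coli", "Escherichia coli", x))
identical(via_case_when, via_ifelse)  # TRUE

# Order matters: "Syphiliz" matches both conditions below,
# but only the first one is used
order_demo <- case_when(
  x %in% c("Syphiliz", "Gonorrhea") ~ "std",
  x == "Syphiliz"                   ~ "never used for Syphiliz",
  TRUE                              ~ "other"
)
# order_demo is c("std", "other", "other")
```

Because of this first-match rule, put the most specific conditions first and the TRUE catch-all last.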

For your second question, I recommend using ifelse() inside mutate(). For instance:

df3 <- df2 %>%
  mutate(stds = ifelse(Disease %in% c("Gonorrhea", "Syphilis"), "yes", "no"),
         foodborne_diseases = ifelse(Disease %in% c("Salmonella", "Escherichia coli"), "yes", "no"))

You can repeat this process for other categories by creating a new column for each one. In other words, if the value in the Disease column is one of the listed diseases, the new column gets “yes”; otherwise, it gets “no”.
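If you end up with many categories, one way to avoid writing a separate ifelse() for each column is to keep the category definitions in a named list and loop over it. This is only a sketch: the `categories` list and its `respiratory` group are hypothetical, the list names are assumed to become the new column names, and the small df2 below is a stand-in for the recoded data:

```r
# Stand-in for the recoded data (df2 in the steps above)
df2 <- data.frame(Disease = c("Syphilis", "Gonorrhea", "Influenza",
                              "Salmonella", "Escherichia coli", "Tuberculosis"))

# Hypothetical category definitions: one entry per new column.
# Add or edit entries here instead of writing another ifelse() by hand.
categories <- list(
  stds               = c("Gonorrhea", "Syphilis"),
  foodborne_diseases = c("Salmonella", "Escherichia coli"),
  respiratory        = c("Influenza", "Tuberculosis")
)

df3 <- df2
for (col in names(categories)) {
  # "yes" when the disease is in this category's list, otherwise "no"
  df3[[col]] <- ifelse(df3$Disease %in% categories[[col]], "yes", "no")
}

df3
```

Note that %in% returns FALSE for NA, so rows with a missing Disease get “no” in every category column.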

I hope this helps. Let me know if you have any other questions.

Lucca