Applying na_if() across selected columns

Describe your issue

Hello! I’d like to apply na_if() across multiple specific columns in one go.

What steps have you already taken to find an answer?

Presently, I do sequential mutate steps on a “per variable” basis. However, I was wondering if there was a more elegant solution?

Provide an example of your R code

library(tidyverse)

linelist_raw <- data.frame(
  stringsAsFactors = FALSE,
  sex = c("m", "m", "m", "m", "m", "m", "m", "m", "f", "m"),
  hx_vax = c("1", "0", "0", "1", "uk", "uk", "1", "uk", "0", "0"),
  pcr = c("1", "nd", "0", "1", "nd", "0", "1", "uk", "0", "1"))

linelist <- linelist_raw %>% 
  mutate(hx_vax = na_if(hx_vax, "uk"),
         pcr = na_if(pcr, "uk"),
         pcr = na_if(pcr, "nd"))

In this example dataset, hx_vax has 3 options: 1 = yes, 0 = no, and uk = unknown. pcr has 4 options: 1 = positive, 0 = negative, nd = not done, uk = tested but unknown result.

Following the na_if() help file, one option would be to do the following:

linelist <- linelist_raw %>%
    mutate(across(where(is.character), ~na_if(., "uk")))

However, the full dataset has many other character columns that I don’t want to mutate at the moment, so I would rather specify which columns to change.

Hi @iancgmd

Instead of using where(is.character) to indicate the columns to mutate across, try another tidyselect helper function such as any_of(), all_of(), contains(), or starts_with().

Thanks for responding, @neale ! I tried doing the following:

ukvars <- c("hx_vax", "pcr")

linelist <- linelist_raw %>%
    mutate(across(where(all_of(ukvars)), ~na_if(., "uk")))

However, I get the following error:

Error in `mutate()`:
ℹ In argument: `across(where(all_of(ukvars)), ~na_if(., "uk"))`.
Caused by error in `across()`:
! Problem while evaluating `where(all_of(ukvars))`.
Caused by error in `where()`:
! Can't convert `fn`, an integer vector, to a function.
Run `rlang::last_trace()` to see where the error occurred.

Hi Ian,

Try something like this:

linelist <- linelist_raw %>%
    mutate(across(c(hx_vax, pcr), ~na_if(., "uk")))

Of course, you could use other tidyselect functions inside the across function if needed.
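
As a rough sketch of what those other selections might look like: the small data frame below is a trimmed-down stand-in for linelist_raw, and "not_a_column" is a made-up name included only to show that any_of() skips names that are absent.

```r
library(dplyr)

# Trimmed-down stand-in for linelist_raw (illustrative values only)
linelist_raw <- data.frame(
  sex = c("m", "f"),
  hx_vax = c("1", "uk"),
  pcr = c("uk", "nd"),
  stringsAsFactors = FALSE
)

# Bare column names combined with c()
by_name <- linelist_raw %>%
  mutate(across(c(hx_vax, pcr), ~ na_if(.x, "uk")))

# A range of adjacent columns
by_range <- linelist_raw %>%
  mutate(across(hx_vax:pcr, ~ na_if(.x, "uk")))

# Everything except the columns to leave alone
by_exclusion <- linelist_raw %>%
  mutate(across(!sex, ~ na_if(.x, "uk")))

# any_of() ignores names that are not in the data ("not_a_column" is hypothetical)
by_any_of <- linelist_raw %>%
  mutate(across(any_of(c("hx_vax", "pcr", "not_a_column")), ~ na_if(.x, "uk")))
```

On this toy data all four calls should give the same result; which one reads best depends on how the real columns are named and ordered.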

All the best,

Tim

Thanks @machupovirus ! Concatenating the column names worked.

Regards,
Ian

Hey @iancgmd, your attempt would have worked; all you had to do was remove the where(), I think.

Hi @aspina ! Thanks! That worked too! I’m still unsure when exactly to use where().

Solution:

df <- df %>%
    mutate(across(all_of(ukvars), ~na_if(., "uk")))
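
For anyone landing here later, a self-contained sketch of that solution on the example data from the first post. The original code also turned "nd" in pcr into NA, so that step is kept as a separate line. The na_codes vector in the alternative is a hypothetical helper, not something from the thread.

```r
library(dplyr)

linelist_raw <- data.frame(
  stringsAsFactors = FALSE,
  sex = c("m", "m", "m", "m", "m", "m", "m", "m", "f", "m"),
  hx_vax = c("1", "0", "0", "1", "uk", "uk", "1", "uk", "0", "0"),
  pcr = c("1", "nd", "0", "1", "nd", "0", "1", "uk", "0", "1"))

ukvars <- c("hx_vax", "pcr")

linelist <- linelist_raw %>%
  mutate(
    # "uk" becomes NA in every column named in ukvars
    across(all_of(ukvars), ~ na_if(.x, "uk")),
    # "nd" only occurs in pcr, so it is handled on its own
    pcr = na_if(pcr, "nd")
  )

# Alternative: treat several codes as missing in one pass.
# na_codes is a hypothetical helper vector.
na_codes <- c("uk", "nd")
linelist_alt <- linelist_raw %>%
  mutate(across(all_of(ukvars), ~ replace(.x, .x %in% na_codes, NA)))
```

The alternative applies both codes to both columns, which is harmless here because hx_vax never contains "nd", but it may not be what you want if a code means something different in another column.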

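where() is for selecting columns by a property rather than by name: it takes a predicate function such as is.character or is.numeric and keeps the columns for which that function returns TRUE. That's why where(all_of(ukvars)) failed: all_of(ukvars) is a selection, not a function. When you already know the column names, you don't need where(); pass the names, or a helper like all_of(), straight to across().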
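
A minimal sketch of the difference, using select() so the chosen column names are easy to inspect. The data frame is a cut-down version of the example, and the age column is a hypothetical addition so that there is something non-character to filter out.

```r
library(dplyr)

# Cut-down example data; `age` is a hypothetical extra column
linelist_raw <- data.frame(
  sex = c("m", "f", "m"),
  hx_vax = c("1", "uk", "0"),
  pcr = c("nd", "uk", "1"),
  age = c(34, 51, 27),
  stringsAsFactors = FALSE
)

ukvars <- c("hx_vax", "pcr")

# where() takes a predicate function and keeps columns where it returns TRUE
cols_by_type <- linelist_raw %>% select(where(is.character)) %>% names()

# all_of() takes a character vector of column names
cols_by_name <- linelist_raw %>% select(all_of(ukvars)) %>% names()

# The two can be combined: named columns that are also character
cols_both <- linelist_raw %>%
  select(all_of(ukvars) & where(is.character)) %>%
  names()
```

Here cols_by_type picks up sex as well, which is the behaviour the original question wanted to avoid, while cols_by_name is limited to the columns listed in ukvars.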

Thank you for clarifying, Alex!