Deduplicating one column based on values in a second column

neale · May 24, 2022, 10:19pm

Hi Kate,

Thanks for posting on the site! Here is one way to approach this that I hope gets you what you want:

pacman::p_load(tidyverse, janitor)

name <- c("Mike", "Billy", "Tom", "Emilie", "Geo", "Monique", "Emilie", "Mike","Tom")
home_phone <- c("831-458-8050", "654-493-7589", "512-203-0917","631-485-8157","419-301-7861","920-337-5912", "NA", "831-458-8050", "916-698-4843")
cell_phone <- c("NA","803-445-0053","NA", "626-831-6981", "405-364-7320", "530-457-2565", "631-485-8157", "831-738-2701","512-203-0917")
df <- data.frame(name, home_phone, cell_phone)

# names and home phones
home_phones <- df %>% 
     select(name, home_phone)

# names and cell phones
cell_phones <- df %>% 
     select(name, cell_phone)

# rows where name matches, and home phone matches cell phone
semi_join(home_phones, cell_phones, by = c("name", "home_phone" = "cell_phone"))
#>     name   home_phone
#> 1    Tom 512-203-0917
#> 2 Emilie 631-485-8157

Created on 2022-05-24 by the reprex package (v2.0.1)
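If you want to act on those matches rather than just list them, here is one possible follow-up. It is only a sketch: I am assuming that when a number shows up as both a home phone and a cell phone for the same name, the home phone is the copy you want to blank out, which may not be what you need. The names `dup_home` and `df_dedup` are made up for illustration.

```r
library(dplyr)

name <- c("Mike", "Billy", "Tom", "Emilie", "Geo", "Monique", "Emilie", "Mike", "Tom")
home_phone <- c("831-458-8050", "654-493-7589", "512-203-0917", "631-485-8157", "419-301-7861", "920-337-5912", "NA", "831-458-8050", "916-698-4843")
cell_phone <- c("NA", "803-445-0053", "NA", "626-831-6981", "405-364-7320", "530-457-2565", "631-485-8157", "831-738-2701", "512-203-0917")
df <- data.frame(name, home_phone, cell_phone, stringsAsFactors = FALSE)

# name/home_phone pairs that also occur as a name/cell_phone pair
dup_home <- semi_join(
  select(df, name, home_phone),
  select(df, name, cell_phone),
  by = c("name", "home_phone" = "cell_phone")
)

# Assumption: the home phone is the duplicate to drop, so set it to NA
# in the rows flagged above and leave every other row untouched
df_dedup <- df %>%
  mutate(home_phone = if_else(
    paste(name, home_phone) %in% paste(dup_home$name, dup_home$home_phone),
    NA_character_,
    home_phone
  ))
```

Note that the "NA" values in your example are text strings rather than real missing values, so they are left alone here; only the two matched home phones (Tom and Emilie) are replaced with a real `NA`.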

Also, a small tip for the future: if you put three backticks on the lines above and below your code, it will appear as code-formatted text in the post (or you can highlight it and click the code icon next to the quote mark icon). As it was written in your post, I had to manually convert all the arrows and quote marks after copying and pasting into R.

←
“Mike”

vs.

```
<-
"Mike"
```

Or use the {reprex} package as described here