Hi All,
I have a deduplication problem. We deduplicate on a combination of name and phone number, but the phone number can be in either of two columns: “home phone” or “cell phone”.
Sometimes the “home phone” value ends up in the “cell phone” column and vice versa.
The data looks like this:
name <- c("Mike", "Billy", "Tom", "Emilie", "Geo", "Monique", "Emilie", "Mike", "Tom")
home_phone <- c("831-458-8050", "654-493-7589", "512-203-0917", "631-485-8157", "419-301-7861", "920-337-5912", "NA", "831-458-8050", "916-698-4843")
cell_phone <- c("NA", "803-445-0053", "NA", "626-831-6981", "405-364-7320", "530-457-2565", "631-485-8157", "831-738-2701", "512-203-0917")
df <- data.frame(name, home_phone, cell_phone)
In the example above, Mike is clearly a duplicate because he has the same home phone value twice. Emilie and Tom are also duplicates, but they don’t look like it because the number that sits in the home phone column of one record is in the cell phone column of the other.
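To make the matching rule concrete, here is a rough base R sketch of one possible approach: stack both phone columns into a single long table, then flag any record whose (name, phone) pair has already appeared in either column. The helper names `long`, `dup_rows`, `is_duplicate` and `deduped` are made up for illustration, and the sketch assumes that the literal string "NA" means a missing phone and that the first occurrence of a person is the record to keep.

```r
# Example data from the post; note that "NA" is stored as a literal string.
name <- c("Mike", "Billy", "Tom", "Emilie", "Geo", "Monique", "Emilie", "Mike", "Tom")
home_phone <- c("831-458-8050", "654-493-7589", "512-203-0917", "631-485-8157",
                "419-301-7861", "920-337-5912", "NA", "831-458-8050", "916-698-4843")
cell_phone <- c("NA", "803-445-0053", "NA", "626-831-6981", "405-364-7320",
                "530-457-2565", "631-485-8157", "831-738-2701", "512-203-0917")
df <- data.frame(name, home_phone, cell_phone, stringsAsFactors = FALSE)

# Stack both phone columns: one row per (record, phone), ignoring which
# column the phone came from. `long` is a hypothetical helper table.
long <- data.frame(
  row   = rep(seq_len(nrow(df)), times = 2),
  name  = rep(df$name, times = 2),
  phone = c(df$home_phone, df$cell_phone),
  stringsAsFactors = FALSE
)

# Drop missing phones (assumption: both real NA and the string "NA" mean missing).
long <- long[!is.na(long$phone) & long$phone != "NA", ]

# Collapse the case where one record lists the same number in both columns,
# so a record is never flagged as a duplicate of itself.
long <- unique(long)

# Walk the records in their original order; a (name, phone) pair that has
# already been seen marks the later record as a duplicate.
long <- long[order(long$row), ]
long$dup <- duplicated(long[c("name", "phone")])
dup_rows <- unique(long$row[long$dup])

df$is_duplicate <- seq_len(nrow(df)) %in% dup_rows
deduped <- df[!df$is_duplicate, ]
```

Under these assumptions, the second Emilie, Mike and Tom records are the ones flagged. Dropping them also discards any extra number they carry (for example, Mike's cell phone on the later record), so merging the records instead of deleting them may be preferable.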
Any help with this would be appreciated.