I’m working with the COVID database from Module 10. While examining the data, I noticed that although there are no duplicate rows, there are duplicate IDC values. Looking at these duplicates, I see that some differ only in the city.
I need to apply the following logic:
If duplicate IDC values share the same CP (postal code, column cp in the example), delete the record whose city (column ciudad) is “Atlanta”.
If duplicate IDC values have different CP values, do not delete any records.
I’m providing a reproducible example below. Could you suggest an efficient approach in R/dplyr to handle this?
# reprex script
# install and load packages
pacman::p_load(
  rio,
  here,
  janitor,
  reprex,
  datapasta,
  tidyverse
)
# import data
vig_bruta <- data.frame(
  IDC    = c(NA, "a1", "b2", "c3", "d4", "b2", "c3"),
  ciudad = c(NA, "Atlanta", "East Point", "College Park", "Union City", "Atlanta", "Atlanta"),
  cp     = c(NA, "400", "401", "402", "403", "401", "402")
)
# clean the surveillance line list (clean_names() renames IDC to idc)
vig <- vig_bruta %>%
  clean_names()
Created on 2025-12-26 with reprex v2.1.1
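One possible way to express the rule above in dplyr, offered as a minimal sketch rather than a definitive solution: group by the ID and postal code together, then drop the "Atlanta" rows only in groups that contain more than one record. The object name vig_dedup is hypothetical, and the data frame is rebuilt with the already-cleaned column names so the snippet runs on its own. It assumes a duplicated idc/cp group never consists solely of "Atlanta" rows (otherwise every row in that group would be removed).

```r
library(dplyr)

# Same example data as above, with the column names clean_names() would produce
vig <- data.frame(
  idc    = c(NA, "a1", "b2", "c3", "d4", "b2", "c3"),
  ciudad = c(NA, "Atlanta", "East Point", "College Park", "Union City", "Atlanta", "Atlanta"),
  cp     = c(NA, "400", "401", "402", "403", "401", "402")
)

# vig_dedup is a hypothetical name for the result
vig_dedup <- vig %>%
  # records with the same idc AND the same cp fall into one group;
  # duplicate idc values with different cp end up in separate groups of size 1
  group_by(idc, cp) %>%
  # drop "Atlanta" only where the group holds more than one record;
  # %in% is used instead of == so rows with a missing ciudad are kept
  filter(!(n() > 1 & ciudad %in% "Atlanta")) %>%
  ungroup()

vig_dedup
```

On the example data this removes the b2/Atlanta and c3/Atlanta rows and keeps a1/Atlanta, because a1 is not duplicated. Grouping by both columns means duplicate idc values with different cp values are never touched, which matches the second rule.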