I have a dataset for a case-control study. At this stage I want to create a table with the counts for cases and controls (with percentages), the OR, 95% CI and p-value for each exposure. My dataset has some NA values, so I guess these should be left out of the analysis, as I did in my code. I don't know which function to use for the OR, or for a continuity correction: for some variables I have zero observations in a category, which prevents the OR from being calculated. Note that my dataset is quite small: 50 controls and 25 cases.
What steps have you already taken to find an answer?
I could only find the glm code in the Applied Epi community, but at this stage I do not want to fit a regression model. I tried to find OR code through ChatGPT, but it did not help.
Provide an example of your R code
# Load packages (dplyr for mutate/across/case_when, tidyr for drop_na)
library(dplyr)
library(tidyr)

# Step 1: Define variables of interest
explanatory_vars <- c("overnight_stay", "shower_at_work", "contact_with_water_under_pressure",
                      "contact_with_central_ac", "wash_car", "fountain",
                      "public_showers", "dentist",
                      "industrial_plant_cooling_tower",
                      "ac_used_14_d", "nebuliser", "water_cuts")
# Step 2: Recode dichotomous variables to 0/1
quest <- quest %>%
  mutate(across(
    .cols = all_of(c(explanatory_vars, "class")),  # Target only relevant columns
    .fns = ~ case_when(
      .x %in% c("yes", "case") ~ 1,     # Recode "yes" and "case" to 1
      .x %in% c("no", "control") ~ 0,   # Recode "no" and "control" to 0
      TRUE ~ NA_real_                   # Anything else (missing or invalid) becomes NA
    )
  ))
# Drop rows with missing information for any variable of interest
quest <- quest %>%
  drop_na(any_of(c("class", explanatory_vars)))
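For reference, here is a minimal base R sketch of the kind of per-exposure calculation described above. It is an illustration under stated assumptions, not a recommended method. The helper `or_2x2()` is hypothetical and not from any package. It assumes 0/1 coded variables and drops NA values per exposure, so a row missing one exposure is still used for the others. It adds 0.5 to every cell when any cell is zero (the Haldane-Anscombe correction, chosen here as one common option), uses a Wald interval on the log OR, and takes the p-value from Fisher's exact test on the uncorrected table. The `toy` data frame is made up purely to exercise the code.

```r
# Hypothetical helper: counts, percentages, OR, Wald 95% CI and Fisher p-value
# for one binary exposure (0/1) against a binary outcome (1 = case, 0 = control).
or_2x2 <- function(exposure, outcome, correction = 0.5) {
  keep <- !is.na(exposure) & !is.na(outcome)  # drop NAs for this exposure only
  tab <- table(
    exposure = factor(exposure[keep], levels = c(1, 0)),
    outcome  = factor(outcome[keep],  levels = c(1, 0))
  )
  # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
  cells <- c(a = tab[1, 1], b = tab[1, 2], c = tab[2, 1], d = tab[2, 2])
  p_value <- fisher.test(tab)$p.value  # exact test on the uncorrected counts
  # Assumption: add 0.5 to all four cells only when at least one cell is zero
  adj <- if (any(cells == 0)) cells + correction else cells
  log_or <- log((adj[["a"]] * adj[["d"]]) / (adj[["b"]] * adj[["c"]]))
  se <- sqrt(sum(1 / adj))             # standard error of the log OR (Wald)
  z <- qnorm(0.975)
  data.frame(
    cases_exposed    = cells[["a"]],
    cases_pct        = 100 * cells[["a"]] / (cells[["a"]] + cells[["c"]]),
    controls_exposed = cells[["b"]],
    controls_pct     = 100 * cells[["b"]] / (cells[["b"]] + cells[["d"]]),
    or      = exp(log_or),
    ci_low  = exp(log_or - z * se),
    ci_high = exp(log_or + z * se),
    p_value = p_value
  )
}

# Made-up toy data (25 cases, 50 controls); "nebuliser" has a zero cell
toy <- data.frame(
  class     = c(rep(1, 25), rep(0, 50)),
  fountain  = c(rep(1, 10), rep(0, 15), rep(1, 5), rep(0, 45)),
  nebuliser = c(rep(1, 5),  rep(0, 20), rep(0, 50))
)
vars <- c("fountain", "nebuliser")

# One row per exposure
results <- do.call(rbind, lapply(vars, function(v) {
  data.frame(exposure = v, or_2x2(toy[[v]], toy$class))
}))
print(results)
```

With the real data, the same loop could in principle run over `explanatory_vars` with `quest[[v]]` and `quest$class`. With only 25 cases, an exact or penalised method may be preferable to the Wald interval, so the corrected estimates are best read as rough.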