I have a dataset for a case-control study. At this stage I want to create a table showing, for each exposure, the number of cases and controls (with percentages), the OR, the 95% CI, and the p-value. My dataset has some NA values, so I assume these should be left out of the analysis, as I did in my code. I don't know which function to use for the OR, or how to apply a continuity correction: some variables have zero observations, which prevents the OR from being calculated. Note that my dataset is quite small: 50 controls and 25 cases.
I could only find the glm code in the Applied Epi Community, but at this stage I do not want to fit a regression model. I tried to get OR code from ChatGPT, but it did not help.
# Define variables of interest
explanatory_vars <- c("overnight_stay", "shower_at_work", "contact_with_water_under_pressure",
                      "contact_with_central_ac", "wash_car", "fountain",
                      "public_showers", "dentist",
                      "ac_used_14_d", "nebuliser", "water_cuts")
# Step 2: Recode dichotomous variables to 0/1
library(dplyr) # provides %>%, mutate(), across(), case_when()
quest <- quest %>%
  mutate(across(
    .cols = all_of(c(explanatory_vars, "class")), # Target only relevant columns
    .fns = ~ case_when(
      .x %in% c("yes", "case") ~ 1,    # Recode "yes" and "case" to 1
      .x %in% c("no", "control") ~ 0,  # Recode "no" and "control" to 0
      TRUE ~ NA_real_                  # Handle missing or invalid values
    )
  ))
# Drop rows with missing information for variables of interest
quest <- quest %>%
  tidyr::drop_na(any_of(c("class", explanatory_vars)))
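For a single exposure, the calculation I am asking about could be sketched in base R roughly as follows. This is only a minimal sketch under assumptions, not a vetted method: `or_2x2` is a hypothetical helper name, the confidence interval uses the Woolf (log-OR) approximation, the continuity correction is the Haldane-Anscombe one (add 0.5 to every cell, applied only when a cell is zero), and the p-value comes from Fisher's exact test on the uncorrected counts, which seems reasonable for a small sample.

```r
# Hypothetical helper: OR, Woolf 95% CI and Fisher p-value for one 0/1 exposure
# against a 0/1 outcome (1 = case, 0 = control).
or_2x2 <- function(exposure, outcome) {
  keep <- !is.na(exposure) & !is.na(outcome)  # drop NAs for this variable only

  # Fix the factor levels so the table is always 2 x 2, even with empty cells
  tab <- table(
    exposure = factor(exposure[keep], levels = c(1, 0)),
    outcome  = factor(outcome[keep],  levels = c(1, 0))
  )

  n11 <- tab[1, 1]  # exposed cases
  n10 <- tab[1, 2]  # exposed controls
  n01 <- tab[2, 1]  # unexposed cases
  n00 <- tab[2, 2]  # unexposed controls

  # Assumption: Haldane-Anscombe correction, used only if some cell is zero
  corrected <- any(tab == 0)
  if (corrected) {
    n11 <- n11 + 0.5; n10 <- n10 + 0.5
    n01 <- n01 + 0.5; n00 <- n00 + 0.5
  }

  or <- (n11 * n00) / (n10 * n01)
  se_log_or <- sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
  ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

  data.frame(
    or        = or,
    ci_low    = ci[1],
    ci_high   = ci[2],
    p_value   = fisher.test(tab)$p.value,  # exact test on the uncorrected counts
    corrected = corrected
  )
}

# Toy data (made up for illustration): 2 exposed cases, 1 exposed control,
# 1 unexposed case, 3 unexposed controls, plus one row with a missing exposure
toy_exposure <- c(1, 1, 1, 0, 0, 0, 0, NA)
toy_outcome  <- c(1, 1, 0, 1, 0, 0, 0, 1)
toy_result   <- or_2x2(toy_exposure, toy_outcome)
```

Applied to my data, this would presumably be called once per exposure, e.g. `do.call(rbind, lapply(explanatory_vars, function(v) or_2x2(quest[[v]], quest$class)))`. Because the helper drops NAs per variable, it would not need the row-wise `drop_na()` across all exposures, which matters with only 75 records.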