Applied Epi Community

Getting xtfrm.data.frame(x) error when using for loops

I’m trying to replicate the example in the epiR documentation on Measures of association for a series of candidate risk factors. It works when I run the example as is, but when I use my own dataset, I get Error in xtfrm.data.frame(x) : (converted from warning) cannot xtfrm data frames

Here is my script:

Load my dataset rcase_socdemog to a new dataset dat.df04

dat.df04 ← rcase_socdemog %>%
select(rcase, sex, age_under5, age_under10)

head(dat.df04)

Empty vectors to collect results:

rfactor ← ref ← or.p ← or.l ← or.u ← c()

The candidate risk factors are in columns 2 to 4 of data frame dat.df04:

for(i in 2:4){
tdat.tab04 ← table(dat.df04[,i], dat.df04$rcase)
tdat.epi04 ← epi.2by2(dat = tdat.tab04, method = “cohort.count”,
conf.level = 0.95, units = 100, interpret = FALSE, outcome = “as.columns”)

trfactor ← as.character(names(dat.df04)[i])
rfactor ← c(rfactor, trfactor)

tref ← as.character(paste("Reference: ", trfactor, " - ", levels(dat.df04[,i])[2], sep = “”))
ref ← c(ref, tref)

tor.p ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[1])
or.p ← c(or.p, tor.p)

tor.l ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[2])
or.l ← c(or.l, tor.l)

tor.u ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[3])
or.u ← c(or.u, tor.u)
}

Any help is appreciated. Thank you!

I cannot reproduce your error. I tested your code using a fake dataset. It seems to work fine. Could you describe more about your dataset? What are the data types of your variables?

library(epiR)
library(dplyr)

make_data <- function() {
  n <- 100
  A <- rbinom(n, 1, 0.3)
  B <- rbinom(n, 1, 0.2)
  X <- rbinom(n, 1, 0.1 + 0.4*A)
  rcase <- rbinom(n, 1, 0.1 + 0.2*X + 0.1*A + 0.2*B)
  
  dat <- data.frame(rcase, A, B, X)
  
  return(dat)
}

dat.df04 <- make_data() |> 
  mutate(across(A:X, factor, labels = c("No", "Yes")),
         rcase = factor(rcase, labels = c("Non-case", "Case")))

head(dat.df04)

#      rcase   A   B   X
# 1 Non-case  No  No  No
# 2 Non-case Yes  No  No
# 3 Non-case  No  No  No
# 4 Non-case  No Yes  No
# 5 Non-case  No Yes Yes
# 6 Non-case  No  No  No

# Empty vectors to collect results:
rfactor <- ref <- or.p <- or.l <- or.u <- c()

# The candidate risk factors are in columns 2 to 4 of data frame dat.df04:
for(i in 2:4){
  tdat.tab04 <- table(dat.df04[,i], dat.df04$rcase)
  tdat.epi04 <- epi.2by2(dat = tdat.tab04, method = "cohort.count",
                        conf.level = 0.95, units = 100, interpret = FALSE, outcome = "as.columns")
  
  trfactor <- as.character(names(dat.df04)[i])
  rfactor <- c(rfactor, trfactor)
  
  tref <- as.character(paste("Reference: ", trfactor, " - ", levels(dat.df04[,i])[2], sep = ""))
  ref <- c(ref, tref)
  
  tor.p <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[1])
  or.p <- c(or.p, tor.p)
  
  tor.l <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[2])
  or.l <- c(or.l, tor.l)
  
  tor.u <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[3])
  or.u <- c(or.u, tor.u)
}

data.frame(rfactor, ref, or.p, or.l, or.u)

#   rfactor                ref     or.p     or.l      or.u
# 1       A Reference: A - Yes 1.254438 0.462428  3.402939
# 2       B Reference: B - Yes 8.068182 2.616992 24.874193
# 3       X Reference: X - Yes 5.619835 1.932903 16.339435
1 Like

Thanks for responding! I converted the binary variables of my dataset into factors as instructed in the earlier part of the epiR documentation. Here’s my dataset in CSV format. I can’t seem to figure out what’s wrong.

I got it to work too! I cleared the workspace and restarted R and it just worked. :man_shrugging:

In any case, thank you for your response @zawepi!

Now that I got it to work, I want to use it on a bigger dataset with more exposure variables. If I were to do that, is there anything I should change aside from the “for(i in 2:4)”?

My approach would be to write a function and cycle through a character vector of variable names using purrr::map_dfr(). With this approach, you can use the dataset directly, without selecting variables.

If you only want OR and CI, I would suggest using broom::tidy() which supports epi.2by2() objects. It provides the estimates and CIs in a data.frame.

I try to get the ref level depending on whether the variable is a factor or a character.

Just my two cents. Hope this helps.

library(epiR)
library(tidyverse)
library(broom)

dat.df04 <- read.csv("rcase_socdemog.csv")

# List risk factor variables

rf_vars <- c(
  "sex",
  "age_under5",
  "age_under10"
)

# Name the variables by itself. This is necessary for `.id = "rf_var"` in the last line to work properly.
names(rf_vars) <- rf_vars

# Function to get ORs

get_or <- function(var, data) {
  v <- data[[var]]
  tab <- table(v, data[["rcase"]])
  
  if (is.factor(v)) {
    ref_lvl <- levels(v)[1]
  } else if (is.character(v)) {
    ref_lvl <- unique(v)[1]
  } else {
    ref_lvl <- ""
  }
  
  or <- epi.2by2(tab, method = "cohort.count", conf.level = 0.95,
                 units = 100, interpret = FALSE, outcome = "as.columns") %>%
    broom::tidy() %>%
    filter(term == "OR.strata.wald") %>%
    mutate(ref = ref_lvl) %>%
    relocate(ref, .after = term)
  
  return(or)
}

map_dfr(rf_vars, get_or, data = dat.df04, .id = "rf_var")
1 Like

Wow this is awesome and works flawlessly! If I would like to add 2 additional columns for uncorrected chi square (2-tailed P value) and fisher exact (2-tailed P value), what additional codes would I incorporate?

broom::tidy() has a parameter argument that specifies the output: measure of association or stats. The default is the former. Use the latter to get the p values. Please see the function documentation for more details.

To get the p value, we could change get_or() to get moa and stats separately and then combine them.

get_or <- function(var, data) {
  v <- data[[var]]
  tab <- table(v, data[["rcase"]])
  
  if (is.factor(v)) {
    ref_lvl <- levels(v)[1]
  } else if (is.character(v)) {
    ref_lvl <- unique(v)[1]
  } else {
    ref_lvl <- ""
  }
  
  two_by_two <- 
    epi.2by2(tab,method = "cohort.count", conf.level = 0.95,
             units = 100, interpret = FALSE, outcome = "as.columns")
  
  moa <- broom::tidy(two_by_two, parameters = "moa") %>%
    filter(term == "OR.strata.wald") %>%
    mutate(ref = ref_lvl) %>%
    relocate(ref, .after = term)
  
  p <- broom::tidy(two_by_two, parameters = "stat") %>%
    filter(term == "chi2.strata.uncor" | term == "chi2.strata.fisher") %>%
    select(term, p.value) %>%
    mutate(term = paste0(term, ".p")) %>%
    pivot_wider(everything(), names_from = "term", values_from = p.value)
  
  res <- bind_cols(moa, p)
  
  return(res)
}
1 Like

Thank you so much! I’m still wrapping my head around all the different functions. The default documentation in R is highly technical too. I’m so glad there are people willing to help. Cheers!

Ah! That’s right. Getting to the documentation is not straightfoward. ?broom::tidy will only get to generic page of the function. You need to add epi.2by2 following a dot to get to the relevant documentation. Try ?broom::tidy.epi.2by2 in the console. Glad that it’s helpful!

2 Likes