Getting xtfrm.data.frame(x) error when using for loops

I’m trying to replicate the example in the epiR documentation on Measures of association for a series of candidate risk factors. It works when I run the example as is, but when I use my own dataset, I get Error in xtfrm.data.frame(x) : (converted from warning) cannot xtfrm data frames

Here is my script:

Load my dataset rcase_socdemog to a new dataset dat.df04

dat.df04 ← rcase_socdemog %>%
select(rcase, sex, age_under5, age_under10)

head(dat.df04)

Empty vectors to collect results:

rfactor ← ref ← or.p ← or.l ← or.u ← c()

The candidate risk factors are in columns 2 to 4 of data frame dat.df04:

for(i in 2:4){
tdat.tab04 ← table(dat.df04[,i], dat.df04$rcase)
tdat.epi04 ← epi.2by2(dat = tdat.tab04, method = “cohort.count”,
conf.level = 0.95, units = 100, interpret = FALSE, outcome = “as.columns”)

trfactor ← as.character(names(dat.df04)[i])
rfactor ← c(rfactor, trfactor)

tref ← as.character(paste("Reference: ", trfactor, " - ", levels(dat.df04[,i])[2], sep = “”))
ref ← c(ref, tref)

tor.p ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[1])
or.p ← c(or.p, tor.p)

tor.l ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[2])
or.l ← c(or.l, tor.l)

tor.u ← as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[3])
or.u ← c(or.u, tor.u)
}

Any help is appreciated. Thank you!

I cannot reproduce your error. I tested your code using a fake dataset. It seems to work fine. Could you describe more about your dataset? What are the data types of your variables?

library(epiR)
library(dplyr)

make_data <- function() {
  n <- 100
  A <- rbinom(n, 1, 0.3)
  B <- rbinom(n, 1, 0.2)
  X <- rbinom(n, 1, 0.1 + 0.4*A)
  rcase <- rbinom(n, 1, 0.1 + 0.2*X + 0.1*A + 0.2*B)
  
  dat <- data.frame(rcase, A, B, X)
  
  return(dat)
}

dat.df04 <- make_data() |> 
  mutate(across(A:X, factor, labels = c("No", "Yes")),
         rcase = factor(rcase, labels = c("Non-case", "Case")))

head(dat.df04)

#      rcase   A   B   X
# 1 Non-case  No  No  No
# 2 Non-case Yes  No  No
# 3 Non-case  No  No  No
# 4 Non-case  No Yes  No
# 5 Non-case  No Yes Yes
# 6 Non-case  No  No  No

# Empty vectors to collect results:
rfactor <- ref <- or.p <- or.l <- or.u <- c()

# The candidate risk factors are in columns 2 to 4 of data frame dat.df04:
for(i in 2:4){
  tdat.tab04 <- table(dat.df04[,i], dat.df04$rcase)
  tdat.epi04 <- epi.2by2(dat = tdat.tab04, method = "cohort.count",
                        conf.level = 0.95, units = 100, interpret = FALSE, outcome = "as.columns")
  
  trfactor <- as.character(names(dat.df04)[i])
  rfactor <- c(rfactor, trfactor)
  
  tref <- as.character(paste("Reference: ", trfactor, " - ", levels(dat.df04[,i])[2], sep = ""))
  ref <- c(ref, tref)
  
  tor.p <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[1])
  or.p <- c(or.p, tor.p)
  
  tor.l <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[2])
  or.l <- c(or.l, tor.l)
  
  tor.u <- as.numeric(tdat.epi04$massoc.detail$OR.strata.wald[3])
  or.u <- c(or.u, tor.u)
}

data.frame(rfactor, ref, or.p, or.l, or.u)

#   rfactor                ref     or.p     or.l      or.u
# 1       A Reference: A - Yes 1.254438 0.462428  3.402939
# 2       B Reference: B - Yes 8.068182 2.616992 24.874193
# 3       X Reference: X - Yes 5.619835 1.932903 16.339435
1 Like

Thanks for responding! I converted the binary variables of my dataset into factors as instructed in the earlier part of the epiR documentation. Here’s my dataset in CSV format. I can’t seem to figure out what’s wrong.

I got it to work too! I cleared the workspace and restarted R and it just worked. :man_shrugging:

In any case, thank you for your response @zawepi!

Now that I got it to work, I want to use it on a bigger dataset with more exposure variables. If I were to do that, is there anything I should change aside from the “for(i in 2:4)”?

My approach would be to write a function and cycle through a character vector of variable names using purrr::map_dfr(). With this approach, you can use the dataset directly, without selecting variables.

If you only want OR and CI, I would suggest using broom::tidy() which supports epi.2by2() objects. It provides the estimates and CIs in a data.frame.

I try to get the ref level depending on whether the variable is a factor or a character.

Just my two cents. Hope this helps.

library(epiR)
library(tidyverse)
library(broom)

dat.df04 <- read.csv("rcase_socdemog.csv")

# List risk factor variables

rf_vars <- c(
  "sex",
  "age_under5",
  "age_under10"
)

# Name the variables by itself. This is necessary for `.id = "rf_var"` in the last line to work properly.
names(rf_vars) <- rf_vars

# Function to get ORs

get_or <- function(var, data) {
  v <- data[[var]]
  tab <- table(v, data[["rcase"]])
  
  if (is.factor(v)) {
    ref_lvl <- levels(v)[1]
  } else if (is.character(v)) {
    ref_lvl <- unique(v)[1]
  } else {
    ref_lvl <- ""
  }
  
  or <- epi.2by2(tab, method = "cohort.count", conf.level = 0.95,
                 units = 100, interpret = FALSE, outcome = "as.columns") %>%
    broom::tidy() %>%
    filter(term == "OR.strata.wald") %>%
    mutate(ref = ref_lvl) %>%
    relocate(ref, .after = term)
  
  return(or)
}

map_dfr(rf_vars, get_or, data = dat.df04, .id = "rf_var")
1 Like

Wow this is awesome and works flawlessly! If I would like to add 2 additional columns for uncorrected chi square (2-tailed P value) and fisher exact (2-tailed P value), what additional codes would I incorporate?

broom::tidy() has a parameter argument that specifies the output: measure of association or stats. The default is the former. Use the latter to get the p values. Please see the function documentation for more details.

To get the p value, we could change get_or() to get moa and stats separately and then combine them.

get_or <- function(var, data) {
  v <- data[[var]]
  tab <- table(v, data[["rcase"]])
  
  if (is.factor(v)) {
    ref_lvl <- levels(v)[1]
  } else if (is.character(v)) {
    ref_lvl <- unique(v)[1]
  } else {
    ref_lvl <- ""
  }
  
  two_by_two <- 
    epi.2by2(tab,method = "cohort.count", conf.level = 0.95,
             units = 100, interpret = FALSE, outcome = "as.columns")
  
  moa <- broom::tidy(two_by_two, parameters = "moa") %>%
    filter(term == "OR.strata.wald") %>%
    mutate(ref = ref_lvl) %>%
    relocate(ref, .after = term)
  
  p <- broom::tidy(two_by_two, parameters = "stat") %>%
    filter(term == "chi2.strata.uncor" | term == "chi2.strata.fisher") %>%
    select(term, p.value) %>%
    mutate(term = paste0(term, ".p")) %>%
    pivot_wider(everything(), names_from = "term", values_from = p.value)
  
  res <- bind_cols(moa, p)
  
  return(res)
}
1 Like

Thank you so much! I’m still wrapping my head around all the different functions. The default documentation in R is highly technical too. I’m so glad there are people willing to help. Cheers!

Ah! That’s right. Getting to the documentation is not straightfoward. ?broom::tidy will only get to generic page of the function. You need to add epi.2by2 following a dot to get to the relevant documentation. Try ?broom::tidy.epi.2by2 in the console. Glad that it’s helpful!

2 Likes

Hello, I face the same issue and I cant seem to get out of it.
Please help.

here is my code.

bike_df ← read_xlsx(“1657875746_day.xlsx”)
head(bike_df,5)

attach(bike_df)

#Create new dataset excluding casual and registered variables
bike_df<-subset(bike_df,select=-c(casual,registered))
head(bike_df,5)

#Dimension of dataset
dim(bike_df)

#Summary of the dataset
summary(bike_df)

#Structure of dataset
str(bike_df)

#Rename the columns
names(bike_df)<-c(‘rec_id’,‘datetime’,‘season’,‘year’,‘month’,‘holiday’,‘weekday’,‘workingday’,‘weather_condition’,‘temp’,‘atemp’,‘humidity’,‘windspeed’,‘total_count’)

#Read the data
head(bike_df,5)

#Typecasting the datetime and numerical attributes to category

bike_df$datetime<- as.Date(bike_df$datetime)
bike_df$year<-as.factor(bike_df$year)
bike_df$month<-as.factor(bike_df$month)
bike_df$season ← as.factor(bike_df$season)
bike_df$holiday<- as.factor(bike_df$holiday)
bike_df$weekday<- as.factor(bike_df$weekday)
bike_df$workingday<- as.factor(bike_df$workingday)
bike_df$weather_condition<- as.factor(bike_df$weather_condition)

#Missing values in dataset
missing_val<-data.frame(apply(bike_df,2,function(x){sum(is.na(x))}))
names(missing_val)[1]=‘missing_val’
missing_val

#column plot for season wise monthly distribution of counts
ggplot(bike_df,aes(x=month,y=total_count,fill=season))+theme_bw()+geom_col()+
labs(x=‘Month’,y=‘Total_Count’,title=‘Season wise monthly distribution of counts’)

#column plot for weekday wise monthly distribution of counts
ggplot(bike_df,aes(x=month,y=total_count,fill=weekday))+theme_bw()+geom_col()+
labs(x=‘Month’,y=‘Total_Count’,title=‘Weekday wise monthly distribution of counts’)

#Violin plot for Yearly wise distribution of counts
ggplot(bike_df,aes(x=year,y=total_count,fill=year))+geom_violin()+theme_bw()+
labs(x=‘Year’,y=‘Total_Count’,title=‘Yearly wise distribution of counts’)

#Column plot for holiday wise distribution of counts
ggplot(bike_df,aes(x=holiday,y=total_count,fill=season))+geom_col()+theme_bw()+
labs(x=‘holiday’,y=‘Total_Count’,title=‘Holiday wise distribution of counts’)

#Column plot for workingday wise distribution of counts
ggplot(bike_df,aes(x=workingday,y=total_count,fill=season))+geom_col()+theme_bw()+
labs(x=‘workingday’,y=‘Total_Count’,title=‘Workingday wise distribution of counts’)

#Column plot for weather_condition distribution of counts
ggplot(bike_df,aes(x=weather_condition,y=total_count,fill=season))+geom_col()+theme_bw()+
labs(x=‘Weather_condition’,y=‘total_count’,title=‘Weather_condition distribution of counts’)

#boxplot for total_count_outliers
par(mfrow=c(1, 1))#divide graph area in 1 columns and 1 rows
boxplot(bike_df$total_count,main=‘Total_count’,sub=paste(boxplot.stats(bike_df$total_count)$out))

#box plots for outliers
par(mfrow=c(2,2))
#Box plot for temp outliers
boxplot(bike_df$temp, main=“Temp”,sub=paste(boxplot.stats(bike_df$temp)$out))
#Box plot for humidity outliers
boxplot(bike_df$humidity,main=“Humidity”,sub=paste(boxplot.stats(bike_df$humidity)$out))
#Box plot for windspeed outliers
boxplot(bike_df$windspeed,main=“Windspeed”,sub=paste(boxplot.stats(bike_df$windspeed)$out))

#load the DMwR library
library(DMwR2)
#create subset for windspeed and humidity variable
wind_hum<-subset(bike_df,select=c(‘windspeed’,‘humidity’))

#column names of wind_hum
cnames<-colnames(wind_hum)
for(i in cnames){
val=wind_hum[,i][wind_hum[,i] %in% boxplot.stats(wind_hum[,i])$out] #outlier values
wind_hum[,i][wind_hum[,i] %in% val]= NA # Replace outliers with NA
}

this last code doesnt seem to work.
please suggest how to get through it.

Hi @ricktikra , could you provide a sample of your dataset so we can try your code out? There are instructions on how to use the datapasta package to do this here.

Best,
Ian