Adding columns for count (percent) of controls and cases to tbl_uvregression() output

iancgmd · July 27, 2022, 5:39am

Hello! I’m following along with the epiR handbook on regression. I’m at the gtsummary section and was wondering whether the output could have two columns rather than just the total N column: one column for the count (percent) of controls and another for cases. This is similar to the output in the previous section (with columns for 1 and 0).

I tried piping it to group_by(outcome) as follows, but I get the following error: Error in UseMethod("group_by") : no applicable method for 'group_by' applied to an object of class "c('tbl_uvregression', 'gtsummary')"

univ_tab <- linelist %>%
  dplyr::select(explanatory_vars, outcome) %>%   ## select variables of interest
  tbl_uvregression(                         ## produce univariate table
    method = glm,                           ## define regression want to run (generalised linear model)
    y = outcome,                            ## define outcome variable
    method.args = list(family = binomial),  ## define what type of glm want to run (logistic)
    exponentiate = TRUE                     ## exponentiate to produce odds ratios (rather than log odds)
  ) %>%
  group_by(outcome)

aspina · July 27, 2022, 9:38am

Hi @iancgmd - is this what you are looking for?

output <- linelist %>% 
  select(died_covid, gender, age_group) %>%    # keep variables of interest
  tbl_uvregression(                         ## produce univariate table
    method = glm,                           ## define regression want to run (generalised linear model)
    y = died_covid,                            ## define outcome variable
    method.args = list(family = binomial),  ## define what type of glm want to run (logistic)
    exponentiate = TRUE,                    ## exponentiate to produce odds ratios (rather than log odds)
    hide_n = TRUE                           ## don't include overall counts in regression table
  ) 

## produce counts for each of the variables of interest
cross_tab <- output$inputs$data %>% 
  tbl_summary(by = died_covid)

## combine for a full table 
tbl_merge(list(cross_tab, output))

iancgmd · July 28, 2022, 11:58am

Thanks, Alex! That works perfectly. I’m trying to understand the code. Could you explain what output$inputs$data does in the creation of cross_tab?

amy.mikhail · August 12, 2022, 3:06pm

If you type ?gtsummary::tbl_uvregression, you can see on the help page that the value returned by this function is described as an object, not a data frame. If you inspect your output object, you will see that it is actually a list, and some of its elements have further elements nested within them. You can access a named element of a list, such as a data.frame stored inside it, by typing a dollar sign after the name of the list, followed by the name of the element.

So:

output$inputs$data

is accessing the object you just created with gtsummary::tbl_uvregression(), and inside that object there are multiple elements. One of those elements is called inputs, and it contains all the information that you fed into the function as arguments. One of those inputs was the data holding the columns to use in the regression, which is stored within inputs as a data.frame called data.
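
To make the nesting a bit more concrete, here is a rough base R sketch using a made-up list that mimics the layout described above. The object toy_output and all of its contents are hypothetical stand-ins; the real gtsummary object holds more elements than this.

```r
# Toy stand-in for a gtsummary object: a list with nested elements.
# All names and values here are made up for illustration.
toy_output <- list(
  table_body = data.frame(variable = c("gender", "age_group")),
  inputs = list(
    data = data.frame(
      died_covid = c(1, 0, 0, 1),
      gender     = c("f", "m", "f", "m"),
      age_group  = c("0-17", "18-49", "50+", "18-49")
    ),
    exponentiate = TRUE
  )
)

# Each $ steps one level deeper into the list
inputs_element <- toy_output$inputs        # a list of the stored arguments
data_element   <- toy_output$inputs$data   # the data.frame inside inputs

class(inputs_element)
class(data_element)
names(data_element)
```

The same pattern applies to the real object: output$inputs steps into the stored arguments, and output$inputs$data steps into the data.frame kept there.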

If you try typing output$inputs$data in your console, you will be able to see that data.frame, which should contain your three input columns, died_covid, gender and age_group. The function tbl_summary is summarising that data by creating counts according to the died_covid column.
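
If it helps to see what "counts according to the died_covid column" means, below is a minimal base R sketch of the same idea on made-up data. It is not what tbl_summary() runs internally, only an approximation of the numbers it reports: counts of each category within each outcome group, with column percentages (which, as far as I know, is the default for tbl_summary()). The data frame toy_data and its values are invented for illustration.

```r
# Toy data; column names borrowed from the example above, values made up
toy_data <- data.frame(
  died_covid = c("No", "No", "No", "Yes", "Yes", "No"),
  gender     = c("f", "m", "f", "m", "m", "f")
)

# Counts of gender within each level of died_covid
counts <- table(gender = toy_data$gender, died_covid = toy_data$died_covid)

# Column percentages: each died_covid column sums to 100
percents <- round(100 * prop.table(counts, margin = 2), 1)

counts
percents
```

Each column of percents corresponds to one outcome group, which is the layout of the two extra columns in the merged table.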

The last line of Alex’s solution merges these summary counts with the table created by tbl_uvregression() for the final output.

Tip:
If you want to see what elements a list object contains, type the name of the object followed by the dollar sign, and have a look at the autocomplete list that pops up in RStudio. It will show you the elements contained within that object, which you can scroll through and select for further manipulation or inspection.
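
If you are not working in RStudio, or want the same information printed in the console, base R's names() and str() do a similar job. Here is a small sketch on a made-up nested list (toy_output and its contents are hypothetical, not the real gtsummary object):

```r
# Toy nested list standing in for a more complex object (names are made up)
toy_output <- list(
  table_body = data.frame(x = 1:2),
  inputs = list(data = data.frame(y = 1:3), exponentiate = TRUE)
)

# Names of the top-level elements
top_level <- names(toy_output)
top_level

# Names one level down
names(toy_output$inputs)

# Compact overview of the structure, limited to two levels of nesting
str(toy_output, max.level = 2)
```

Running names(output) and names(output$inputs) on your own object should list its elements in the same way.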

iancgmd · August 13, 2022, 3:34am

Thank you for the detailed explanation @amy.mikhail ! I really appreciate it!