Completeness of variables with grouped data

I am trying to create a table that details the completeness of each variable by ‘Code’, which identifies a hospital trust.

The resulting table shows 100% for each code-variable combination, which is incorrect. I can’t find any errors in my code.

I have provided a sample below.

               janitor, #tabyl

Trust_2023 <- data.frame(
        stringsAsFactors = FALSE,
                    Code = c("RVJ","RVJ","RVJ",
  CCUnitBedConfiguration = c("2","5","5","5",
    CCDischargeReadyDate = c("04/07/2023",
       CCAdmissionSource = c("1","1","2","1",
         CCAdmissionType = c("4","1","2","1",

grouped_2023 <- Trust_2023 %>% 
  select(Code, CCUnitBedConfiguration, CCAdmissionSource, CCAdmissionType,
         CCDischargeReadyDate) %>% 
  group_by(Code) %>% 
  summarise(across(everything(), ~ sum(!is.na(.)) / length(.) * 100))

Created on 2024-03-05 with reprex v2.0.2

Session info
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> Matrix products: default
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8 
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> time zone: Europe/London
#> tzcode source: internal
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#>  [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
#>  [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
#>  [9] ggplot2_3.4.4   tidyverse_2.0.0 datapasta_3.1.0 reprex_2.0.2   
#> [13] gtsummary_1.7.2 janitor_2.2.0   here_1.0.1      readxl_1.4.3   
#> [17] dbplyr_2.3.4    rio_1.0.1      
#> loaded via a namespace (and not attached):
#>  [1] gt_0.10.1            utf8_1.2.3           generics_0.1.3      
#>  [4] xml2_1.3.5           stringi_1.7.12       hms_1.1.3           
#>  [7] digest_0.6.33        magrittr_2.0.3       evaluate_0.22       
#> [10] grid_4.3.1           timechange_0.2.0     fastmap_1.1.1       
#> [13] cellranger_1.1.0     rprojroot_2.0.4      broom.helpers_1.14.0
#> [16] DBI_1.1.3            fansi_1.0.5          scales_1.2.1        
#> [19] cli_3.6.1            rlang_1.1.1          munsell_0.5.0       
#> [22] withr_2.5.1          yaml_2.3.7           tools_4.3.1         
#> [25] tzdb_0.4.0           colorspace_2.1-0     pacman_0.5.1        
#> [28] vctrs_0.6.4          R6_2.5.1             lifecycle_1.0.3     
#> [31] snakecase_0.11.1     fs_1.6.3             pkgconfig_2.0.3     
#> [34] gtable_0.3.4         pillar_1.9.0         glue_1.6.2          
#> [37] xfun_0.40            tidyselect_1.2.0     rstudioapi_0.15.0   
#> [40] knitr_1.44           htmltools_0.5.7      rmarkdown_2.25      
#> [43] compiler_4.3.1

Hello @lia.beart,

I noticed that your empty values are coded as NULL. You might want to use is.null instead of is.na to accurately check for completeness. Here’s an improved version of your summarise function:

summarise(across(everything(), ~ sum(!is.null(.)) / length(.) * 100))

Here’s the output I received:

# A tibble: 7 × 5
  Code  CCUnitBedConfiguration CCAdmissionSource CCAdmissionType CCDischargeReadyDate
  <chr>                  <dbl>             <dbl>           <dbl>                <dbl>
1 RA7                    20                20              20                   20   
2 RJ1                    25                25              25                   25   
3 RL1                    50                50              50                   50   
4 RRF                   100               100             100                  100   
5 RTH                   100               100             100                  100   
6 RVJ                     4.17              4.17            4.17                 4.17
7 RWP                     7.69              7.69            7.69                 7.69
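
One caveat worth checking: is.null() returns a single TRUE/FALSE for the whole column rather than one value per row, so sum(!is.null(.)) is always 1 for a character column. The percentages in the table above are consistent with that, since every column in a row shows the same value and each value matches 100 divided by the group's row count. If the missing entries are stored as the literal string "NULL" (an assumption, since the sample data is not shown in full), a per-row comparison may be closer to what is wanted. A minimal sketch on made-up data, where `toy` and its codes "AAA" and "BBB" are hypothetical:

```r
library(dplyr)

# Hypothetical toy data: the literal string "NULL" marks a missing entry.
# A real NA is included as well to show that both cases are handled.
toy <- data.frame(
  stringsAsFactors = FALSE,
  Code = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  CCAdmissionSource = c("1", "NULL", "2", "1", "NULL", "NULL"),
  CCAdmissionType = c("4", "1", NA, "1", "2", "NULL")
)

completeness <- toy %>%
  group_by(Code) %>%
  # A value counts as complete when it is neither NA nor the string "NULL";
  # mean() of the logical vector gives the proportion, scaled to a percentage.
  summarise(across(everything(), ~ mean(!is.na(.x) & .x != "NULL") * 100))

print(completeness)
```

Here across(everything(), ...) skips the grouping column automatically, so only the two data columns are summarised. If the raw file is read with readr or readxl, setting na = "NULL" at import would turn those strings into real NA values, after which the original is.na() approach works unchanged.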

That’s great, thank you so much, Lucca!