Completeness of variables with grouped data

I am trying to create a table showing the completeness of each variable by ‘Code’, which identifies a hospital trust.

The resulting table shows 100% for each code-variable combination, which is incorrect. I can’t find any errors in my code.

I have provided a sample below.

pacman::p_load(rio,
               dbplyr,
               readxl,
               here,
               janitor, #tabyl
               gtsummary,
               reprex,
               datapasta,
               tidyverse)

Trust_2023 <- data.frame(
        stringsAsFactors = FALSE,
                    Code = c("RVJ","RVJ","RVJ",
                             "RVJ","RVJ","RVJ","RVJ","RVJ","RVJ","RVJ",
                             "RVJ","RVJ","RVJ","RVJ","RVJ","RVJ","RVJ",
                             "RVJ","RVJ","RVJ","RVJ","RVJ","RVJ","RVJ",
                             "RTH","RA7","RA7","RA7","RA7","RA7","RRF","RL1",
                             "RL1","RWP","RWP","RWP","RWP","RWP","RWP",
                             "RWP","RWP","RWP","RWP","RWP","RWP","RWP",
                             "RJ1","RJ1","RJ1","RJ1"),
  CCUnitBedConfiguration = c("2","5","5","5",
                             "5","5","5","5","5","5","5","5","5","5",
                             "5","5","5","5","5","5","5","5","2","5",
                             "NULL","5","5","5","NULL","5","NULL","2","2",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","5","5"),
    CCDischargeReadyDate = c("04/07/2023",
                             "15/07/2023","14/07/2023","30/06/2023","08/07/2023",
                             "18/07/2023","28/07/2023","25/06/2023",
                             "22/07/2023","19/06/2023","08/07/2023","14/07/2023",
                             "29/06/2023","04/07/2023","18/07/2023","28/06/2023",
                             "17/07/2023","17/07/2023","27/06/2023",
                             "20/07/2023","07/07/2023","18/07/2023","31/07/2023",
                             "21/07/2023","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","19/07/2023","06/07/2023","NULL",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","01/06/2023","02/06/2023"),
       CCAdmissionSource = c("1","1","2","1",
                             "1","1","1","1","1","2","1","1","1","2",
                             "1","2","1","1","1","1","1","1","1","1",
                             "NULL","1","1","1","NULL","1","NULL","1","1",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","1","2"),
         CCAdmissionType = c("4","1","2","1",
                             "4","1","1","1","1","2","1","1","1","2",
                             "4","2","2","1","4","4","4","1","4","4",
                             "NULL","1","1","1","NULL","1","NULL","4","1",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","NULL","NULL","NULL",
                             "NULL","NULL","NULL","1","3")
)



grouped_2023 <- Trust_2023 %>% 
  select(Code, CCUnitBedConfiguration, CCAdmissionSource, CCAdmissionType,
         CCDischargeReadyDate) %>% 
  group_by(Code) %>% 
  summarise(across(everything(), ~ sum(!is.na(.)) / length(.) * 100)) 

Created on 2024-03-05 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8 
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
#>  [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
#>  [9] ggplot2_3.4.4   tidyverse_2.0.0 datapasta_3.1.0 reprex_2.0.2   
#> [13] gtsummary_1.7.2 janitor_2.2.0   here_1.0.1      readxl_1.4.3   
#> [17] dbplyr_2.3.4    rio_1.0.1      
#> 
#> loaded via a namespace (and not attached):
#>  [1] gt_0.10.1            utf8_1.2.3           generics_0.1.3      
#>  [4] xml2_1.3.5           stringi_1.7.12       hms_1.1.3           
#>  [7] digest_0.6.33        magrittr_2.0.3       evaluate_0.22       
#> [10] grid_4.3.1           timechange_0.2.0     fastmap_1.1.1       
#> [13] cellranger_1.1.0     rprojroot_2.0.4      broom.helpers_1.14.0
#> [16] DBI_1.1.3            fansi_1.0.5          scales_1.2.1        
#> [19] cli_3.6.1            rlang_1.1.1          munsell_0.5.0       
#> [22] withr_2.5.1          yaml_2.3.7           tools_4.3.1         
#> [25] tzdb_0.4.0           colorspace_2.1-0     pacman_0.5.1        
#> [28] vctrs_0.6.4          R6_2.5.1             lifecycle_1.0.3     
#> [31] snakecase_0.11.1     fs_1.6.3             pkgconfig_2.0.3     
#> [34] gtable_0.3.4         pillar_1.9.0         glue_1.6.2          
#> [37] xfun_0.40            tidyselect_1.2.0     rstudioapi_0.15.0   
#> [40] knitr_1.44           htmltools_0.5.7      rmarkdown_2.25      
#> [43] compiler_4.3.1

Hello @lia.beart,

I noticed that your empty values are coded as the character string "NULL" rather than as NA, so is.na() never returns TRUE and every variable looks 100% complete. Note that is.null() would not help here: it tests whether the whole object is NULL, not each element, so it returns a single FALSE for any column. Comparing each value against "NULL" gives the completeness you are after. Here’s an adjusted version of your summarise call:

summarise(across(everything(), ~ sum(!is.na(.) & . != "NULL") / length(.) * 100))

Here’s the output for your sample data:

# A tibble: 7 × 5
  Code  CCUnitBedConfiguration CCAdmissionSource CCAdmissionType CCDischargeReadyDate
  <chr>                  <dbl>             <dbl>           <dbl>                <dbl>
1 RA7                       80                80              80                    0
2 RJ1                       50                50              50                   50
3 RL1                      100               100             100                  100
4 RRF                        0                 0               0                    0
5 RTH                        0                 0               0                    0
6 RVJ                      100               100             100                  100
7 RWP                        0                 0               0                    0
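
If you would rather keep your original is.na() summary, another option is to convert the "NULL" strings to real NA values first. Below is a minimal sketch on a small made-up data frame (toy_trust and its values are hypothetical, not taken from the sample above), using dplyr::na_if():

```r
library(dplyr)

# Hypothetical toy data: "NULL" is a character string, not a missing value
toy_trust <- data.frame(
  Code = c("AAA", "AAA", "BBB", "BBB"),
  CCAdmissionType = c("1", "NULL", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

completeness <- toy_trust %>%
  # Turn every "NULL" string into a real NA in all columns except Code
  mutate(across(-Code, ~ na_if(.x, "NULL"))) %>%
  group_by(Code) %>%
  # Percentage of non-missing values per group
  summarise(across(everything(), ~ mean(!is.na(.x)) * 100))

completeness
```

If the data come from a file, it may be simpler to declare the string as missing at import time; for example, readr::read_csv() accepts na = c("", "NA", "NULL").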

That’s great, thank you so much Lucca!
