Error message while creating Alluvial/Sankey Diagrams

Hello Applied Epi community colleagues,
I need bit of your help look at my script please. I am trying create an Alluvial/Sankey Diagrams from our handbook here 35 Diagrams and charts | The Epidemiologist R Handbook.

Although i feel data is properly available, When i run the first and second chunk an error would say as follows:

Error in rename(., source = age_group, target = HF_Visited) :
unused arguments (source = age_group, target = HF_Visited)

Please help me fix this! Thank you :slight_smile:

# Test code ---------------------------------------------------------------

pacman::p_load(rio, lubridate, datapasta, reprex, tidyverse) 

test_data <-  Paper2.2_Costing %>% 
  head(10) %>% 
  select(age_group , HF_Visited , table2_crs8)
#> Error in eval(expr, envir, enclos): object 'Paper2.2_Costing' not found

dpasta(test_data)
#> Error in eval(expr, envir, enclos): object 'test_data' not found


data.frame(
  stringsAsFactors = FALSE,
         age_group = c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L),
                       HF_Visited = c("Regional Hospital","Regional Hospital",
                                      "Regional Hospital","Dispensary",
                                      "Regional Hospital","Dispensary",
                                      "Private Hospital","District Hospital",
                                      "Private Hospital","Private Hospital"),
       table2_crs8 = c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L)
               )
#>    age_group        HF_Visited table2_crs8
#> 1          2 Regional Hospital           1
#> 2          1 Regional Hospital           1
#> 3          3 Regional Hospital           1
#> 4          2        Dispensary           1
#> 5          2 Regional Hospital           0
#> 6          4        Dispensary           1
#> 7          2  Private Hospital           1
#> 8          3 District Hospital           1
#> 9          2  Private Hospital           1
#> 10         1  Private Hospital           1

pacman::p_load(
  DiagrammeR,     # for flow diagrams
  networkD3,      # For alluvial/Sankey diagrams
  tidyverse)      # data management and visualization



# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)
#> Error in eval(expr, envir, enclos): object 'test_data' not found



hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)
#> Error in eval(expr, envir, enclos): object 'test_data' not found


# combine links
links <- bind_rows(age_hosp_links, hosp_out_links) 
#> Error in eval(expr, envir, enclos): object 'age_hosp_links' not found


# The unique node names
nodes <- data.frame(
  name=c(as.character(links$source), as.character(links$target)) %>% 
    unique()
)
#> Error in eval(expr, envir, enclos): object 'links' not found


# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
#> Error in eval(expr, envir, enclos): object 'links' not found
links$IDtarget <- match(links$target, nodes$name)-1
#> Error in eval(expr, envir, enclos): object 'links' not found


# plot
######
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "n",
                   NodeID = "name",
                   units = "TWh",
                   fontSize = 12,
                   nodeWidth = 30,
                   iterations = 0)
#> Error in eval(expr, envir, enclos): object 'links' not found
p
#> Error in eval(expr, envir, enclos): object 'p' not found

Created on 2023-10-18 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian BokmÄl_Norway.utf8 
#> [2] LC_CTYPE=Norwegian BokmÄl_Norway.utf8   
#> [3] LC_MONETARY=Norwegian BokmÄl_Norway.utf8
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian BokmÄl_Norway.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4     DiagrammeR_1.0.10 forcats_1.0.0     stringr_1.5.0    
#>  [5] dplyr_1.1.2       purrr_1.0.1       readr_2.1.4       tidyr_1.3.0      
#>  [9] tibble_3.2.1      ggplot2_3.4.2     tidyverse_2.0.0   reprex_2.0.2     
#> [13] datapasta_3.1.0   lubridate_1.9.2   rio_0.5.29       
#> 
#> loaded via a namespace (and not attached):
#>  [1] utf8_1.2.3         generics_0.1.3     stringi_1.7.12     hms_1.1.3         
#>  [5] digest_0.6.33      magrittr_2.0.3     RColorBrewer_1.1-3 evaluate_0.21     
#>  [9] grid_4.3.1         timechange_0.2.0   fastmap_1.1.1      jsonlite_1.8.7    
#> [13] cellranger_1.1.0   zip_2.3.0          fansi_1.0.4        scales_1.2.1      
#> [17] cli_3.6.1          rlang_1.1.1        visNetwork_2.1.2   munsell_0.5.0     
#> [21] withr_2.5.0        yaml_2.3.7         tools_4.3.1        tzdb_0.4.0        
#> [25] colorspace_2.1-0   pacman_0.5.1       curl_5.0.1         vctrs_0.6.3       
#> [29] R6_2.5.1           lifecycle_1.0.3    htmlwidgets_1.6.2  fs_1.6.3          
#> [33] foreign_0.8-84     pkgconfig_2.0.3    pillar_1.9.0       openxlsx_4.2.5.2  
#> [37] gtable_0.3.3       data.table_1.14.8  glue_1.6.2         Rcpp_1.0.11       
#> [41] haven_2.5.3        xfun_0.39          tidyselect_1.2.0   rstudioapi_0.15.0 
#> [45] knitr_1.43         igraph_1.5.0.1     htmltools_0.5.5    rmarkdown_2.23    
#> [49] compiler_4.3.1     readxl_1.4.3
1 Like

Hello,

The data you used at the very beginning of your code, Paper2.2_Costing, is not defined anywhere so nothing dependent on it will work. You need to define this data before proceeding any further.

All the best,

Tim

1 Like

Hi,
Please see if this data works.
When i run code by line, its at “count” step when code chunks defining “age_hosp_links” and “hosp_out_links” won’t work.
But not sure why that error does not copied in reprex.
Thank you!

test_data <- data.frame(
  stringsAsFactors = FALSE,
         age_group = c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L),
                       HF_Visited = c("Regional Hospital","Regional Hospital",
                                      "Regional Hospital","Dispensary",
                                      "Regional Hospital","Dispensary",
                                      "Private Hospital","District Hospital",
                                      "Private Hospital","Private Hospital"),
       table2_crs8 = c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L)
               )

pacman::p_load(
  DiagrammeR,     # for flow diagrams
  networkD3,      # For alluvial/Sankey diagrams
  tidyverse)      # data management and visualization



# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)



hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)


# combine links
links <- bind_rows(age_hosp_links, hosp_out_links) 
#> Error in `bind_rows()`:
#> ! Can't combine `..1$target` <character> and `..2$target` <integer>.
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::bind_rows(age_hosp_links, hosp_out_links)
#>   2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
#>   3. └─vctrs (local) `<fn>`()
#>   4.   └─vctrs::vec_default_ptype2(...)
#>   5.     ├─base::withRestarts(...)
#>   6.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
#>   7.     │   └─base (local) doWithOneRestart(return(expr), restart)
#>   8.     └─vctrs::stop_incompatible_type(...)
#>   9.       └─vctrs:::stop_incompatible(...)
#>  10.         └─vctrs:::stop_vctrs(...)
#>  11.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)


# The unique node names
nodes <- data.frame(
  name=c(as.character(links$source), as.character(links$target)) %>% 
    unique()
)
#> Error in eval(expr, envir, enclos): object 'links' not found


# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
#> Error in eval(expr, envir, enclos): object 'links' not found
links$IDtarget <- match(links$target, nodes$name)-1
#> Error in eval(expr, envir, enclos): object 'links' not found


# plot
######
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "n",
                   NodeID = "name",
                   units = "TWh",
                   fontSize = 12,
                   nodeWidth = 30,
                   iterations = 0)
#> Error in eval(expr, envir, enclos): object 'links' not found
p
#> Error in eval(expr, envir, enclos): object 'p' not found

Created on 2023-10-19 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian BokmÄl_Norway.utf8 
#> [2] LC_CTYPE=Norwegian BokmÄl_Norway.utf8   
#> [3] LC_MONETARY=Norwegian BokmÄl_Norway.utf8
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian BokmÄl_Norway.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.2   forcats_1.0.0     stringr_1.5.0     dplyr_1.1.2      
#>  [5] purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.2.1     
#>  [9] ggplot2_3.4.2     tidyverse_2.0.0   networkD3_0.4     DiagrammeR_1.0.10
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.3       jsonlite_1.8.7     compiler_4.3.1     reprex_2.0.2      
#>  [5] tidyselect_1.2.0   scales_1.2.1       yaml_2.3.7         fastmap_1.1.1     
#>  [9] R6_2.5.1           generics_0.1.3     igraph_1.5.0.1     knitr_1.43        
#> [13] htmlwidgets_1.6.2  visNetwork_2.1.2   munsell_0.5.0      tzdb_0.4.0        
#> [17] pillar_1.9.0       RColorBrewer_1.1-3 rlang_1.1.1        utf8_1.2.3        
#> [21] stringi_1.7.12     xfun_0.39          fs_1.6.3           timechange_0.2.0  
#> [25] cli_3.6.1          withr_2.5.0        magrittr_2.0.3     digest_0.6.33     
#> [29] grid_4.3.1         rstudioapi_0.15.0  hms_1.1.3          lifecycle_1.0.3   
#> [33] vctrs_0.6.3        evaluate_0.21      glue_1.6.2         fansi_1.0.4       
#> [37] colorspace_2.1-0   pacman_0.5.1       rmarkdown_2.23     tools_4.3.1       
#> [41] pkgconfig_2.0.3    htmltools_0.5.5
1 Like

@shoaibraee hello, check if you renamed correctly the target and source columns in this command

hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)

When we use bind_rows to combine datasets, their columns must have the same class. On age_hosp_links target column is character and on hosp_out_links target column is numeric.

I feel that maybe you changed the source and target column names when you renamed the hosp_out_links

3 Likes

Hello,

@lnielsen is correct, if you look at the following error message:

#> Error in `bind_rows()`:
#> ! Can't combine `..1$target` <character> and `..2$target` <integer>.
#> Backtrace:

It says exactly that, target is a character in your first dataset age_hosp_links and an integer in your second dataset hosp_out_links. You will nee to align these types before you can bind the data together.

All the best,

Tim

2 Likes

%
select(HF_Visited, age_group) %>%
count(HF_Visited , age_group) %>%
rename(source = age_group, # re-name
target = HF_Visited)

hosp_out_links ← test_data %>%
drop_na(age_group) %>%
select(HF_Visited, table2_crs8) %>%
count(HF_Visited, table2_crs8) %>%
rename(source = HF_Visited, # re-name
target = table2_crs8)

[/quote]
1 Like

Hello,

The issue occurs here:

# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)

hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)

In the age_hosp_links data, you define target as HF_visited which is a character and in the hosp_out_links data, you define target as table2_crs8 which is an integer. For that reason, bind_rows cannot combine the data as the types differ.

All the best,

Tim

2 Likes

I so appreciate your input. Its just i am not sure how to fix it. On my end bind_rows worked. But the nodes did not pick any unique.
Can you suggest a solution based on the problem you have seen? Thanks

1 Like

Hello,

This code should work, note that I have changed your integers to character by using the as.character function:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)

test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = as.character(c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L)),
    HF_Visited = c(
        "Regional Hospital",
        "Regional Hospital",
        "Regional Hospital",
        "Dispensary",
        "Regional Hospital",
        "Dispensary",
        "Private Hospital",
        "District Hospital",
        "Private Hospital",
        "Private Hospital"
    ),
    table2_crs8 = as.character(c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L))
) |>
    as_tibble()

# counts by hospital and age category
age_hosp_links <- test_data |>
    drop_na(age_group) |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    drop_na(age_group) |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# combine links
links <- bind_rows(age_hosp_links, hosp_out_links)

# The unique node names
nodes <- data.frame(
    name=c(as.character(links$source), as.character(links$target)) %>% 
        unique()
)

# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
links$IDtarget <- match(links$target, nodes$name)-1

Created on 2023-10-21 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
#>  [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
#>  [9] ggplot2_3.4.3   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    knitr_1.44        munsell_0.5.0     R.cache_0.16.0   
#> [13] tzdb_0.4.0        pillar_1.9.0      R.utils_2.12.2    rlang_1.1.1      
#> [17] utf8_1.2.3        stringi_1.7.12    xfun_0.40         fs_1.6.3         
#> [21] timechange_0.2.0  cli_3.6.1         withr_2.5.0       magrittr_2.0.3   
#> [25] digest_0.6.33     grid_4.3.1        rstudioapi_0.15.0 hms_1.1.3        
#> [29] lifecycle_1.0.3   R.methodsS3_1.8.2 R.oo_1.25.0       vctrs_0.6.3      
#> [33] evaluate_0.21     glue_1.6.2        styler_1.10.2     fansi_1.0.4      
#> [37] colorspace_2.1-0  rmarkdown_2.25    tools_4.3.1       pkgconfig_2.0.3  
#> [41] htmltools_0.5.6

All the best,

Tim

1 Like

Hi Tim @machupovirus @lnielsen
I want to thank you for such an awesome guidance. Very helpful and I reached the solution!

I want to add another layer of variables (for example occupation after age group) between connections and tried to read about the package networkD3. I could not find much in this package information. If you know how to do it, please do let me know. You can add a fictious data layer and I will adapt.

Regardless, Thanks a million!

2 Likes

Hi,

Something like this should work:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)
library(networkD3)

# Simulating data
test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = sample(
        x = c("0-29", "30-59", "60-89", "90+"),
        size = 10,
        replace = TRUE
    ),
    sex = sample(
        x = c("Female", "Male"),
        size = 10,
        replace = TRUE
    ),
    HF_Visited = sample(
        c("Regional Hospital", "Dispensary"),
        size = 10,
        replace = TRUE
    ),
    table2_crs8 = sample(c("0", "1"), size = 10, replace = TRUE)
) |>
    as_tibble()

# Generatinig links
age_sex_links <- test_data |>
    count(age_group, sex) |>
    rename(source = sex,
                 target = age_group)

age_hosp_links <- test_data |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# Combining links
links <- bind_rows(age_sex_links, age_hosp_links, hosp_out_links)

# Unique node names
nodes <- data.frame(name = c(as.character(links$source), as.character(links$target)) |>
                                            unique())

# Create id numbers
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

p <- sankeyNetwork(
    Links = links,
    Nodes = nodes,
    Source = "IDsource",
    Target = "IDtarget",
    Value = "n",
    NodeID = "name",
    units = "TWh",
    fontSize = 12,
    nodeWidth = 30,
    iterations = 0
)
#> Links is a tbl_df. Converting to a plain data frame.

Created on 2023-10-24 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0  
#>  [5] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4     tidyr_1.3.0    
#>  [9] tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    igraph_1.5.1      knitr_1.44        htmlwidgets_1.6.2
#> [13] munsell_0.5.0     R.cache_0.16.0    tzdb_0.4.0        pillar_1.9.0     
#> [17] R.utils_2.12.2    rlang_1.1.1       utf8_1.2.4        stringi_1.7.12   
#> [21] xfun_0.40         fs_1.6.3          timechange_0.2.0  cli_3.6.1        
#> [25] withr_2.5.1       magrittr_2.0.3    digest_0.6.33     grid_4.3.1       
#> [29] rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.3   R.methodsS3_1.8.2
#> [33] R.oo_1.25.0       vctrs_0.6.4       evaluate_0.22     glue_1.6.2       
#> [37] styler_1.10.2     fansi_1.0.5       colorspace_2.1-0  rmarkdown_2.25   
#> [41] tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.6.1

As you can see, it’s really just about extending the idea of just defining the links and then adding them to the data frame of links.

All the best,

Tim

1 Like

You @machupovirus are a legend! :slight_smile: Thank you for walking me through the process, I will use this to replicate.
I tried to read how to export the figure in Microsoft Word and keep the dynamic properties (such as when we hover over the connections in the Sankey diagram and it shows case counts). At some point, it felt there was an option but it did not work as the Streamgraph package did not work. I am sharing it below for you.

I could not find similar in our R handbook. Please do see if there are any possibilities as I need to reflect numbers in different links in the figure, or How would you advise?

It is so great how so much information can be conveyed through this diagram, thank you again!

1 Like

Hello,

You definitely cannot retain the interactivity of D3 plots within a static Microsoft Word document, however, you could consider something like Shiny if you want to do this. Albeit, that is a whole other can of worms so I don’t know if you would really want to get into that.

I think your best bet is to use a non-interactive plot if you are trying to display it within a Word document.

All the best,

Tim

1 Like

Hello Tim @machupovirus
Thats well understood.
I have a request about the Sankey diagram. Once I have the final R script and run Sankeynetwork functions, is it possible to extract this data in a tabular form (in addition to the graphics)?
Considering that I am not able to add the dynamic figure, perhaps i can add a table with the graphics (thanks to you that seems a doable task now)! That may allow showing links in tabular as the frequencies/counts have to be presented.
Thanks so much

1 Like

Hello,

It’s not clear what exact form you would like the data to be in, for example, you have access to the links data which is in the form the sankeyNetwork function requires.

Another option I can think of is to link the data together in the following way:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)
library(networkD3)

# Simulating data
test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = sample(
        x = c("0-29", "30-59", "60-89", "90+"),
        size = 10,
        replace = TRUE
    ),
    sex = sample(
        x = c("Female", "Male"),
        size = 10,
        replace = TRUE
    ),
    HF_Visited = sample(
        c("Regional Hospital", "Dispensary"),
        size = 10,
        replace = TRUE
    ),
    table2_crs8 = sample(c("0", "1"), size = 10, replace = TRUE)
) |>
    as_tibble()

# Generatinig links
age_sex_links <- test_data |>
    count(age_group, sex) |>
    rename(source = sex,
                 target = age_group)

age_hosp_links <- test_data |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# Tabular data
tab_data <- age_sex_links |>
    inner_join(age_hosp_links,
                         by = join_by(target == source),
                         relationship = "many-to-many") |>
    inner_join(
        hosp_out_links,
        by = join_by(target.y == source),
        relationship = "many-to-many"
    ) |>
    select(
        sex = source,
        n_1 = n.x,
        age_group = target.x,
        n_2 = n.y,
        hf_visited = target.y,
        n_3 = n,
        table2_crs8 = target.y.y
    )

tab_data
#> # A tibble: 16 × 7
#>    sex      n_1 age_group   n_2 hf_visited          n_3 table2_crs8
#>    <chr>  <int> <chr>     <int> <chr>             <int> <chr>      
#>  1 Female     2 0-29          1 Dispensary            3 0          
#>  2 Female     2 0-29          1 Dispensary            1 1          
#>  3 Female     2 0-29          2 Regional Hospital     6 0          
#>  4 Male       1 0-29          1 Dispensary            3 0          
#>  5 Male       1 0-29          1 Dispensary            1 1          
#>  6 Male       1 0-29          2 Regional Hospital     6 0          
#>  7 Female     1 30-59         2 Dispensary            3 0          
#>  8 Female     1 30-59         2 Dispensary            1 1          
#>  9 Female     1 30-59         1 Regional Hospital     6 0          
#> 10 Male       2 30-59         2 Dispensary            3 0          
#> 11 Male       2 30-59         2 Dispensary            1 1          
#> 12 Male       2 30-59         1 Regional Hospital     6 0          
#> 13 Male       1 60-89         1 Dispensary            3 0          
#> 14 Male       1 60-89         1 Dispensary            1 1          
#> 15 Female     1 90+           3 Regional Hospital     6 0          
#> 16 Male       2 90+           3 Regional Hospital     6 0

Created on 2023-11-01 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0  
#>  [5] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4     tidyr_1.3.0    
#>  [9] tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    igraph_1.5.1      knitr_1.44        htmlwidgets_1.6.2
#> [13] munsell_0.5.0     R.cache_0.16.0    tzdb_0.4.0        pillar_1.9.0     
#> [17] R.utils_2.12.2    rlang_1.1.1       utf8_1.2.4        stringi_1.7.12   
#> [21] xfun_0.40         fs_1.6.3          timechange_0.2.0  cli_3.6.1        
#> [25] withr_2.5.1       magrittr_2.0.3    digest_0.6.33     grid_4.3.1       
#> [29] rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.3   R.methodsS3_1.8.2
#> [33] R.oo_1.25.0       vctrs_0.6.4       evaluate_0.22     glue_1.6.2       
#> [37] styler_1.10.2     fansi_1.0.5       colorspace_2.1-0  rmarkdown_2.25   
#> [41] tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.6.1

All the best,

Tim

Hi,
Thank you for looking into it.
Actually I was hoping to have data just like it shows up in sankeyNetwork function. This may become the alternative of not having a dynamic sankey diagram (that could show counts of connecting between two links, when we hover over the diagram within R).
So you are right, the data within Sankey fucntion would be great to tabulate (and reflect the “count summary” of connection between each link. Can you please see if it is possible?
Thank you!

1 Like

Hello,

The code I provided above creates a table that mirrors the look of the Sankey diagram, alternatively, you can just use the links data that was created in the previous code.

All the best,

Tim

1 Like

Thats so helpful!
The links data shows counts and connections. The other tibble looked more elegant and nicer format, just it is not reflecting the connections/links in summary form as the links reflects.
Well Tim, can’t thank you enough! All the best,

1 Like