Error message while creating Alluvial/Sankey Diagrams

shoaibraee · October 18, 2023, 6:15pm

Hello Applied Epi community colleagues,
I need bit of your help look at my script please. I am trying create an Alluvial/Sankey Diagrams from our handbook here 35 Diagrams and charts | The Epidemiologist R Handbook.

Although i feel data is properly available, When i run the first and second chunk an error would say as follows:

Error in rename(., source = age_group, target = HF_Visited) :
unused arguments (source = age_group, target = HF_Visited)

Please help me fix this! Thank you

# Test code ---------------------------------------------------------------

pacman::p_load(rio, lubridate, datapasta, reprex, tidyverse) 

test_data <-  Paper2.2_Costing %>% 
  head(10) %>% 
  select(age_group , HF_Visited , table2_crs8)
#> Error in eval(expr, envir, enclos): object 'Paper2.2_Costing' not found

dpasta(test_data)
#> Error in eval(expr, envir, enclos): object 'test_data' not found


data.frame(
  stringsAsFactors = FALSE,
         age_group = c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L),
                       HF_Visited = c("Regional Hospital","Regional Hospital",
                                      "Regional Hospital","Dispensary",
                                      "Regional Hospital","Dispensary",
                                      "Private Hospital","District Hospital",
                                      "Private Hospital","Private Hospital"),
       table2_crs8 = c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L)
               )
#>    age_group        HF_Visited table2_crs8
#> 1          2 Regional Hospital           1
#> 2          1 Regional Hospital           1
#> 3          3 Regional Hospital           1
#> 4          2        Dispensary           1
#> 5          2 Regional Hospital           0
#> 6          4        Dispensary           1
#> 7          2  Private Hospital           1
#> 8          3 District Hospital           1
#> 9          2  Private Hospital           1
#> 10         1  Private Hospital           1

pacman::p_load(
  DiagrammeR,     # for flow diagrams
  networkD3,      # For alluvial/Sankey diagrams
  tidyverse)      # data management and visualization



# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)
#> Error in eval(expr, envir, enclos): object 'test_data' not found



hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)
#> Error in eval(expr, envir, enclos): object 'test_data' not found


# combine links
links <- bind_rows(age_hosp_links, hosp_out_links) 
#> Error in eval(expr, envir, enclos): object 'age_hosp_links' not found


# The unique node names
nodes <- data.frame(
  name=c(as.character(links$source), as.character(links$target)) %>% 
    unique()
)
#> Error in eval(expr, envir, enclos): object 'links' not found


# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
#> Error in eval(expr, envir, enclos): object 'links' not found
links$IDtarget <- match(links$target, nodes$name)-1
#> Error in eval(expr, envir, enclos): object 'links' not found


# plot
######
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "n",
                   NodeID = "name",
                   units = "TWh",
                   fontSize = 12,
                   nodeWidth = 30,
                   iterations = 0)
#> Error in eval(expr, envir, enclos): object 'links' not found
p
#> Error in eval(expr, envir, enclos): object 'p' not found

^{Created on 2023-10-18 with reprex v2.0.2}

Session info

sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.utf8 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.utf8   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.utf8
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4     DiagrammeR_1.0.10 forcats_1.0.0     stringr_1.5.0    
#>  [5] dplyr_1.1.2       purrr_1.0.1       readr_2.1.4       tidyr_1.3.0      
#>  [9] tibble_3.2.1      ggplot2_3.4.2     tidyverse_2.0.0   reprex_2.0.2     
#> [13] datapasta_3.1.0   lubridate_1.9.2   rio_0.5.29       
#> 
#> loaded via a namespace (and not attached):
#>  [1] utf8_1.2.3         generics_0.1.3     stringi_1.7.12     hms_1.1.3         
#>  [5] digest_0.6.33      magrittr_2.0.3     RColorBrewer_1.1-3 evaluate_0.21     
#>  [9] grid_4.3.1         timechange_0.2.0   fastmap_1.1.1      jsonlite_1.8.7    
#> [13] cellranger_1.1.0   zip_2.3.0          fansi_1.0.4        scales_1.2.1      
#> [17] cli_3.6.1          rlang_1.1.1        visNetwork_2.1.2   munsell_0.5.0     
#> [21] withr_2.5.0        yaml_2.3.7         tools_4.3.1        tzdb_0.4.0        
#> [25] colorspace_2.1-0   pacman_0.5.1       curl_5.0.1         vctrs_0.6.3       
#> [29] R6_2.5.1           lifecycle_1.0.3    htmlwidgets_1.6.2  fs_1.6.3          
#> [33] foreign_0.8-84     pkgconfig_2.0.3    pillar_1.9.0       openxlsx_4.2.5.2  
#> [37] gtable_0.3.3       data.table_1.14.8  glue_1.6.2         Rcpp_1.0.11       
#> [41] haven_2.5.3        xfun_0.39          tidyselect_1.2.0   rstudioapi_0.15.0 
#> [45] knitr_1.43         igraph_1.5.0.1     htmltools_0.5.5    rmarkdown_2.23    
#> [49] compiler_4.3.1     readxl_1.4.3

machupovirus · October 18, 2023, 10:30pm

Hello,

The data you used at the very beginning of your code, Paper2.2_Costing, is not defined anywhere so nothing dependent on it will work. You need to define this data before proceeding any further.

All the best,

Tim

shoaibraee · October 19, 2023, 3:32am

Hi,
Please see if this data works.
When i run code by line, its at “count” step when code chunks defining “age_hosp_links” and “hosp_out_links” won’t work.
But not sure why that error does not copied in reprex.
Thank you!

test_data <- data.frame(
  stringsAsFactors = FALSE,
         age_group = c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L),
                       HF_Visited = c("Regional Hospital","Regional Hospital",
                                      "Regional Hospital","Dispensary",
                                      "Regional Hospital","Dispensary",
                                      "Private Hospital","District Hospital",
                                      "Private Hospital","Private Hospital"),
       table2_crs8 = c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L)
               )

pacman::p_load(
  DiagrammeR,     # for flow diagrams
  networkD3,      # For alluvial/Sankey diagrams
  tidyverse)      # data management and visualization



# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)



hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)


# combine links
links <- bind_rows(age_hosp_links, hosp_out_links) 
#> Error in `bind_rows()`:
#> ! Can't combine `..1$target` <character> and `..2$target` <integer>.
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::bind_rows(age_hosp_links, hosp_out_links)
#>   2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
#>   3. └─vctrs (local) `<fn>`()
#>   4.   └─vctrs::vec_default_ptype2(...)
#>   5.     ├─base::withRestarts(...)
#>   6.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
#>   7.     │   └─base (local) doWithOneRestart(return(expr), restart)
#>   8.     └─vctrs::stop_incompatible_type(...)
#>   9.       └─vctrs:::stop_incompatible(...)
#>  10.         └─vctrs:::stop_vctrs(...)
#>  11.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)


# The unique node names
nodes <- data.frame(
  name=c(as.character(links$source), as.character(links$target)) %>% 
    unique()
)
#> Error in eval(expr, envir, enclos): object 'links' not found


# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
#> Error in eval(expr, envir, enclos): object 'links' not found
links$IDtarget <- match(links$target, nodes$name)-1
#> Error in eval(expr, envir, enclos): object 'links' not found


# plot
######
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "n",
                   NodeID = "name",
                   units = "TWh",
                   fontSize = 12,
                   nodeWidth = 30,
                   iterations = 0)
#> Error in eval(expr, envir, enclos): object 'links' not found
p
#> Error in eval(expr, envir, enclos): object 'p' not found

^{Created on 2023-10-19 with reprex v2.0.2}

Session info

sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.utf8 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.utf8   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.utf8
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.2   forcats_1.0.0     stringr_1.5.0     dplyr_1.1.2      
#>  [5] purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.2.1     
#>  [9] ggplot2_3.4.2     tidyverse_2.0.0   networkD3_0.4     DiagrammeR_1.0.10
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.3       jsonlite_1.8.7     compiler_4.3.1     reprex_2.0.2      
#>  [5] tidyselect_1.2.0   scales_1.2.1       yaml_2.3.7         fastmap_1.1.1     
#>  [9] R6_2.5.1           generics_0.1.3     igraph_1.5.0.1     knitr_1.43        
#> [13] htmlwidgets_1.6.2  visNetwork_2.1.2   munsell_0.5.0      tzdb_0.4.0        
#> [17] pillar_1.9.0       RColorBrewer_1.1-3 rlang_1.1.1        utf8_1.2.3        
#> [21] stringi_1.7.12     xfun_0.39          fs_1.6.3           timechange_0.2.0  
#> [25] cli_3.6.1          withr_2.5.0        magrittr_2.0.3     digest_0.6.33     
#> [29] grid_4.3.1         rstudioapi_0.15.0  hms_1.1.3          lifecycle_1.0.3   
#> [33] vctrs_0.6.3        evaluate_0.21      glue_1.6.2         fansi_1.0.4       
#> [37] colorspace_2.1-0   pacman_0.5.1       rmarkdown_2.23     tools_4.3.1       
#> [41] pkgconfig_2.0.3    htmltools_0.5.5

lnielsen · October 19, 2023, 11:39am

@shoaibraee hello, check if you renamed correctly the target and source columns in this command

hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)

When we use bind_rows to combine datasets, their columns must have the same class. On age_hosp_links target column is character and on hosp_out_links target column is numeric.

I feel that maybe you changed the source and target column names when you renamed the hosp_out_links

machupovirus · October 19, 2023, 12:46pm

Hello,

@lnielsen is correct, if you look at the following error message:

#> Error in `bind_rows()`:
#> ! Can't combine `..1$target` <character> and `..2$target` <integer>.
#> Backtrace:

It says exactly that, target is a character in your first dataset age_hosp_links and an integer in your second dataset hosp_out_links. You will nee to align these types before you can bind the data together.

All the best,

Tim

shoaibraee · October 19, 2023, 3:49pm

shoaibraee:

@machupovirus @lnielsen
This is greatly appreciated. 
Can you please check if the following chunk worked on your end? I cannot proceed with this and gives me an error (even before I reach bind step). 

Error in rename(., source = age_group, target = HF_Visited) : 
  unused arguments (source = age_group, target = HF_Visited)

# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)



hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)



Hello,

@lnielsen is correct, if you look at the following error message:

#> Error in bind_rows():
#> ! Can’t combine ..1$target and ..2$target .
#> Backtrace:

It says exactly that, target is a character in your first dataset `age_hosp_links` and an integer in your second dataset `hosp_out_links`. You will nee to align these types before you can bind the data together.

All the best,

Tim

%
select(HF_Visited, age_group) %>%
count(HF_Visited , age_group) %>%
rename(source = age_group, # re-name
target = HF_Visited)

hosp_out_links ← test_data %>%
drop_na(age_group) %>%
select(HF_Visited, table2_crs8) %>%
count(HF_Visited, table2_crs8) %>%
rename(source = HF_Visited, # re-name
target = table2_crs8)

[/quote]

machupovirus · October 19, 2023, 10:56pm

Hello,

The issue occurs here:

# counts by hospital and age category
age_hosp_links <- test_data  %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, age_group) %>% 
  count(HF_Visited , age_group) %>% 
  rename(source = age_group,           # re-name
         target = HF_Visited)

hosp_out_links <- test_data %>% 
  drop_na(age_group) %>% 
  select(HF_Visited, table2_crs8) %>% 
  count(HF_Visited, table2_crs8) %>% 
  rename(source = HF_Visited,       # re-name
         target = table2_crs8)

In the age_hosp_links data, you define target as HF_visited which is a character and in the hosp_out_links data, you define target as table2_crs8 which is an integer. For that reason, bind_rows cannot combine the data as the types differ.

All the best,

Tim

shoaibraee · October 20, 2023, 2:01pm

I so appreciate your input. Its just i am not sure how to fix it. On my end bind_rows worked. But the nodes did not pick any unique.
Can you suggest a solution based on the problem you have seen? Thanks

machupovirus · October 21, 2023, 11:51pm

Hello,

This code should work, note that I have changed your integers to character by using the as.character function:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)

test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = as.character(c(2L, 1L, 3L, 2L, 2L, 4L, 2L, 3L, 2L, 1L)),
    HF_Visited = c(
        "Regional Hospital",
        "Regional Hospital",
        "Regional Hospital",
        "Dispensary",
        "Regional Hospital",
        "Dispensary",
        "Private Hospital",
        "District Hospital",
        "Private Hospital",
        "Private Hospital"
    ),
    table2_crs8 = as.character(c(1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L))
) |>
    as_tibble()

# counts by hospital and age category
age_hosp_links <- test_data |>
    drop_na(age_group) |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    drop_na(age_group) |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# combine links
links <- bind_rows(age_hosp_links, hosp_out_links)

# The unique node names
nodes <- data.frame(
    name=c(as.character(links$source), as.character(links$target)) %>% 
        unique()
)

# Create id numbers
links$IDsource <- match(links$source, nodes$name)-1 
links$IDtarget <- match(links$target, nodes$name)-1

^{Created on 2023-10-21 with reprex v2.0.2}

Session info

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
#>  [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
#>  [9] ggplot2_3.4.3   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    knitr_1.44        munsell_0.5.0     R.cache_0.16.0   
#> [13] tzdb_0.4.0        pillar_1.9.0      R.utils_2.12.2    rlang_1.1.1      
#> [17] utf8_1.2.3        stringi_1.7.12    xfun_0.40         fs_1.6.3         
#> [21] timechange_0.2.0  cli_3.6.1         withr_2.5.0       magrittr_2.0.3   
#> [25] digest_0.6.33     grid_4.3.1        rstudioapi_0.15.0 hms_1.1.3        
#> [29] lifecycle_1.0.3   R.methodsS3_1.8.2 R.oo_1.25.0       vctrs_0.6.3      
#> [33] evaluate_0.21     glue_1.6.2        styler_1.10.2     fansi_1.0.4      
#> [37] colorspace_2.1-0  rmarkdown_2.25    tools_4.3.1       pkgconfig_2.0.3  
#> [41] htmltools_0.5.6

All the best,

Tim

shoaibraee · October 23, 2023, 3:01pm

Hi Tim @machupovirus @lnielsen
I want to thank you for such an awesome guidance. Very helpful and I reached the solution!

I want to add another layer of variables (for example occupation after age group) between connections and tried to read about the package networkD3. I could not find much in this package information. If you know how to do it, please do let me know. You can add a fictious data layer and I will adapt.

Regardless, Thanks a million!

machupovirus · October 24, 2023, 10:57pm

Hi,

Something like this should work:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)
library(networkD3)

# Simulating data
test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = sample(
        x = c("0-29", "30-59", "60-89", "90+"),
        size = 10,
        replace = TRUE
    ),
    sex = sample(
        x = c("Female", "Male"),
        size = 10,
        replace = TRUE
    ),
    HF_Visited = sample(
        c("Regional Hospital", "Dispensary"),
        size = 10,
        replace = TRUE
    ),
    table2_crs8 = sample(c("0", "1"), size = 10, replace = TRUE)
) |>
    as_tibble()

# Generatinig links
age_sex_links <- test_data |>
    count(age_group, sex) |>
    rename(source = sex,
                 target = age_group)

age_hosp_links <- test_data |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# Combining links
links <- bind_rows(age_sex_links, age_hosp_links, hosp_out_links)

# Unique node names
nodes <- data.frame(name = c(as.character(links$source), as.character(links$target)) |>
                                            unique())

# Create id numbers
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

p <- sankeyNetwork(
    Links = links,
    Nodes = nodes,
    Source = "IDsource",
    Target = "IDtarget",
    Value = "n",
    NodeID = "name",
    units = "TWh",
    fontSize = 12,
    nodeWidth = 30,
    iterations = 0
)
#> Links is a tbl_df. Converting to a plain data frame.

^{Created on 2023-10-24 with reprex v2.0.2}

Session info

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0  
#>  [5] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4     tidyr_1.3.0    
#>  [9] tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    igraph_1.5.1      knitr_1.44        htmlwidgets_1.6.2
#> [13] munsell_0.5.0     R.cache_0.16.0    tzdb_0.4.0        pillar_1.9.0     
#> [17] R.utils_2.12.2    rlang_1.1.1       utf8_1.2.4        stringi_1.7.12   
#> [21] xfun_0.40         fs_1.6.3          timechange_0.2.0  cli_3.6.1        
#> [25] withr_2.5.1       magrittr_2.0.3    digest_0.6.33     grid_4.3.1       
#> [29] rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.3   R.methodsS3_1.8.2
#> [33] R.oo_1.25.0       vctrs_0.6.4       evaluate_0.22     glue_1.6.2       
#> [37] styler_1.10.2     fansi_1.0.5       colorspace_2.1-0  rmarkdown_2.25   
#> [41] tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.6.1

As you can see, it’s really just about extending the idea of just defining the links and then adding them to the data frame of links.

All the best,

Tim

shoaibraee · October 25, 2023, 7:39am

You @machupovirus are a legend! Thank you for walking me through the process, I will use this to replicate.
I tried to read how to export the figure in Microsoft Word and keep the dynamic properties (such as when we hover over the connections in the Sankey diagram and it shows case counts). At some point, it felt there was an option but it did not work as the Streamgraph package did not work. I am sharing it below for you.

I could not find similar in our R handbook. Please do see if there are any possibilities as I need to reflect numbers in different links in the figure, or How would you advise?

It is so great how so much information can be conveyed through this diagram, thank you again!

machupovirus · October 25, 2023, 9:52pm

Hello,

You definitely cannot retain the interactivity of D3 plots within a static Microsoft Word document, however, you could consider something like Shiny if you want to do this. Albeit, that is a whole other can of worms so I don’t know if you would really want to get into that.

I think your best bet is to use a non-interactive plot if you are trying to display it within a Word document.

All the best,

Tim

shoaibraee · October 31, 2023, 8:29pm

Hello Tim @machupovirus
Thats well understood.
I have a request about the Sankey diagram. Once I have the final R script and run Sankeynetwork functions, is it possible to extract this data in a tabular form (in addition to the graphics)?
Considering that I am not able to add the dynamic figure, perhaps i can add a table with the graphics (thanks to you that seems a doable task now)! That may allow showing links in tabular as the frequencies/counts have to be presented.
Thanks so much

machupovirus · November 1, 2023, 10:55pm

Hello,

It’s not clear what exact form you would like the data to be in, for example, you have access to the links data which is in the form the sankeyNetwork function requires.

Another option I can think of is to link the data together in the following way:

# Loading packages (lubridate is included in tidyverse now)
library(tidyverse)
library(networkD3)

# Simulating data
test_data <- data.frame(
    stringsAsFactors = FALSE,
    age_group = sample(
        x = c("0-29", "30-59", "60-89", "90+"),
        size = 10,
        replace = TRUE
    ),
    sex = sample(
        x = c("Female", "Male"),
        size = 10,
        replace = TRUE
    ),
    HF_Visited = sample(
        c("Regional Hospital", "Dispensary"),
        size = 10,
        replace = TRUE
    ),
    table2_crs8 = sample(c("0", "1"), size = 10, replace = TRUE)
) |>
    as_tibble()

# Generatinig links
age_sex_links <- test_data |>
    count(age_group, sex) |>
    rename(source = sex,
                 target = age_group)

age_hosp_links <- test_data |>
    count(HF_Visited, age_group) |>
    rename(source = age_group,
                 target = HF_Visited)

hosp_out_links <- test_data |>
    count(HF_Visited, table2_crs8) |>
    rename(source = HF_Visited,
                 target = table2_crs8)

# Tabular data
tab_data <- age_sex_links |>
    inner_join(age_hosp_links,
                         by = join_by(target == source),
                         relationship = "many-to-many") |>
    inner_join(
        hosp_out_links,
        by = join_by(target.y == source),
        relationship = "many-to-many"
    ) |>
    select(
        sex = source,
        n_1 = n.x,
        age_group = target.x,
        n_2 = n.y,
        hf_visited = target.y,
        n_3 = n,
        table2_crs8 = target.y.y
    )

tab_data
#> # A tibble: 16 × 7
#>    sex      n_1 age_group   n_2 hf_visited          n_3 table2_crs8
#>    <chr>  <int> <chr>     <int> <chr>             <int> <chr>      
#>  1 Female     2 0-29          1 Dispensary            3 0          
#>  2 Female     2 0-29          1 Dispensary            1 1          
#>  3 Female     2 0-29          2 Regional Hospital     6 0          
#>  4 Male       1 0-29          1 Dispensary            3 0          
#>  5 Male       1 0-29          1 Dispensary            1 1          
#>  6 Male       1 0-29          2 Regional Hospital     6 0          
#>  7 Female     1 30-59         2 Dispensary            3 0          
#>  8 Female     1 30-59         2 Dispensary            1 1          
#>  9 Female     1 30-59         1 Regional Hospital     6 0          
#> 10 Male       2 30-59         2 Dispensary            3 0          
#> 11 Male       2 30-59         2 Dispensary            1 1          
#> 12 Male       2 30-59         1 Regional Hospital     6 0          
#> 13 Male       1 60-89         1 Dispensary            3 0          
#> 14 Male       1 60-89         1 Dispensary            1 1          
#> 15 Female     1 90+           3 Regional Hospital     6 0          
#> 16 Male       2 90+           3 Regional Hospital     6 0

^{Created on 2023-11-01 with reprex v2.0.2}

Session info

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] networkD3_0.4   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0  
#>  [5] dplyr_1.1.3     purrr_1.0.2     readr_2.1.4     tidyr_1.3.0    
#>  [9] tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    igraph_1.5.1      knitr_1.44        htmlwidgets_1.6.2
#> [13] munsell_0.5.0     R.cache_0.16.0    tzdb_0.4.0        pillar_1.9.0     
#> [17] R.utils_2.12.2    rlang_1.1.1       utf8_1.2.4        stringi_1.7.12   
#> [21] xfun_0.40         fs_1.6.3          timechange_0.2.0  cli_3.6.1        
#> [25] withr_2.5.1       magrittr_2.0.3    digest_0.6.33     grid_4.3.1       
#> [29] rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.3   R.methodsS3_1.8.2
#> [33] R.oo_1.25.0       vctrs_0.6.4       evaluate_0.22     glue_1.6.2       
#> [37] styler_1.10.2     fansi_1.0.5       colorspace_2.1-0  rmarkdown_2.25   
#> [41] tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.6.1

All the best,

Tim

shoaibraee · November 2, 2023, 2:53am

Hi,
Thank you for looking into it.
Actually I was hoping to have data just like it shows up in sankeyNetwork function. This may become the alternative of not having a dynamic sankey diagram (that could show counts of connecting between two links, when we hover over the diagram within R).
So you are right, the data within Sankey fucntion would be great to tabulate (and reflect the “count summary” of connection between each link. Can you please see if it is possible?
Thank you!

machupovirus · November 2, 2023, 12:06pm

Hello,

The code I provided above creates a table that mirrors the look of the Sankey diagram, alternatively, you can just use the links data that was created in the previous code.

All the best,

Tim

shoaibraee · November 2, 2023, 8:35pm

Thats so helpful!
The links data shows counts and connections. The other tibble looked more elegant and nicer format, just it is not reflecting the connections/links in summary form as the links reflects.
Well Tim, can’t thank you enough! All the best,