Stacked bars to clustered bars

I am trying to reproduce this figure in R using ggplot:

Unfortunately, I cannot find a way to get the bars in my R chart to cluster side by side, as they do in the original, instead of stacking.

Steps taken so far

  • I have searched many forums and standard R help pages (e.g., Stack Overflow, GeeksforGeeks, the Applied Epi Handbook, ggplot2.tidyverse.org), consulted my R-literate colleagues, and even asked Copilot for help (we cannot access ChatGPT at my workplace), and I still have no answer.

My code so far is this:

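library(ggplot2)
library(scales)  # provides breaks_width()

scale_factor <- 300  # multiplies the rates so the lines fit on the count axis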


ggplot(fig_1_data, aes(x = year)) +
geom_bar(aes(y = count, fill = "Count"), stat = "identity", position = position_dodge(width = 0.8)) +
geom_bar(aes(y = ytd_count, fill = "YTD Count"), stat = "identity", position = position_dodge(width = 0.8)) +
geom_line(aes(y = annual_rate * scale_factor, color = "Annual Rate"), linewidth = 1) +  # Adjust scaling factor as needed
geom_line(aes(y = ytd_rate * scale_factor, color = "YTD Rate"), linewidth = 1) +  # Adjust scaling factor as needed
geom_label(aes(y = annual_rate * scale_factor, label = annual_rate, color = "Annual Rate"), vjust = -0.5, fill = "white") +
geom_label(aes(y = ytd_rate * scale_factor, label = ytd_rate, color = "YTD Rate"), vjust = -0.5, fill = "white") +
scale_y_continuous(
  name = "Number of notifications",
  expand = c(0,0), 
  limits = c(0, 3000), 
  breaks = seq(0, 3000, by =500), 
  sec.axis = sec_axis(~ . / scale_factor, name = "Notification rate per 100,000 population")  # Undo the scaling for the right-hand axis
) +
scale_fill_manual(values = c("Count" = basic_fill, "YTD Count" = basic_line)) +
scale_color_manual(values = c("Annual Rate" = "#4C535A", "YTD Rate" = "#A10000")) +
labs(x = "Year", fill = "Legend", color = "Legend") +
theme_minimal() +
theme(
  axis.title.y.right = element_text(color = "black"),
  axis.text.y.right = element_text(color = "black"), 
  axis.text.y.left = element_text(color = "black"),
  axis.text.x.bottom = element_text (color = "black"),
  axis.line.x = element_line(colour = "black", linewidth = 0.2), 
  axis.line.y.left = element_line(colour = "black", linewidth = 0.2), 
  axis.line.y.right= element_line(colour = "black", linewidth = 0.2), 
  panel.background = element_blank(),
  panel.grid.major = element_blank(), 
  panel.grid.minor = element_blank()
)+                                   # remove buffer zones from x and y axes
scale_x_continuous(expand = c(0,0),
                   breaks = breaks_width(2)) 

And this is a sample of the underlying data:

tibble::tribble(
    ~year, ~count, ~ytd_count, ~annual_rate, ~ytd_rate,
     2009,   1554,       1457,          7.2,       6.7,
     2010,   1640,       1520,          7.4,       6.9,
     2011,   1883,       1744,          8.4,       7.8,
     2012,   1823,       1714,            8,       7.5,
     2013,   1552,       1433,          6.7,       6.2
    )

Any help greatly appreciated!


This is what the plot looks like now (the forum post would only allow me to post one image)

Hi @lizzieg1990, position_dodge() has no effect here because each geom_bar() layer is dodged on its own. Within each layer there is only one bar per year, so there is nothing to move apart, and the two layers are simply drawn on top of each other. The fill aesthetic should map to a variable in the data rather than to a constant string such as "Count". In my view, the best approach is to reshape the data to long format. I asked ChatGPT to implement this approach, and the output looks correct. I haven't had time to verify the code thoroughly, so treat it as a starting point.
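As a rough illustration of what "long format" means here, this sketch reshapes only the two count columns from the first two rows of the sample data in the question. The name counts_long is made up for the example, and it assumes the tidyr and tibble packages are installed.

```r
library(tidyr)
library(tibble)

# First two rows of the sample data posted in the question (counts only)
fig_1_data <- tribble(
  ~year, ~count, ~ytd_count,
   2009,   1554,       1457,
   2010,   1640,       1520
)

# One row per year and series. `Type` is now a real column that `fill`
# can map to, so position_dodge() sees two groups per year.
counts_long <- pivot_longer(
  fig_1_data,
  cols = c(count, ytd_count),
  names_to = "Type",
  values_to = "Value"
)

print(counts_long)
```

Each year now appears twice, once per series, which is the shape a single dodged bar layer needs.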

Here's the generated code:

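library(dplyr)    # %>%, mutate(), recode(), case_when(), filter()
library(tidyr)    # pivot_longer()
library(ggplot2)

scale_factor <- 300  # same value as in the question

# Reshape the data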
fig_1_data_long <- fig_1_data %>%
  pivot_longer(
    cols = c(count, ytd_count, annual_rate, ytd_rate),
    names_to = "Type",
    values_to = "Value"
  ) %>%
  mutate(
    scaled_value = ifelse(Type %in% c("annual_rate", "ytd_rate"), Value * scale_factor, Value),
    Type = recode(
      Type, 
      count = "Count", 
      ytd_count = "YTD Count", 
      annual_rate = "Annual Rate", 
      ytd_rate = "YTD Rate"
    ),
    Category = case_when(
      Type %in% c("Count", "YTD Count") ~ "Bar",
      Type %in% c("Annual Rate", "YTD Rate") ~ "Line"
    )
  )


ggplot(fig_1_data_long, aes(x = factor(year))) +
  # Bars
  geom_bar(
    data = filter(fig_1_data_long, Category == "Bar"),
    aes(y = scaled_value, fill = Type),
    stat = "identity",
    position = position_dodge(width = .9)
  ) +
  # Lines
  geom_line(
    data = filter(fig_1_data_long, Category == "Line"),
    aes(y = scaled_value, color = Type, group = Type),
    linewidth = 1  # `size` for lines is deprecated since ggplot2 3.4.0
  ) +
  # Labels for lines
  geom_label(
    data = filter(fig_1_data_long, Category == "Line"),
    aes(y = scaled_value, label = Value, color = Type),
    vjust = -0.5,
    fill = "white"
  ) +
  # Y-axis scaling
  scale_y_continuous(
    name = "Number of notifications",
    expand = c(0, 0),
    limits = c(0, 3000),
    breaks = seq(0, 3000, by = 500),
    sec.axis = sec_axis(~./scale_factor, name = "Notification rate per 100,000 population")
  ) +
  # Manual color and fill scales
  scale_fill_manual(
    values = c("Count" = "#0072B2", "YTD Count" = "#E69F00"),
    name = "Legend"
  ) +
  scale_color_manual(
    values = c("Annual Rate" = "#4C535A", "YTD Rate" = "#A10000"),
    name = "Legend"
  ) +
  # Labels and theme
  labs(x = "Year", fill = "Legend", color = "Legend") +
  theme_minimal() +
  theme(
    axis.title.y.right = element_text(color = "black"),
    axis.text.y.right = element_text(color = "black"),
    axis.text.y.left = element_text(color = "black"),
    axis.text.x = element_text(color = "black"),
    axis.line.x = element_line(colour = "black", linewidth = 0.2),
    axis.line.y.left = element_line(colour = "black", linewidth = 0.2),
    axis.line.y.right = element_line(colour = "black", linewidth = 0.2),
    panel.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
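To see the dodge fix in isolation, here is a minimal sketch with only the bar layer, built from three rows of the sample data. It uses geom_col(), which is equivalent to geom_bar(stat = "identity"), and layer_data() to inspect where ggplot2 places each bar. The object names counts_long, p and bars are invented for this example.

```r
library(ggplot2)
library(tidyr)
library(tibble)

# Three rows of the sample data from the question (counts only)
fig_1_data <- tribble(
  ~year, ~count, ~ytd_count,
   2009,   1554,       1457,
   2010,   1640,       1520,
   2011,   1883,       1744
)

counts_long <- pivot_longer(
  fig_1_data,
  cols = c(count, ytd_count),
  names_to = "Type",
  values_to = "Value"
)

# A single layer with fill mapped to a variable, so the bars are dodged
p <- ggplot(counts_long, aes(x = factor(year), y = Value, fill = Type)) +
  geom_col(position = position_dodge(width = 0.9))

# Computed geometry: each bar has its own xmin/xmax range
bars <- layer_data(p)
print(bars[, c("x", "xmin", "xmax", "ymax")])
```

If the x ranges of the two bars within a year do not overlap, the bars are clustered rather than drawn on top of each other.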


Lucca you are an absolute star - this has worked!

Thank you so much :)
