Language of the R report

Hello,

I would appreciate it if anyone could share their experience with producing an R Markdown (Rmd) report in a language different from the language used in the data frame. What are the best practices in such situations?

Here is what I have tried so far:

  1. I translated the original (Russian) data frame into English using an online tool and used it to generate the report. This works, but now I also need to produce the same report in the original language for other partners.

I am also exploring the options below:

  2. Is it possible to generate the Rmd report in English and then translate it into another language, including axis labels and values in the plots? Or, alternatively, generate the report in the original language of the data frame and then translate the final output into English?

  3. Would it make sense to use recode() (or similar functions) to map variables and values into two languages in order to produce bilingual visuals? This might not be ideal, as the labels may not display properly or could make the visuals harder to read.

Best practice is: don’t translate the data frame. Keep the data as-is, and localize the “presentation layer” (titles, captions, axis labels, legend labels, table headers, factor labels) via a translation dictionary + a language parameter.

That gives you repeatable, auditable reports in multiple languages without duplicating pipelines.

Recommended patterns

1) Parameterize the report by language

In your YAML:

---
title: "Surveillance report"
params:
  lang: "en"   # or "ru"
---

Then render the report twice (by hand or from a script): once with lang = "en" and once with lang = "ru".
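A minimal sketch of such a script, assuming the source file is called report.Rmd and renders to HTML (the file name, the output_name() helper and the render_all() wrapper are illustrative, not part of rmarkdown):

```r
# Hypothetical input file and language list; adjust to your project.
input <- "report.Rmd"
langs <- c("en", "ru")

# Build one output file name per language, e.g. "report_en.html".
# The .html extension assumes an html_document output format.
output_name <- function(input, lang) {
  paste0(tools::file_path_sans_ext(basename(input)), "_", lang, ".html")
}

# Render the same source once per language, passing lang as a parameter.
render_all <- function(input, langs) {
  for (lang in langs) {
    rmarkdown::render(
      input,
      params      = list(lang = lang),
      output_file = output_name(input, lang),
      envir       = new.env()  # fresh environment so runs do not share objects
    )
  }
}

# Only render when the source file is actually present.
if (file.exists(input)) render_all(input, langs)
```

Rendering into a fresh environment each time is a precaution so that objects created during the English run cannot leak into the Russian one.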

2) Use a translation dictionary (not recode() scattered everywhere)

Create a small table (CSV/YAML/R list) mapping keys → en/ru strings.

Example in R:

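# One entry per language; every key used by tr() should exist in each language
i18n <- list(
  en = list(title_cases = "Cases over time", x_date = "Date", y_cases = "Cases", status = "Status"),
  ru = list(title_cases = "Случаи по времени", x_date = "Дата", y_cases = "Случаи", status = "Статус")
)

# Look up a display string; fall back to the key so missing translations stay visible
tr <- function(key, lang = params$lang) {
  value <- i18n[[lang]][[key]]
  if (is.null(value)) key else value
}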

Then in ggplot:

library(ggplot2)

# df is assumed to have `date` and `cases` columns
ggplot(df, aes(date, cases)) +
  geom_col() +
  labs(
    title = tr("title_cases"),
    x     = tr("x_date"),
    y     = tr("y_cases")
  )

More on i18n: see the Appsilon/shiny.i18n repository on GitHub ("Shiny applications internationalization made easy").
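If the dictionary lives in a CSV file rather than in code, it can be converted into the same nested-list shape that tr() expects. The sketch below assumes a CSV with a key column plus one column per language; the inline text stands in for a real file, and to_i18n() is a hypothetical helper name:

```r
# Inline CSV used here so the example is self-contained.
# For a real file, something like read.csv("i18n.csv", fileEncoding = "UTF-8")
# would be used instead (the file name is hypothetical).
dict_csv <- "key,en,ru
title_cases,Cases over time,Случаи по времени
x_date,Date,Дата
y_cases,Cases,Случаи"

dict <- read.csv(text = dict_csv, stringsAsFactors = FALSE)

# Convert the wide table into i18n[[lang]][[key]].
# Every column other than `key` is treated as a language.
to_i18n <- function(dict, langs = setdiff(names(dict), "key")) {
  lapply(setNames(langs, langs), function(lang) {
    as.list(setNames(dict[[lang]], dict$key))
  })
}

i18n <- to_i18n(dict)
i18n$en$x_date
```

Keeping the strings in a CSV means translators can edit them in a spreadsheet without touching the R code.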

3) Translate categorical values only when plotting/reporting

For legend labels and tables, translate factor labels at render time:

labels_case_status <- list(
  en = c("confirmed" = "Confirmed", "probable" = "Probable"),
  ru = c("confirmed" = "Подтверждено", "probable" = "Вероятно")
)

library(dplyr)

# Add a display-only column; the original `status` column is left untouched
df_plot <- df %>%
  mutate(status_lab = dplyr::recode(status, !!!labels_case_status[[params$lang]]))

ggplot(df_plot, aes(date, fill = status_lab)) +
  geom_bar() +
  labs(fill = tr("status"))

This keeps your “analysis variables” stable (status) and only changes the display labels.
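The same idea carries over to table headers. A minimal base R sketch, assuming a small summary table whose column names double as lookup keys (the headers list, the translate_headers() helper and the example counts are all hypothetical):

```r
# Hypothetical header dictionary: column name -> display label per language
headers <- list(
  en = c(status = "Case status", n = "Number of cases"),
  ru = c(status = "Статус случая", n = "Число случаев")
)

# Rename columns for display only; columns without a translation keep their name
translate_headers <- function(tbl, lang) {
  labs <- headers[[lang]]
  hit <- names(tbl) %in% names(labs)
  names(tbl)[hit] <- unname(labs[names(tbl)[hit]])
  tbl
}

# Made-up example data, only to show the renaming
counts <- data.frame(status = c("confirmed", "probable"), n = c(12L, 3L))
counts_en <- translate_headers(counts, "en")
names(counts_en)
```

The translated copy can then be passed to a table function such as knitr::kable(), while counts itself keeps its original column names for any further analysis.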

Avoid “translate the final PDF/Word”

Translating the rendered output is usually the worst option:

  • plots become images (hard to translate axis labels cleanly),

  • tables and numbers can be mangled,

  • no reproducible audit trail.

If you must, do it only for narrative text, not for charts/tables.

What I’d do in practice

  • One Rmd/Quarto source

  • params$lang

  • A translation dictionary file (CSV/YAML) committed to the repo

  • A helper tr() function using i18n

  • All labels/titles/headers come from tr()

  • Factor/value label translation happens in a prep_for_reporting(lang) step
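As an illustration of the last point, a prep_for_reporting() step could look roughly like the sketch below. It is written in base R and takes the data frame as an explicit argument, which is an assumption on top of the prep_for_reporting(lang) signature mentioned above; the example data are made up:

```r
# Value labels per language, keyed by the codes stored in the data
labels_case_status <- list(
  en = c(confirmed = "Confirmed", probable = "Probable"),
  ru = c(confirmed = "Подтверждено", probable = "Вероятно")
)

# Add a translated display column; the analysis column `status` is not modified
prep_for_reporting <- function(df, lang) {
  labs <- labels_case_status[[lang]]
  # The order of the dictionary entries fixes the level order in both languages
  df$status_lab <- factor(df$status, levels = names(labs), labels = unname(labs))
  df
}

# Made-up example data
df <- data.frame(
  status = c("confirmed", "probable", "confirmed"),
  stringsAsFactors = FALSE
)
out <- prep_for_reporting(df, "en")
as.character(out$status_lab)
```

Because the levels come from the dictionary rather than from the data, legends and tables keep the same category order in every language. Codes that are missing from the dictionary become NA, which makes gaps in the translations easy to spot.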

Best,

Luis

Thanks a lot, Luis.
I will follow the proposed approach and trust that it will resolve the issue. It also confirms what I believed to be important: keeping the original dataset unchanged.