Creating a new variable for several categories of an existing variable using a formula

Hello colleagues,

  • I need your help modifying a script so that it creates a new variable for each district, using a district-specific formula.

  • Part of the formula stays the same for every district, but one factor differs by district (I defined it as "error_load").

  • If I create this variable district by district, each step overwrites the previous one, and the resulting data contains only the rows of the district in the filter.

  • Can you please help me modify the code?

Help within this week, ideally, would be appreciated. Thank you!

``` r
library(dplyr)  # provides %>%, filter() and mutate()

demo_data <- data.frame(stringsAsFactors = FALSE,
    Location_district = c("District1", "District1", "District3", "District2",
                          "District1", "District1", "District2", "District2",
                          "District3", "District3"),
    Test_A    = c(0, 0, 5.14, 0, 5.14, 123.42, 0, 30.85, 15.42, 0),
    Test_B    = c(2.0571, 3.089, 5.14, 5.14, 5.14, 82.285, 3.08, 15.42, 5.14, 5.144),
    Estimates = c(5, 5, 7, 10, 12, 3, 6, 6, 3, 6)
)

# Creating error_loads to be used in the formula for the "new variable" creation below
error_load_district1 <- 0.8
error_load_district2 <- 0.77
error_load_district3 <- 0.28

demo_data %>%
  filter(Location_district == "District1") %>%
  mutate(
    Test_A_correction = Test_A * 0.14,
    Test_B_correction = Test_B * 0.34,
    New_variable      = Estimates * 0.88 * error_load_district1  # use a different error_load based on the filter above
  ) %>%

  filter(Location_district == "District2") %>%
  mutate(
    Test_A_correction = Test_A * 0.14,
    Test_B_correction = Test_B * 0.34,
    New_variable      = Estimates * 0.88 * error_load_district2  # use a different error_load based on the filter above
  ) %>%

  filter(Location_district == "District3") %>%
  mutate(
    Test_A_correction = Test_A * 0.14,
    Test_B_correction = Test_B * 0.34,
    New_variable      = Estimates * 0.88 * error_load_district3  # use a different error_load based on the filter above
  )
```

Created on 2024-08-14 with reprex v2.0.2
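
For anyone reading along, here is a minimal sketch of why the chained version cannot keep all districts. Each `filter()` works on the output of the previous step, so once only District1 rows remain, filtering for District2 leaves nothing. The names `step1` and `step2` are illustrative only, and the data is a trimmed copy of `demo_data` with just the two columns needed here.

```r
library(dplyr)

# Trimmed copy of the demo data: only the columns needed to count rows
demo_data <- data.frame(
  stringsAsFactors = FALSE,
  Location_district = c("District1", "District1", "District3", "District2",
                        "District1", "District1", "District2", "District2",
                        "District3", "District3"),
  Estimates = c(5, 5, 7, 10, 12, 3, 6, 6, 3, 6)
)

# First filter keeps only the District1 rows
step1 <- demo_data %>% filter(Location_district == "District1")

# Second filter runs on step1, which no longer contains any District2 rows
step2 <- step1 %>% filter(Location_district == "District2")

nrow(demo_data)  # 10
nrow(step1)      # 4
nrow(step2)      # 0
```

So the fix needs to compute the district-specific value inside a single `mutate()` on the full data, rather than filtering first.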

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Norwegian Bokmål_Norway.utf8
#>  ctype    Norwegian Bokmål_Norway.utf8
#>  tz       Europe/Oslo
#>  date     2024-08-14
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
#>  evaluate      0.22    2023-09-29 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.1)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.1)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.3.3)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>  withr         2.5.1   2023-09-26 [1] CRAN (R 4.3.1)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.1)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Program Files/R/library
#>  [2] C:/Program Files/R/R-4.3.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Hello,

I'm not sure I follow exactly, but I think this is how I would approach the problem:

# loading packages
library(tidyverse)

# creating fake data
fake_data <- data.frame(
    stringsAsFactors = FALSE,
    Location_district = c(
        "District1",
        "District1",
        "District3",
        "District2",
        "District1",
        "District1",
        "District2",
        "District2",
        "District3",
        "District3"
    ),
    Test_A = c(0, 0, 5.14, 0 , 5.14 , 123.42 , 0 , 30.85, 15.42, 0),
    Test_B = c(2.0571, 3.089, 5.14, 5.14, 5.14, 82.285, 3.08, 15.42, 5.14, 5.144),
    Estimates = c(5, 5, 7, 10, 12, 3, 6, 6, 3, 6)
) |>
    as_tibble()

# deriving variable
fake_data |>
    mutate(
        Test_A_correction = Test_A * 0.14,
        Test_B_correction = Test_B * 0.34,
        New_Variable = case_when(
            Location_district == "District1" ~ Estimates * 0.88 * 0.80,
            Location_district == "District2" ~ Estimates * 0.88 * 0.77,
            Location_district == "District3" ~ Estimates * 0.88 * 0.28,
            .default = NA_real_
        )
    )
#> # A tibble: 10 × 7
#>    Location_district Test_A Test_B Estimates Test_A_correction Test_B_correction
#>    <chr>              <dbl>  <dbl>     <dbl>             <dbl>             <dbl>
#>  1 District1           0      2.06         5             0                 0.699
#>  2 District1           0      3.09         5             0                 1.05 
#>  3 District3           5.14   5.14         7             0.720             1.75 
#>  4 District2           0      5.14        10             0                 1.75 
#>  5 District1           5.14   5.14        12             0.720             1.75 
#>  6 District1         123.    82.3          3            17.3              28.0  
#>  7 District2           0      3.08         6             0                 1.05 
#>  8 District2          30.8   15.4          6             4.32              5.24 
#>  9 District3          15.4    5.14         3             2.16              1.75 
#> 10 District3           0      5.14         6             0                 1.75 
#> # ℹ 1 more variable: New_Variable <dbl>

Created on 2024-08-14 with reprex v2.1.1
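
If the number of districts grows, one possible alternative to writing a `case_when()` branch per district is to keep the error loads in a named vector and look them up by district name. This is only a sketch under that assumption: `error_load` and `result` are names introduced here for illustration, and the values are the three error loads from the original question.

```r
library(dplyr)

demo_data <- data.frame(
  stringsAsFactors = FALSE,
  Location_district = c("District1", "District1", "District3", "District2",
                        "District1", "District1", "District2", "District2",
                        "District3", "District3"),
  Test_A = c(0, 0, 5.14, 0, 5.14, 123.42, 0, 30.85, 15.42, 0),
  Test_B = c(2.0571, 3.089, 5.14, 5.14, 5.14, 82.285, 3.08, 15.42, 5.14, 5.144),
  Estimates = c(5, 5, 7, 10, 12, 3, 6, 6, 3, 6)
)

# One error load per district, keyed by the district name (illustrative name)
error_load <- c(District1 = 0.80, District2 = 0.77, District3 = 0.28)

result <- demo_data %>%
  mutate(
    Test_A_correction = Test_A * 0.14,
    Test_B_correction = Test_B * 0.34,
    # Indexing the named vector by the district column picks the matching
    # error load for every row; unname() drops the carried-over names
    New_variable = Estimates * 0.88 * unname(error_load[Location_district])
  )
```

All ten rows are kept, and a district that is missing from `error_load` would get `NA` in `New_variable`, much like the `.default = NA_real_` branch above. Note that the lookup relies on `Location_district` being a character column; with a factor, the indexing would go by integer codes instead of names.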

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.5
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2024-08-14
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [1] RSPM (R 4.4.0)
#>  colorspace    2.1-0   2023-01-23 [1] RSPM (R 4.4.0)
#>  digest        0.6.36  2024-06-23 [1] RSPM (R 4.4.0)
#>  dplyr       * 1.1.4   2023-11-17 [1] RSPM (R 4.4.0)
#>  evaluate      0.24.0  2024-06-10 [1] RSPM (R 4.4.0)
#>  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [1] RSPM (R 4.4.0)
#>  forcats     * 1.0.0   2023-01-29 [1] RSPM (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
#>  generics      0.1.3   2022-07-05 [1] RSPM (R 4.4.0)
#>  ggplot2     * 3.5.1   2024-04-23 [1] RSPM (R 4.4.0)
#>  glue          1.7.0   2024-01-09 [1] RSPM (R 4.4.0)
#>  gtable        0.3.5   2024-04-22 [1] RSPM (R 4.4.0)
#>  hms           1.1.3   2023-03-21 [1] RSPM (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#>  knitr         1.48    2024-07-07 [1] RSPM (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
#>  lubridate   * 1.9.3   2023-09-27 [1] RSPM (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
#>  munsell       0.5.1   2024-04-01 [1] RSPM (R 4.4.0)
#>  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
#>  purrr       * 1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [1] RSPM (R 4.4.0)
#>  readr       * 2.1.5   2024-01-10 [1] RSPM (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] RSPM (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] RSPM (R 4.4.0)
#>  rmarkdown     2.27    2024-05-17 [1] RSPM (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [1] RSPM (R 4.4.0)
#>  scales        1.3.0   2023-11-28 [1] RSPM (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
#>  stringi       1.8.4   2024-05-06 [1] RSPM (R 4.4.0)
#>  stringr     * 1.5.1   2023-11-14 [1] RSPM (R 4.4.0)
#>  tibble      * 3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
#>  tidyr       * 1.3.1   2024-01-24 [1] RSPM (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] RSPM (R 4.4.0)
#>  tidyverse   * 2.0.0   2023-02-22 [1] RSPM (R 4.4.0)
#>  timechange    0.3.0   2024-01-18 [1] RSPM (R 4.4.0)
#>  tzdb          0.4.0   2023-05-12 [1] RSPM (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
#>  withr         3.0.0   2024-01-16 [1] RSPM (R 4.4.0)
#>  xfun          0.45    2024-06-16 [1] RSPM (R 4.4.0)
#>  yaml          2.3.9   2024-07-05 [1] RSPM (R 4.4.0)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

All the best,

Tim

Thank you, this worked out very well!