Recoding variable based on a particular condition

Hello everyone,

I'm having trouble recoding some values in my dataset. I'd like to append a suffix to the values of one variable, but only in rows where another variable in the same dataset meets a particular condition.

I want to change my variable indi.id based on a condition: if trip == "s4", append "_1" to the existing value; otherwise, leave it unchanged.

Here is the sample of my original data:

data.frame(
  stringsAsFactors = FALSE,
                trip = c("s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s3","s3",
                         "s3","s3","s3","s3","s3","s3","s3","s4","s4",
                         "s4","s4","s4","s4","s4","s4","s4","s4","s4","s4",
                         "s4","s4","s4","s4","s4","s4"),
             indi.id = c("s3C16","s3C16","s3C17",
                         "s3C30","s3C30","s3C30","s3C35","s3C35","s3C53",
                         "s3C53","s3C53","s3C53","s3C53","s3F02","s3F02",
                         "s3F04","s3F04","s3F05","s3F06","s3F06","s3F06",
                         "s3F08","s3F08","s3F08","s3F09","s3F09","s3F09",
                         "s3F10","s3F10","s3F10","s3F10","s3F10","s3F10",
                         "s3F10","s3F10","s3F10","s3F10","s3F13","s3F30",
                         "s3F30","s3F30","s3F30","s3Q03","s3R02","s3R02","s3R02",
                         "s3R02","s3R02","s3R02","s3R02","s3R02","s3R02",
                         "s3R03","s3R03","s3R03","s3R03","s3R03","s3R03",
                         "s3R03","s3R03","s3R06","s3R06","s3R06","s3R06",
                         "s3R06","s3R06","s3R07","s3R07","s3R11","s3R11",
                         "s3R11","s3R11","s3R11","s3S10","s3S19","s3S19",
                         "s3S23","s3S23","s3S23","s3S23","s3S34","s3S34",
                         "s3S34","s3S34","s3S34","s3S44","s3S44","s3S44",
                         "s3S44","s3S44","s3S46","s3S46","s3S46","s3S46",
                         "s3S46","s3S46","s3S46","s3S46","s3S46","s3S46","s3S46",
                         "s3S46","s3S46","s3S46","s3S46","s3S46","s3S48",
                         "s3S48","s3S48","s3S50","s3S52","s3S52","s4C09",
                         "s4C20","s4F02","s4F02","s4F04","s4F04","s4F10",
                         "s4D02","s4D04","s4D04","s4D07","s4D08","s4D08",
                         "s4R30","s4R30","s4R30","s4S13","s4S13")
  )

Thank you for your help.

Paisin

@Paisin data.frame( is missing at the beginning of the code snippet.

I’ve corrected it.

Hello,

Can you provide an attempted solution for your problem? Doing so will help us understand the problem you are having.

All the best,

Tim

Here is an example of how I would approach this problem:

# loading packages
library(tidyverse)

# creating fake data
fake_data <- data.frame(
  stringsAsFactors = FALSE,
  trip = c(
    "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s3",
    "s3", "s3", "s3", "s3", "s3", "s3", "s3", "s4", "s4",
    "s4", "s4", "s4", "s4", "s4", "s4", "s4", "s4", "s4", "s4",
    "s4", "s4", "s4", "s4", "s4", "s4"
  ),
  indi.id = c(
    "s3C16", "s3C16", "s3C17",
    "s3C30", "s3C30", "s3C30", "s3C35", "s3C35", "s3C53",
    "s3C53", "s3C53", "s3C53", "s3C53", "s3F02", "s3F02",
    "s3F04", "s3F04", "s3F05", "s3F06", "s3F06", "s3F06",
    "s3F08", "s3F08", "s3F08", "s3F09", "s3F09", "s3F09",
    "s3F10", "s3F10", "s3F10", "s3F10", "s3F10", "s3F10",
    "s3F10", "s3F10", "s3F10", "s3F10", "s3F13", "s3F30",
    "s3F30", "s3F30", "s3F30", "s3Q03", "s3R02", "s3R02", "s3R02",
    "s3R02", "s3R02", "s3R02", "s3R02", "s3R02", "s3R02",
    "s3R03", "s3R03", "s3R03", "s3R03", "s3R03", "s3R03",
    "s3R03", "s3R03", "s3R06", "s3R06", "s3R06", "s3R06",
    "s3R06", "s3R06", "s3R07", "s3R07", "s3R11", "s3R11",
    "s3R11", "s3R11", "s3R11", "s3S10", "s3S19", "s3S19",
    "s3S23", "s3S23", "s3S23", "s3S23", "s3S34", "s3S34",
    "s3S34", "s3S34", "s3S34", "s3S44", "s3S44", "s3S44",
    "s3S44", "s3S44", "s3S46", "s3S46", "s3S46", "s3S46",
    "s3S46", "s3S46", "s3S46", "s3S46", "s3S46", "s3S46", "s3S46",
    "s3S46", "s3S46", "s3S46", "s3S46", "s3S46", "s3S48",
    "s3S48", "s3S48", "s3S50", "s3S52", "s3S52", "s4C09",
    "s4C20", "s4F02", "s4F02", "s4F04", "s4F04", "s4F10",
    "s4D02", "s4D04", "s4D04", "s4D07", "s4D08", "s4D08",
    "s4R30", "s4R30", "s4R30", "s4S13", "s4S13"
  )
) |>
  as_tibble()

# mutating data: append "_1" to indi.id only where trip is "s4"
fake_data |>
  mutate(indi.id = if_else(trip == "s4", str_c(indi.id, "_1"), indi.id))
#> # A tibble: 130 × 2
#>    trip  indi.id
#>    <chr> <chr>  
#>  1 s3    s3C16  
#>  2 s3    s3C16  
#>  3 s3    s3C17  
#>  4 s3    s3C30  
#>  5 s3    s3C30  
#>  6 s3    s3C30  
#>  7 s3    s3C35  
#>  8 s3    s3C35  
#>  9 s3    s3C53  
#> 10 s3    s3C53  
#> # ℹ 120 more rows

Created on 2023-11-25 with reprex v2.0.2
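
The printed tibble shows only the first ten rows, all from trip s3, so the appended suffix is not visible in that output. Here is a minimal sketch for checking the s4 rows, assuming a four-row subset of the original data. The name sample_data is illustrative and not from the original post.

```r
library(dplyr)
library(stringr)

# A small subset of the original data (values copied from the thread);
# sample_data is a hypothetical name used only for this check
sample_data <- data.frame(
  trip    = c("s3", "s3", "s4", "s4"),
  indi.id = c("s3S52", "s3S52", "s4C09", "s4C20"),
  stringsAsFactors = FALSE
)

# Same recoding as above: append "_1" only where trip is "s4"
recoded <- sample_data |>
  mutate(indi.id = if_else(trip == "s4", str_c(indi.id, "_1"), indi.id))

# Keep only the rows where the condition applies, to inspect the result
recoded |>
  filter(trip == "s4")
```

The two remaining rows should carry the ids "s4C09_1" and "s4C20_1", while the s3 rows in recoded keep their original ids.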

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.3    
#>  [5] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
#>  [9] ggplot2_3.4.4   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4      compiler_4.3.1    reprex_2.0.2      tidyselect_1.2.0 
#>  [5] scales_1.2.1      yaml_2.3.7        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    knitr_1.44        munsell_0.5.0     R.cache_0.16.0   
#> [13] tzdb_0.4.0        pillar_1.9.0      R.utils_2.12.2    rlang_1.1.1      
#> [17] utf8_1.2.4        stringi_1.7.12    xfun_0.40         fs_1.6.3         
#> [21] timechange_0.2.0  cli_3.6.1         withr_2.5.1       magrittr_2.0.3   
#> [25] digest_0.6.33     grid_4.3.1        rstudioapi_0.15.0 hms_1.1.3        
#> [29] lifecycle_1.0.3   R.methodsS3_1.8.2 R.oo_1.25.0       vctrs_0.6.4      
#> [33] evaluate_0.22     glue_1.6.2        styler_1.10.2     fansi_1.0.5      
#> [37] colorspace_2.1-0  rmarkdown_2.25    tools_4.3.1       pkgconfig_2.0.3  
#> [41] htmltools_0.5.6.1

All the best,

Tim

You can use mutate() from the dplyr package, together with if_else(), to modify the indi.id column. if_else() checks each row: if trip equals "s4", it appends "_1" to indi.id; otherwise, it leaves indi.id unchanged.

Here’s a template to show you the structure of the code. Make sure to replace your_data with the actual name of your data frame and replace the ... in the c() functions with the rest of your data.

# Load the dplyr package
library(dplyr)

# Your original data
your_data <- data.frame(
  stringsAsFactors = FALSE,
  trip = c("s3", "s3", "s4", ...), # and so on
  indi.id = c("s3C16", "s3C16", "s4C09", ...) # and so on
)

# Modify indi.id based on the condition
modified_data <- your_data %>%
  mutate(indi.id = if_else(trip == "s4", paste0(indi.id, "_1"), indi.id))

Let me know if this works or helps.
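
If you would rather not load any packages, the same recoding can be sketched in base R. This is only an illustration on a three-row stand-in for your data, and it uses %in% instead of == as one possible way to treat a missing trip value as "not s4".

```r
# A three-row stand-in for the real data (values taken from the thread)
your_data <- data.frame(
  trip    = c("s3", "s3", "s4"),
  indi.id = c("s3C16", "s3C16", "s4C09"),
  stringsAsFactors = FALSE
)

# Logical index of the rows to change; %in% returns FALSE (not NA) for missing trips
is_s4 <- your_data$trip %in% "s4"

# Append "_1" only in those rows, leaving every other row untouched
your_data$indi.id[is_s4] <- paste0(your_data$indi.id[is_s4], "_1")

your_data
```

Note that this overwrites indi.id in place, so running it a second time on the same data frame would append "_1" again. Assigning the result to a new column or a new object, as in the dplyr version, avoids that.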
