Splitting data frame

Hi, I'm new to RStudio and I'm having trouble with my script. I want to create new data sets from a source data set, one for each province/region recorded in a variable, and export each of them as a .csv file. Here is my initial script (I used ChatGPT to help write it).

library(dplyr)
library(rio)
library(here)

# Import the dataset
lepto <- import(here("Data", "lepto_view.csv"))

# Column to split by
column_to_split <- "(Current Address)Province"

# Ensure the column exists in the dataset
if (!column_to_split %in% colnames(lepto)) {
  stop(paste("Column", column_to_split, "not found in the dataset"))
}

# Split the dataset into a list of data frames by the column
list_of_datasets <- split(lepto, lepto[[column_to_split]])

# Export each dataset to a separate CSV file
lapply(names(list_of_datasets), function(province) {
  # Define the output file path
  output_file <- here("Data", paste0("lepto_", province, ".csv"))

  # Export the subset
  export(list_of_datasets[[province]], output_file)
})

Hello,

It would be helpful if you could provide additional details and a reproducible example (reprex). From what I understand, you are trying to split a data set into a list of data frames by a given variable and then save each one to its own file. This is how I would approach it:

# loading packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(readr)

# loading data
raw_data <- read_csv(readr_example("chickens.csv"))
#> Rows: 5 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): chicken, sex, motto
#> dbl (1): eggs_laid
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# splitting the data by sex
list_of_data <- raw_data |>
    group_split(sex)

# saving the data
walk(.x = list_of_data, .f = \(x)
         write_csv(
            x = x,
            file = tempfile(
                pattern = "file",
                tmpdir = tempdir(),
                fileext = ".csv"
            )
         ))

Created on 2024-11-21 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS 15.1
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2024-11-21
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bit           4.5.0   2024-09-20 [1] RSPM (R 4.4.0)
#>  bit64         4.5.2   2024-09-22 [1] RSPM (R 4.4.0)
#>  cli           3.6.3   2024-06-21 [1] RSPM (R 4.4.0)
#>  crayon        1.5.3   2024-06-20 [1] RSPM (R 4.4.0)
#>  digest        0.6.37  2024-08-19 [1] RSPM (R 4.4.0)
#>  dplyr       * 1.1.4   2023-11-17 [1] RSPM (R 4.4.0)
#>  evaluate      1.0.1   2024-10-10 [1] RSPM (R 4.4.0)
#>  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [1] RSPM (R 4.4.0)
#>  fs            1.6.5   2024-10-30 [1] RSPM (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] RSPM (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] RSPM (R 4.4.0)
#>  hms           1.1.3   2023-03-21 [1] RSPM (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#>  knitr         1.48    2024-07-07 [1] RSPM (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
#>  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
#>  purrr       * 1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [1] RSPM (R 4.4.0)
#>  readr       * 2.1.5   2024-01-10 [1] RSPM (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] RSPM (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] RSPM (R 4.4.0)
#>  rmarkdown     2.29    2024-11-04 [1] RSPM (R 4.4.1)
#>  rstudioapi    0.17.1  2024-10-22 [1] RSPM (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
#>  tibble        3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] RSPM (R 4.4.0)
#>  tzdb          0.4.0   2023-05-12 [1] RSPM (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
#>  vroom         1.6.5   2023-12-05 [1] RSPM (R 4.4.0)
#>  withr         3.0.2   2024-10-28 [1] RSPM (R 4.4.0)
#>  xfun          0.49    2024-10-31 [1] RSPM (R 4.4.0)
#>  yaml          2.3.10  2024-07-26 [1] RSPM (R 4.4.0)
#> 
#>  [1] /Users/timothychisamore/Library/R/x86_64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
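
Note that group_split() returns an unnamed list, and the reprex above writes to randomly named temporary files. If each file should carry the group value in its name (as in your lepto_<province>.csv pattern), one possible sketch is below. It is only an illustration under assumptions: the `cases` data frame, its `province` column, the province values, and the `lepto_split` output folder are all made up for the example, and base split() is used because it names each list element after its group.

```r
library(purrr)
library(readr)

# Hypothetical stand-in for the real data; the column names and
# province values are assumptions made for this example only
cases <- data.frame(
  id = 1:5,
  province = c("Cebu", "Bohol", "Cebu", "Leyte", "Bohol"),
  stringsAsFactors = FALSE
)

# split() names each list element after its group value
by_province <- split(cases, cases$province)

# Write into a scratch folder so nothing in the project is overwritten
out_dir <- file.path(tempdir(), "lepto_split")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# iwalk() passes each data frame together with its list name
iwalk(by_province, \(df, name) {
  write_csv(df, file.path(out_dir, paste0("lepto_", name, ".csv")))
})

# List the files that were written
written <- sort(list.files(out_dir, pattern = "\\.csv$"))
```

For the real data you would swap `cases$province` for `lepto[["(Current Address)Province"]]` and `out_dir` for `here("Data")`. If any province value contains characters that are not valid in file names, those would need to be cleaned first.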

All the best,

Tim
