Using substring of file name to create new variable in a list of dataframes

Problem Description:

I have a directory with a set of .rds files containing dataframes:

files <- c("file_2022-11-30.rds", "file_2022-12-01.rds")

I want to read each file into a list and then add a new column to each dataframe containing part of the name of the file it was loaded from (the date). I know how to do this with a for loop, but I’m looking for a more concise solution. I’m sure there’s a way to do this with lapply, but this doesn’t work:

library(dplyr)

df_list <- lapply(files, readRDS) %>%
  lapply(FUN = function(x) mutate(date = as.Date(stringr::str_sub(files[x], start = -14, end = -5)))) %>%
bind_rows()

Desired output would look something like this:

   var1       date
1     1 2022-11-30
2     2 2022-11-30
3     2 2022-11-30
4     1 2022-11-30
5     2 2022-11-30
6     2 2022-12-01
7     1 2022-12-01
8     2 2022-12-01
9     1 2022-12-01
10    2 2022-12-01

Solution – 1

We can convert the file names to Date class directly by giving as.Date a format that matches the whole file name. Then read each file with readRDS, use Map to cbind the matching date onto each dataframe, and rbind the list elements:

# Parse the date out of each file name
dates <- as.Date(files, format = "file_%Y-%m-%d.rds")
# Name the new column 'date' to match the desired output
do.call(rbind, Map(cbind, lapply(files, readRDS), date = dates))
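To try the base R approach without the original files, here is a minimal sketch that first writes two small sample files. The temporary directory and the var1 values are assumptions made up for illustration; basename() is used so that the directory part of the path does not interfere with the date format.

```r
# Hypothetical sample data: two small dataframes saved as .rds in a temp directory
tmp <- tempdir()
files <- file.path(tmp, c("file_2022-11-30.rds", "file_2022-12-01.rds"))
saveRDS(data.frame(var1 = c(1, 2, 2, 1, 2)), files[1])
saveRDS(data.frame(var1 = c(2, 1, 2, 1, 2)), files[2])

# Parse the date from the file name only, ignoring the directory part
dates <- as.Date(basename(files), format = "file_%Y-%m-%d.rds")

# Pair each dataframe with its date, then stack the results row-wise
out <- do.call(rbind, Map(cbind, lapply(files, readRDS), date = dates))
str(out)
```

Map walks over the list of dataframes and the dates vector in parallel, so each dataframe is combined with the date parsed from its own file name.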

Or, if we prefer the tidyverse:

library(purrr)
library(dplyr)
# Read each file and attach its date; the results are row-bound into one dataframe
map2_dfr(files, dates, ~ readRDS(.x) %>%
           mutate(date = .y))

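The OP’s code fails because, in the second lapply, x is not an index into files. It is the list element itself, i.e. the dataframe returned by readRDS, which carries no information about the file it came from, so files[x] cannot work. (The mutate call is also never given a dataframe as its first argument.) Instead, we can read the file and add the column in a single lapply over the file names: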

lapply(files, function(x)
  readRDS(x) %>%
    mutate(date = as.Date(stringr::str_sub(x, start = -14, end = -5)))) %>%
  bind_rows()
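
If an index really is wanted, as files[x] in the question suggests, one option is to iterate over seq_along(files) so that the loop variable is a position rather than a dataframe. The sketch below uses base R only; the sample files and the variable names df_list and combined are assumptions made up for illustration, not part of the original post.

```r
# Hypothetical sample data: two small dataframes saved as .rds in a temp directory
tmp <- tempdir()
files <- file.path(tmp, c("file_2022-11-30.rds", "file_2022-12-01.rds"))
saveRDS(data.frame(var1 = c(1, 2, 2, 1, 2)), files[1])
saveRDS(data.frame(var1 = c(2, 1, 2, 1, 2)), files[2])

# Iterate over positions so that files[i] is a valid lookup
df_list <- lapply(seq_along(files), function(i) {
  df <- readRDS(files[i])
  # Same substring as str_sub(start = -14, end = -5): the "YYYY-MM-DD" part
  n <- nchar(files[i])
  df$date <- as.Date(substr(files[i], n - 13, n - 4))
  df
})

# Stack the per-file dataframes into one
combined <- do.call(rbind, df_list)
```

Because the substring is counted from the end of the string, it works whether files holds bare file names or full paths, as long as every name ends in "YYYY-MM-DD.rds".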