How to reset a sequence in a R dataframe's column, based on the levels of a different column

How to reset a sequence in a R dataframe's column, based on the levels of a different column

Problem Description:

I have data that logs subject and session information in columns like so:

> Subject <- c(rep("A", 6), rep("B", 7))
> Session <- c(rep(1, 3), rep(2, 3), rep(8, 4), rep(9, 3))
> df <- data.frame(Subject, Session)
> df
   Subject Session
1        A       1
2        A       1
3        A       1
4        A       2
5        A       2
6        A       2
7        B       8
8        B       8
9        B       8
10       B       8
11       B       9
12       B       9
13       B       9

The problem is that the raw data does not always have consistent session numbering (hence why subject B’s first session starts at session 8 and not 1). Ideally, it would be helpful to have a column that resets the session numbering when each new subject appears. Like this:

   Subject Session New_Sess
1        A       1        1
2        A       1        1
3        A       1        1
4        A       2        2
5        A       2        2
6        A       2        2
7        B       8        1
8        B       8        1
9        B       8        1
10       B       8        1
11       B       9        2
12       B       9        2
13       B       9        2

I’ve attempted a solution with nested for loops and could not manage to get that to work. I suspect there must be a straightforward solution to this (it doesn’t seem like it should be that difficult) but despite my best efforts, it has evaded me completely.

The actual dataset is quite large and cumbersome, so an automated solution to this problem would be helpful.

Solution – 1

We may use match

library(dplyr)
df <- df %>%
   group_by(Subject) %>% 
   mutate(New_Sess = match(Session, unique(Session))) %>%
   ungroup

-output

df
# A tibble: 13 × 3
   Subject Session New_Sess
   <chr>     <dbl>    <int>
 1 A             1        1
 2 A             1        1
 3 A             1        1
 4 A             2        2
 5 A             2        2
 6 A             2        2
 7 B             8        1
 8 B             8        1
 9 B             8        1
10 B             8        1
11 B             9        2
12 B             9        2
13 B             9        2

Or with factor

df <-  df %>%
     group_by(Subject) %>% 
     mutate(New_Sess = as.integer(factor(Session))) %>% 
     ungroup
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject