Group datetime column based on the time difference

Problem Description:

Name        Time
John        2022-12-06 02:37:25
John        2022-12-06 10:37:25
John        2022-12-06 10:42:34            
John        2022-12-06 10:47:12
John        2022-12-06 10:52:10

The data above shows the timestamps at which John pressed a button on a remote. If he presses the remote at least once every 10 minutes, I want to treat that as John watching TV continuously, i.e. as a single session.

In the data above, the time difference between row 0 and row 1 is several hours (greater than 10 minutes), so I want to treat row 0 as a separate session.

The time differences between rows 1 and 2, rows 2 and 3, and rows 3 and 4 are all less than 10 minutes, so I want to treat these rows as a single session.

My desired output is:

Name     Aggregated_timelist
John     ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10']
John     ['2022-12-06 02:37:25']

Solution – 1

You can use a custom groupby.agg with list, starting a new group whenever the difference from the previous row is greater than 10 minutes:

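import pandas as pd

# assumes df is sorted chronologically (per user, if there are several)
# flag rows more than 10 minutes after the previous one, then cumsum the flags into group ids
group = pd.to_datetime(df['Time']).diff().gt('10min').cumsum()
# if you have several users:
# group = pd.to_datetime(df['Time']).groupby(df['Name']).diff().gt('10min').cumsum()

out = df.groupby(['Name', group], as_index=False)['Time'].agg(list)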

Output:

   Name                                               Time
0  John                              [2022-12-06 02:37:25]
1  John  [2022-12-06 10:37:25, 2022-12-06 10:42:34, 202...
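
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same diff / cumsum idea. The `sessionize` helper, the `session` column name, and the second user "Mary" with her two timestamps are made up for illustration and are not part of the original question. The sketch also sorts by Name and Time first, which is an assumption about how unsorted input should be handled.

```python
import pandas as pd

def sessionize(df, gap="10min"):
    """Group each user's timestamps into sessions, split at gaps larger than `gap`."""
    out = df.copy()
    out["Time"] = pd.to_datetime(out["Time"])
    # Sort so that diff() always compares a click with the same user's previous click.
    out = out.sort_values(["Name", "Time"], ignore_index=True)
    # Time since the previous click of the same user (NaT on each user's first row).
    delta = out.groupby("Name")["Time"].diff()
    # A gap above the threshold flags a new session; cumsum turns the flags into ids.
    # The ids are not restarted per user, but (Name, session) pairs are still unique.
    session = delta.gt(pd.Timedelta(gap)).cumsum().rename("session")
    return out.groupby(["Name", session])["Time"].agg(list).reset_index()

df = pd.DataFrame({
    "Name": ["John"] * 5 + ["Mary"] * 2,  # "Mary" is hypothetical sample data
    "Time": [
        "2022-12-06 02:37:25",
        "2022-12-06 10:37:25",
        "2022-12-06 10:42:34",
        "2022-12-06 10:47:12",
        "2022-12-06 10:52:10",
        "2022-12-06 10:40:00",
        "2022-12-06 11:00:00",
    ],
})

result = sessionize(df)
print(result)
```

Note that the comparison uses a strict "greater than", so two clicks exactly 10 minutes apart stay in the same session. The lists hold pandas Timestamp objects; if you need strings as in the desired output, you could convert them with `strftime` before aggregating.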