Creating a time period of top values from a python dataframe

Creating a time period of top values from a python dataframe

Problem Description:

I have this dataframe:

DateNameSum
2022-12-01James10
2022-12-01Tom4
2022-12-03James5
2022-12-04Adam8

where i want to group the top names by their sum during a 7 day period (every week from Monday to Sunday)

Expected output:

enter image description here

Solution – 1

To group the names in your dataframe by the sum during a 7 day period, you can first create a new column that contains the week number for each row, using the isocalendar method from the datetime module. The isocalendar method returns a tuple containing the year, week number, and day of the week for a given date. You can use the week number from this tuple as a grouping key for your dataframe.

import pandas as pd
from datetime import datetime

# create a dataframe with the sample data
df = pd.DataFrame({'Date': ['2022-12-01', '2022-12-01', '2022-12-03', '2022-12-04'],
                   'Name': ['James', 'Tom', 'James', 'Adam'],
                   'Sum': [10, 4, 5, 8]})

# convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# create a new column with the week number for each row
df['Week'] = df['Date'].apply(lambda x: x.isocalendar()[1])

# group the dataframe by the Week and Name columns
grouped = df.groupby(['Week', 'Name'])

# calculate the sum for each group and sort the resulting dataframe by the Sum column
result = grouped['Sum'].sum().sort_values(ascending=False)

the result variable will contain a dataframe with the names grouped by the sum during a 7 day period. You can access the top names by calling the head method on the dataframe, like this:

top_names = result.head()

This will return the top names, sorted by their sum during the 7 day period.

Solution – 2

You could try the following (df your dataframe, and column Date with datatimes):

cal = df["Date"].dt.isocalendar()
result = (
    df
    .drop(columns="Date")
    .assign(Week=cal.year.astype("str") + "-" + cal.week.astype("str").str.rjust(2, "0"))
    .groupby(["Week", "Name"], as_index=False).sum()
    .sort_values("Sum", ascending=False)
    .assign(Top=lambda df: df.groupby("Week").transform("cumcount") + 1)
    .pivot(index="Top", columns="Week")
    .reorder_levels([1, 0], axis=1).sort_index(level=0, axis=1)
)

The first part is similiar to what @AltunE is proposing, but their answer is missing the second part.

Result for a sample dataframe

from random import choices
from string import ascii_uppercase as names

days = pd.date_range("2022-01-01", "2022-12-31", freq="d")
df = pd.DataFrame({
    "Date": sorted(choices(days, k=1_000)),
    "Name": choices(names, k=1_000),
    "Sum": choices(range(101), k=1_000)
})

looks like:

Week 2021-52       2022-01        2022-02  ... 2022-50 2022-51        2022-52       
        Name   Sum    Name    Sum    Name  ...     Sum    Name    Sum    Name    Sum
Top                                        ...                                      
1          L  61.0       U  166.0       T  ...   159.0       O  174.0       X  208.0
2          D  45.0       D   63.0       U  ...   157.0       E  124.0       E  146.0
3          S  43.0       K   61.0       M  ...   116.0       V  108.0       W   89.0
4          T  41.0       W   51.0       E  ...    94.0       C   97.0       Y   82.0
5          Z  35.0       B   45.0       Y  ...    80.0       J   83.0       D   53.0
6        NaN   NaN       V   43.0       H  ...    63.0       F   72.0       U   52.0
7        NaN   NaN       A   35.0       D  ...    51.0       Q   70.0       S   27.0
8        NaN   NaN       C   34.0       S  ...    48.0       S   67.0       A   26.0
9        NaN   NaN       H   19.0       W  ...    43.0       W   64.0       B    5.0
10       NaN   NaN       T    9.0       G  ...    40.0       R   55.0     NaN    NaN
11       NaN   NaN       R    0.0       F  ...    19.0       D   43.0     NaN    NaN
12       NaN   NaN     NaN    NaN       L  ...    18.0       Z   40.0     NaN    NaN
13       NaN   NaN     NaN    NaN     NaN  ...     6.0       N    1.0     NaN    NaN
14       NaN   NaN     NaN    NaN     NaN  ...     5.0     NaN    NaN     NaN    NaN
15       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN
16       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN
17       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN
18       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN
19       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN
20       NaN   NaN     NaN    NaN     NaN  ...     NaN     NaN    NaN     NaN    NaN

If you want the Week labels to be more like you’ve shown you could do something like this

start = df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")
end = start + pd.Timedelta(days=7)
week = "Period " + start.dt.strftime("%Y/%m/%d") + " to " + end.dt.strftime("%Y/%m/%d")

and then adjust the code above with

    ...
    .assign(Week=week)
    ...
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject