Keep the corresponding value of another column after using groupby in a column

Keep the corresponding value of another column after using groupby in a column

Problem Description:

After using groupby and agg to find the max and the min, I would like to keep the value of another column corresponding to said Max and Min.

What I have:

OrderMissingDate
Order 1Missing 12002
Order 1Missing 22003
Order 2Missing 32004
Order 2Missing 42005
Order 2Missing 52006
Order 3Missing 62007

What I do

calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']

What I get

OrderMinMaxdifference
Order 1200220031 year
Order 2200420062 years
Order 3200720070s

What I want,

OrderMinMaxdifferenceMissing MinMissing Max
Order 1200220031 yearMissing 1Missing 2
Order 2200420062 yearsMissing 3Missing 5
Order 3200720070sMissing 6Missing 6

Solution – 1

I would use separate groupby and a merge:

g = df.groupby(by='Order')

(pd.merge(df.loc[g['Date'].idxmin()]
            .rename({'Date': 'Min', 'Missing': 'Missing Min'}, axis=1),
          df.loc[g['Date'].idxmax()]
            .rename({'Date': 'Max', 'Missing': 'Missing Max'}, axis=1),
          on='Order')
   .assign(difference=lambda d: d['Max']-d['Min'])
)

Output:

     Order Missing Min   Min Missing Max   Max  difference
0  Order 1   Missing 1  2002   Missing 2  2003           1
1  Order 2   Missing 3  2004   Missing 5  2006           2
2  Order 3   Missing 6  2007   Missing 6  2007           0

Solution – 2

maybe try to add additional group by calculations/columns, similiar to how you created "Max" and "Min" columns from "Date". (I apologize, I am not able to debug code right now).:

calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min'), Missing_Min=('Missing','min'), Missing_Max=('Missing','max')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']

Solution – 3

You need a merge. Two more lines in your code would get you what you want.

calculation["Missing Min"] = calculation.merge(
    df, left_on=["Order", "Min"], right_on=["Order", "Date"]
)["Missing"]
calculation["Missing Max"] = calculation.merge(
    df, left_on=["Order", "Max"], right_on=["Order", "Date"]
)["Missing"]

print(calculation):

             Order   Max   Min  difference Missing Min Missing Max
0          Order 1   2003  2002           1  Missing 1   Missing 2 
1          Order 2   2006  2004           2  Missing 3   Missing 5 
2          Order 3   2007  2007           0  Missing 6   Missing 6 
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject