Transform Pandas column to get a key value pair in a column post group by

Transform Pandas column to get a key value pair in a column post group by

Problem Description:

My DataFrame:

Col X    Col Y    ID     Value
 A         a      'r'      3
 A         a      'b'      2
 A         a      'c'      1
 B         b      'd'      5
 B         b      's'      6
 B         b      'd'      7  

Output required:

Col X    Col Y    Out
 A         a      {'r':3, 'b':2, 'c':1}
 B         b      {'d': 5, 's': 6, 'd':7}

Approach tried so far:

df = df.set_index(['Col X', 'Col Y', 'ID']).Value
dict_column = {k: df.xs((k, v)).to_dict() for k,v,v2 in df.index}

Solution – 1

Use GroupBy.apply with lambda function:

df['ID'] = df['ID'].str.strip("'")

df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
        .apply(lambda x: dict(x.to_numpy()))
        .reset_index(name='Out'))
print (df1)
  Col X Col Y                       Out
0     A     a  {'r': 3, 'b': 2, 'c': 1}
1     B     b          {'d': 7, 's': 6}

Duplicated keys not exist in python dictionary. You can aggregate values, e.g. by sum:

df['ID'] = df['ID'].str.strip("'")

df = df.groupby(['Col X', 'Col Y','ID'], as_index=False)['Value'].sum()
print (df)
  Col X Col Y ID  Value
0     A     a  b      2
1     A     a  c      1
2     A     a  r      3
3     B     b  d     12
4     B     b  s      6

df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
        .apply(lambda x: dict(x.to_numpy()))
        .reset_index(name='Out'))
print (df1)
  Col X Col Y                       Out
0     A     a  {'b': 2, 'c': 1, 'r': 3}
1     B     b         {'d': 12, 's': 6}

Solution – 2

You can create pd.Series inside apply and use to_dict:

output = ( df.groupby(['Col X', 'Col Y'])[['ID', 'Value']].
             apply(lambda x: pd.Series(x['Value'].values,index=x['ID']).to_dict()) )

Solution – 3

You can use groupby.apply with dict and zip:

(df.groupby(['Col X', 'Col Y'])
   .apply(lambda x: dict(zip(x['ID'], x['Value'])))
   .reset_index(name='Out')
 )

Output:

  Col X Col Y                       Out
0     A     a  {'r': 3, 'b': 2, 'c': 1}
1     B     b          {'d': 7, 's': 6}

If you want to aggregate the duplicated keys:

(df.groupby(['Col X', 'Col Y'])
   .apply(lambda x: x['Value'].groupby(x['ID']).sum().to_dict())
   .reset_index(name='Out')
 )

Output:

  Col X Col Y                       Out
0     A     a  {'b': 2, 'c': 1, 'r': 3}
1     B     b         {'d': 12, 's': 6}
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject