getting new dataframe from existing in dataframe with conditions on multiple columns

getting new dataframe from existing in dataframe with conditions on multiple columns

Problem Description:

I am trying to sort a pandas dataframe. The data looks like-

yearstatedistrictPartyrankshare in votes
2010haryanakaithalWinner140.12
2010haryanakaithalbjp230.52
2010haryanakaithalNOTA329
2010goapanjiWinner310
2010goapanjiINC240
2010goapanjiBJP150
2013upmeerutWinner240
2013upmeerutSP160
2015haryanakaithalWinner215
2015haryanakaithalBJP335
2015haryanakaithalINC150

This data is for multiple states for multiple years.
In this dataset, there are multiple values for each district. I want to calculate the margin of share for each district in this manner. I have tried this, but not able to write fully. I am not able to write code for defining the margin of share and get a dataframe with only one (margin of share) value corresponding to each district instead of party wise shares.

for year in df['YEAR']:
 for state in df['STATE']:
    for district in df['DISTRICT']:
        for rank in df['RANK']:
            for party in df['PARTY']:
                if rank==1 and party=='WINNER': 

then margin of share =Share of Winner-Share of party at rank 2. If share WINNER does not have rank 1 then Margin of Share= share of winner – share of party at rank 1.

I am basically trying to get this output-

|      year     |     state   |district| margin of share|
|---------------|-------------|--------|----------------|
|          2010    | haryana  |kaithal | 9.6            |
|          2010    | goa      |panji   | -40            |
|          2010    | up       |kaithal | -20            |
|          2015    | haryana  |kaithal | -35            |

I wish to have create a different data frame with columns Year, State, District and margin of SHARE.

Solution – 1

use groupby and where with conditions

g = df.groupby(['year', 'state', 'district'])
cond1 = df['Party'].eq('Winner')
cond2 = df['rank'].eq(1)
cond3 = df['rank'].eq(2)
df1 = g['share in votes'].agg(lambda x: (x.where(cond1).sum() - x.where(cond3).sum()) if x.where(cond1 & cond2).sum() != 0 else (x.where(cond1).sum() - x.where(cond2).sum())).reset_index()

result(df1):

    year    state   district    share in votes
0   2010    goa     panji       -40.0
1   2010    haryana kaithal     9.6
2   2013    up      meerut      -20.0
3   2015    haryana kaithal     -35.0

if you want sort like df use following code:

df.iloc[:, :3].drop_duplicates().merge(df1)

result:

    year    state   district    share in votes
0   2010    haryana kaithal     9.6
1   2010    goa     panji       -40.0
2   2013    up      meerut      -20.0
3   2015    haryana kaithal     -35.0

Solution – 2

Create MultiIndex by first 3 columns by DataFrame.set_index, create masks, filter with DataFrame.loc and subtract values, last use Series.fillna for replace not matched values by condition m3:

df1 = df.set_index(['year', 'state', 'district'])
m1 = df1.Party=='Winner'
m2 = df1['rank']==2
m3 = df1['rank']==1

s1 = (df1.loc[m1 & m3,'share in votes']
        .sub(df1.loc[m2,'share in votes']))
print (s1)
year  state    district
2010  goa      panji       NaN
      haryana  kaithal     9.6
2013  up       meerut      NaN
2015  haryana  kaithal     NaN
Name: share in votes, dtype: float64

s2 = (df1.loc[m1,'share in votes']
        .sub(df1.loc[m3,'share in votes']))
print (s2)
year  state    district
2010  haryana  kaithal      0.0
      goa      panji      -40.0
2013  up       meerut     -20.0
2015  haryana  kaithal    -35.0
Name: share in votes, dtype: float64

df = s1.fillna(s2).reset_index()
print (df)
   year    state district  share in votes
0  2010      goa    panji           -40.0
1  2010  haryana  kaithal             9.6
2  2013       up   meerut           -20.0
3  2015  haryana  kaithal           -35.0
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject