How to get data from another dataframe based on key values

How to get data from another dataframe based on key values

Problem Description:

I have two dataframes

df1

Area-ID-SeriesArea-IDMODE
BSD-1-1800BSD-1VM
BSD-2-1800BSD-2VM
BSD-3-1800BSD-3VM
BSE-1-2100BSE-1XM
BSE-1-900BSE-1VM
BSE-2-2100BSE-2XM
BSE-2-900BSE-2VM
BSE-3-2100BSE-3XM
BSE-3-900BSE-3VM
SMR-1-1800SMR-1VM

df2

Area-ID90018002100
BSD-1
BSD-2
BSD-3
BSE-1
BSE-2
BSE-3
SMR-1

df2 has a key based on df1 but without the "Series", because the "Series" transformed into columns.

I want to get MODE value for each series based on Area-ID, so my desired output is like this

df2

Area-ID90018002100
BSD-1VM
BSD-2VM
BSD-3VM
BSE-1VMXM
BSE-2VMXM
BSE-3VMXM
SMR-1VM

I have tried this code but still no idea how it should be done

df_result = pd.merge(df2,
                    df1[['Area-ID', 'MODE']],
                    on= 'Area-ID',
                    how='left')

I have tried to do this in excel with df1 as "city" the formula look like this

900 series column

=IFNA(VLOOKUP(A2&"-"&$B$1,city!A:C,3,0),"-") 

1800 series column

=IFNA(VLOOKUP(A2&"-"&$C$1,city!A:C,3,0),"-") 

2100 series column

=IFNA(VLOOKUP(A2&"-"&$D$1,city!A:C,3,0),"-") 

it worked, but it takes a lot of time because my data is huge so I try to do it with python instead

Solution – 1

I don’t know merge very well. Perhaps there is a way to do based on it. Did it differently.

A ‘number’ column is created. To do this, the rows of the Area-ID-Series column of dataframe df1 are split() into an array by delimiter using split and expand=True is applied to get the columns. The second column is retrieved.

Further, the dataframe is grouped by df1.index, that is, it is called on each row, where the expression df2['Area-ID'] == x['Area-ID'].values[0] is a mask for indexing rows, and x[' number'].values[0] acts as a column name for df2 filtering. Explicit loc indexing is used, where the row indexes are on the left, the column name is on the right.
.values[0] is used to extract the value from the list.

import pandas as pd

df1 = pd.read_csv('df1.csv', header=0)
df2 = pd.read_csv('df2.csv', header=0)

df1['number'] = df1['Area-ID-Series'].str.split('-', expand=True)[2].astype(str)


def my_func(x):
    df2.loc[df2['Area-ID'] == x['Area-ID'].values[0], x['number'].values[0]] = x['MODE'].values[0]


df1.groupby(df1.index).apply(my_func)

print(df2)

Output

  Area-ID  900 1800 2100
0   BSD-1  NaN   VM  NaN
1   BSD-2  NaN   VM  NaN
2   BSD-3  NaN   VM  NaN
3   BSE-1   VM  NaN   XM
4   BSE-2   VM  NaN   XM
5   BSE-3   VM  NaN   XM
6   SMR-1  NaN   VM  NaN
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject