Python/Pandas – Check if multiple columns has any of three items in a list

Python/Pandas – Check if multiple columns has any of three items in a list

Problem Description:

I am creating a binary target variable that is 1 if any of these select columns (Current, Month1, Month2, Month3, Month4, Month5, Month6) has any number of these three strings (‘Item1’, ‘Item2’, ‘Item3’). If none of these columns have these items then the target variable for this row will be populated with 0. For example:

CurrentMonth1Month2Month3Month4Month5Month6Target
Item8Item7Item7Item8Item8Item8Item80
Item3Item4Item4Item4Item4Item4Item41
Item3Item4Item1Item2Item3Item3Item31

For the first row, none of the columns Current-Month6 contain either Item1, Item2, or Item3 so the Target is 0. For row 2, At least one of the key items (Item3) is in at least one of the select columns (Current) so the Target is 1. And finally, Item1, Item2, and Item3, are throughout the 6 columns so the target is 1.

I currently have this loop set up that looks at all the select columns in each row to see if they contain any of the three items. If so, the list is appends a 1 at the index, otherwise 0:

Target = []
for i in range(df.shape[0]):
    if (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item1')).any()) == True) or  (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item2')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item3')).any()) == True):
        Target.append(1)
    else:
        Target.append(0)

I was wondering if there was a faster/more efficient way to do this, or a way to include multiple items in the .eq() function?

Thanks!

Solution – 1

df = pd.read_clipboard()

# get str cols
str_cols = df.select_dtypes(include=["object", "string"])

# find any number of references
row_ = str_cols.apply(lambda row: row.str.contains(r'Item1|Item2|Item3', regex=True, case=False, na=False)).any(axis=1)

# assign to target
df['target'] = np.where(row_.where(lambda x: x).notna(), 1, 0)
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject