How to filter fows in Pandas using partial string in column

How to filter fows in Pandas using partial string in column

Problem Description:

I’ve used the SerpAPI to pull down some data about jobs in a sector I want to return to.

There is a lot of junk about training and I’d like to remove the results based on the displayed_link column.

position    title   link    displayed_link  date    snippet snippet_highlighted_words   sitelinks   about_this_result   about_page_link about_page_serpapi_link cached_page_link    related_questions   rich_snippet    related_pages_link  thumbnail   duration    key_moments
0   1   What Does a Data Analyst Do? Your 2022 Career ...   https://www.coursera.org/articles/what-does-a-...   https://www.coursera.org › Coursera Articles ›...   Nov 14, 2022    A data analyst is a person whose job is to gat...   [data analyst]  {'inline': [{'title': 'Business analyst', 'lin...   {'source': {'description': 'Coursera Inc. is a...   https://www.google.com/search?q=About+https://...   https://serpapi.com/search.json?engine=google_...   https://webcache.googleusercontent.com/search?...   NaN NaN NaN NaN NaN NaN
1   2   What Does a Data Analyst Do? Exploring the Day...   https://www.rasmussen.edu/degrees/technology/b...   https://www.rasmussen.edu › degrees › technolo...   Sep 19, 2022    Generally speaking, a data analyst will retrie...   [data analyst, Data analysts]   {'inline': [{'title': 'Where Do Data Analysts ...   {'source': {'description': 'Rasmussen Universi...   https://www.google.com/search?q=About+https://...   https://serpapi.com/search.json?engine=google_...   https://webcache.googleusercontent.com/search?...   NaN NaN NaN NaN NaN NaN
2   3   Become a Data Analyst Learning Path - LinkedIn  https://www.linkedin.com/learning/paths/become...   https://www.linkedin.com › learning › become-a...   NaN Data analysts examine information using data a...   [Data analysts, data analysis]  NaN {'source': {'description': 'LinkedIn is an Ame...   https://www.google.com/search?q=About+https://...   https://serpapi.com/search.json?engine=google_...   NaN NaN NaN NaN NaN NaN NaN
3   4   What Does a Data Analyst Do? - SNHU https://www.snhu.edu/about-us/newsroom/stem/wh...   https://www.snhu.edu › about-us › newsroom › stem   

Tried manually creating of the sites I want to exclude sites in this list

promotions = ["coursera"
,"rasmussen"
,"snhu"
,"mastersindatascience"
,"northeastern"
,"mygreatlearning"
,"payscale.com"
,"careerfoundry"
,"microsoft.com"
,"codecademy"
,"edx.org"
,"ahima.org"
,"›certification-exams›chda'"]

Tried this:

df['displayed_link'].map(lambda x: "T" if x in promotions else "F")

And all it does is return F – I’m guessing because it needs exact string.

df['displayed_link'].map(lambda x: "T" if promotions in x else "F")

I tried it the other way, but that was a syntax error.

What is the most efficient way of filtering rows based on a column based on a list of manually curated strings?

    enter code here

Solution – 1

Use Series.str.contains with chain list by | for regex OR:

df['test1'] = np.where(df['displayed_link'].str.contains('|'.join(promotions)), 'T', 'F')
df['test2'] = (df['displayed_link'].str.contains('|'.join(promotions))
                                   .map({True:'T',False: 'F'}))

If necessary, use words boundaries bb:

pat = '|'.join(rf"b{x}b" for x in promotions))
df['test3']= np.where(df['displayed_link'].str.contains(pat), 'T', 'F')
df['test4']= df['displayed_link'].str.contains(pat).map({True:'T',False: 'F'})
print (df)
                                  displayed_link test1 test2 test3 test4
0  https://www.coursera.org/articles/what-does-a     T     T     T     T
1  https://www.rasmussen.edu/degrees/technology/     T     T     T     T
2       https://www.linkedin.com/learning/paths/     F     F     F     F
3  https://www.snhu1.edu/about-us/newsroom/stem/     T     T     F     F
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject