Assign values based on two thresholds in pandas

Problem Description:

I have a pandas DataFrame named df_x with a column named logvalue.
I want to create a new column, violatedInstances, based on these log values:

If Max >= logvalue >= Min, assign 0 (not violated).
If logvalue > Max or logvalue < Min, assign 1 (violated).
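The two rules amount to a single inclusive range check: a value is fine only when it lies in [Min, Max], bounds included. A minimal plain-Python sketch of that rule, where `violated`, `lo` and `hi` are hypothetical names standing in for the rule, Min and Max:

```python
def violated(logvalue: float, lo: float = 15, hi: float = 20) -> int:
    """Hypothetical helper: 1 if logvalue is outside [lo, hi], else 0.

    Both bounds are inclusive, so lo and hi themselves are not violations.
    """
    return 0 if lo <= logvalue <= hi else 1


for v in [20, 20.5, 18.5, 2, 10]:
    print(v, violated(v))
```

Note that 20 (equal to Max) maps to 0, which is why the first row of the expected output is not a violation.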

import pandas as pd

# create DataFrame (note: logvalue is stored as strings)
df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})

Max = 20
Min = 15

The output should look like this:

logvalue  ID  violatedInstances
20        1   0
20.5      2   1
18.5      3   0
2         4   1
10        5   1

Sorry for asking this simple question. I tried several methods but nothing worked.
How can I do this in pandas?

Solution – 1

Your logvalue column holds strings, so you'll have to convert it to float before comparing:

df_x['violatedInstances'] = df_x['logvalue'].astype(float).apply(lambda x: 1 if (x > Max or x < Min) else 0)
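The conversion matters because comparisons on the raw string column either fail or give the wrong answer. A small sketch, assuming the same string values as in the question:

```python
import pandas as pd

# Same string-typed values as the question's logvalue column
s = pd.Series(['20', '20.5', '18.5', '2', '10'])

# Comparing strings to a number is not supported and raises TypeError
raised = False
try:
    s > 20
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Comparing strings to a string runs, but the order is lexicographic:
# '2' > '15' is True because the character '2' sorts after '1'
lexicographic = (s > '15').tolist()
print(lexicographic)

# After converting to float the comparison is numeric
numeric = (s.astype(float) > 15).tolist()
print(numeric)
```

Here `lexicographic` comes out as [True, True, True, True, False] while `numeric` is [True, True, True, False, False]; only the latter is what the thresholds intend.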

Solution – 2

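cond1 = pd.to_numeric(df_x['logvalue']).gt(Max)   # above the upper threshold
cond2 = pd.to_numeric(df_x['logvalue']).lt(Min)   # below the lower threshold
# assign() returns a new DataFrame, so store the result
df_x = df_x.assign(violatedInstances=(cond1 | cond2).astype('int'))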

result:

logvalue    ID  violatedInstances
0   20      1   0
1   20.5    2   1
2   18.5    3   0
3   2       4   1
4   10      5   1
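Since the "not violated" condition is exactly an inclusive range, `Series.between` can express it directly; it is inclusive on both ends by default, matching Max >= logvalue >= Min. A sketch of that alternative, using the same data and thresholds as the question:

```python
import pandas as pd

df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})
Max = 20
Min = 15

values = pd.to_numeric(df_x['logvalue'])
# between() is True inside [Min, Max]; ~ flips it to "violated"
df_x['violatedInstances'] = (~values.between(Min, Max)).astype(int)
print(df_x)
```

This gives the same 0/1 column as the two explicit comparisons, with one call instead of two masks.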

Solution – 3

First, convert logvalue to float so that the comparisons are numeric:

df_x['logvalue'] = df_x['logvalue'].astype('float')

Then you can use numpy.where:

import numpy as np
df_x['violatedInstances'] = np.where(((df_x['logvalue'] > Max) | (df_x['logvalue'] < Min)), 1, 0)

which produces the same violatedInstances column as the result above (0, 1, 0, 1, 1), with logvalue now stored as float.
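To confirm the three solutions agree, here is a quick cross-check sketch that runs each approach on a float-converted copy of the question's data:

```python
import numpy as np
import pandas as pd

df_x = pd.DataFrame({'logvalue': ['20', '20.5', '18.5', '2', '10'],
                     'ID': ['1', '2', '3', '4', '5']})
Max = 20
Min = 15
values = df_x['logvalue'].astype(float)

# Solution 1: row-by-row apply
sol1 = values.apply(lambda x: 1 if (x > Max or x < Min) else 0).tolist()
# Solution 2: combined boolean masks cast to int
sol2 = (values.gt(Max) | values.lt(Min)).astype(int).tolist()
# Solution 3: numpy.where on the same masks
sol3 = np.where((values > Max) | (values < Min), 1, 0).tolist()

print(sol1, sol2, sol3)
```

All three yield [0, 1, 0, 1, 1]; the vectorized versions (2 and 3) avoid calling a Python function per row, which matters on large frames.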
