# How can I label a column of strings into numbered groups based on another column containing substrings?

## How can I label a column of strings into numbered groups based on another column containing substrings?

Contents

Problem Description:

I have the 1st column that is around 4920 different chemical compounds.

For example:

``````0              Ag(AuS)2
1            Ag(W3Br7)2
2      Ag0.5Ge1Pb1.75S4
3     Ag0.5Ge1Pb1.75Se4
4                Ag2BBr
...                 ...
4916             ZrTaN3
4917               ZrTe
4918             ZrTi2O
4919             ZrTiF6
4920               ZrW2
``````

I have the 2nd column that has all the elements of the periodic table numerically listed atomic number

``````0      H
1     He
2     Li
3     Be
4      B
..   ...
113   Fl
114  Uup
115   Lv
116  Uus
117  Uuo
``````

How can I classify the first column into groups based on the compound’s first element corresponding to their atomic number from column 2 so that I can return the first column

The atomic number of Ag = 27
The atomic number of Zr = 40

``````    0            47
1            47
2            47
3            47
4            47
...                 ...
4916         40
4917         40
4918         40
4919         40
4920         40
``````

## Solution – 1

Since the first element could be a varying number of letters, the simplest solution would be to use the regex approach for getting the first section.
For example:

``````import re

compounds = ["Ag(AuS)2", "HTiF", "ZrTaN3"]

for compound in compounds:
match = re.match(r"[A-Z][a-z]*", compound)
if match:
fist_element = match.group(0)
print(fist_element)
``````

this will print out the first element of each compound.
Note: If there are some more complex compounds and you need to adjust your regex, I recommend using https://regex101.com/ as a playground.

Once you have that information it just needs to be connected with the element in the second column which would be easiest if you mapped that column to a dictionary resembling:

``````{ H: 0, He: 1, Li: 2 ...}
``````

which would allow you to simply get the element index by calling `dict_with_elements.get(first_element)`.

From there on, the rest is just looping and writing data. I hope this helps.

Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.