How to reate a dataframe based on excel sheet name and cell position?

How to reate a dataframe based on excel sheet name and cell position?

Problem Description:

I have an excel table (sample.xlsx) which contains 3 sheets (‘Sheet1′,’Sheet2′,’Sheet3’).
Now I have read all the sheets and combine them into one dataframe.

import pandas as pd
data_df = pd.concat(pd.read_excel("sample.xlsx", header=None, index_col=None, sheet_name=None))

data_df looks like this:

              0      1      2
Sheet1 0   val1   val2   val3
       1  val11  val21  val31
Sheet2 0   val1   val2   val3
       1  val11  val21  val31
Sheet3 0   val1   val2   val3
       1  val11  val21  val31

Is there any way to create a new dataframe which has the same shape with data_df but each cell value is the cell position info?

I have tried to get multiple index:

multi_index = data_df.index.levels[:]

and I get:

[['Sheet1', 'Sheet2', 'Sheet3'], [0, 1]]

But I don’t know how to use these data to create a new dataframe like this:

             0            1            2
0  Sheet1 - A1  Sheet1 - B1  Sheet1 - C1
1  Sheet1 - A2  Sheet1 - B2  Sheet1 - C2
2  Sheet2 - A1  Sheet2 - B1  Sheet2 - C1
3  Sheet2 - A2  Sheet2 - B2  Sheet2 - C2
4  Sheet3 - A1  Sheet3 - B1  Sheet3 - C1
5  Sheet3 - A2  Sheet3 - B2  Sheet3 - C2

Thanks in advance!

Solution – 1

Look into the “openpyxl” package. I think you can do something like “for sheet in sheets” then read each sheet to your data frame like that. Hope this guides you to the right place.

Solution – 2

Since the values in your data_df don’t matter, you could build the cartesian product of index and columns and build a new dataframe of it.

mapping = dict(enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

mapping is needed to convert the column numbers 0,1,2,… to A,B,C,…
UPDATE
In case you have excel sheets with more than 26 columns, you would need to do some more calculations to get the Excel column names like AA, AB,...AZ and so on. In this post there are several answers which cover that issue.

import itertools
import numpy as np

out = (
    pd.DataFrame(
        np.array([f"{x[0][0]} - {mapping[int(x[1])]}{x[0][1]+1}" 
                  for x in itertools.product(data_df.index, data_df.columns)])
        .reshape(len(data_df), -1)
    )
)

print(out)
             0            1            2
0  Sheet1 - A1  Sheet1 - B1  Sheet1 - C1
1  Sheet1 - A2  Sheet1 - B2  Sheet1 - C2
2  Sheet2 - A1  Sheet2 - B1  Sheet2 - C1
3  Sheet2 - A2  Sheet2 - B2  Sheet2 - C2
4  Sheet3 - A1  Sheet3 - B1  Sheet3 - C1
5  Sheet3 - A2  Sheet3 - B2  Sheet3 - C2
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject