Append matrices generated in for loop that have different col names to an external list of empty dataframes

Append matrices generated in for loop that have different col names to an external list of empty dataframes

Problem Description:

I have a large dataset that I am trying to perform various analyses on, but first need to transform into matrices grouped by different variables.

For example, here is a toy dataset:

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })

for i, animals in myData.groupby('dataset'):
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    tuples = pd.DataFrame(tuples)
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    display(matrix)

Here I am grouping my data by "animals" and converting each group into a matrix. However, because the column names are not same across my matrices, I am having trouble saving my output into an external empty list or dataframe.

For example, I’d like to save each matrix into a separate dataframe that is dynamically generated depending on the number of groups in my data:

output_dfs = {k: pd.DataFrame([]) for k in myData['dataset']}

Desired output in this case would be 3 separate dataframes that I can access by a name: (the values are based on the toy dataset)

dataset category_1 category_2 green red
bird    0          that_bird  14.5  NaN
bird    1          this_bird  NaN   3.5

dataset category_1 category_2 orange white
cat     0          that_cat   8.0    2.0
cat     1          this_cat   1.0    9.0

dataset category_1 category_2 black brown
dog     0          that_dog   10.0  4.0
dog     1          this_dog   5.0   3.0

Solution – 1

I’m not sure what you mean, is this the result you want to achieve?

myData = pd.DataFrame({'dataset': ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'bird', 'bird', 'bird', 'bird'], 
                  'category_1': ['orange', 'orange', 'white', 'white', 'black', 'brown', 'brown', 'black', 'red', 'green', 'red', 'green'], 
                  'category_2': ['this_cat', 'that_cat', 'this_cat', 'that_cat', 'this_dog', 'that_dog', 'this_dog', 'that_dog', 'this_bird', 'that_bird', 'this_bird', 'that_bird'],
                  'values': ['1', '8', '9', '2', '5', '4', '3', '10', '0', '2', '7', '9']
                 })

result = {}
for i, animals in myData.groupby('dataset'):
    tuples = animals.groupby(['category_1', 'category_2'])['values'].mean().reset_index()
    tuples = pd.DataFrame(tuples)
    matrix = tuples.pivot(index='category_2', columns='category_1', values='values').reset_index()
    result[i] = matrix
display(result)
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject