# Find the cluster number corresponding to cluster labels

## Find the cluster number corresponding to cluster labels

Contents

Problem Description:

Based on the result of a clustering I have the following data:

``````    timestamp  consumption  cluster_number
0           0    35.666667               1
1           1    29.352222               1
2           2    24.430000               1
3           3    21.756667               1
4           4    20.345556               1
5           5    19.763333               1
6           6    19.874444               1
7           7    22.078889               1
8           8    28.608889               1
9           9    33.827778               2
10         10    36.414444               2
11         11    38.340000               2
12         12    43.305556               2
13         13    43.034444               2
14         14    39.076667               2
15         15    36.378889               2
16         16    36.171111               2
17         17    40.381111               2
18         18    48.692222               0
19         19    52.330000               0
20         20    50.154444               0
21         21    46.491111               0
22         22    44.014444               0
23         23    40.628889               0
``````

With this clustering, the maximum value (and values close to the maximum value) of the column `consumption` is in `cluster_number` 0, the minimum value (and values close to the minimum value) of the column `consumption` is in `cluster_number` 1 and the rest in `cluster_number` 2. However, I cannot know beforehand which ‘consumption’ values correspond to which `cluster_number`, so I need to find a way to first connect the `cluster_number` with high, low and middle class and then come up with a list of the column `timestamp` for each cluster.

Specifically, I want to come up with three lists:

1. high = [18, 19, 20, 21, 22, 23]
2. low = [0, 1, 2, 3, 4, 5, 6, 7, 8]
3. middle = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Any idea of how can I achieve this?

## Solution – 1

You can use a `groupby.min` + `rank` to identify the order of the clusters, then `groupby.agg` on the index and rename using the cluster order:

``````order = ['low', 'middle', 'high']

g = df.reset_index().groupby('cluster_number')
mapper = (g['consumption'].min() # min for the demo, you can use any function mean/sum/…
.rank(method='dense')
.map(dict(enumerate(order, start=1)))
)
out = g['index'].agg(list).rename(mapper)
``````

Output:

``````cluster_number
high                 [18, 19, 20, 21, 22, 23]
low               [0, 1, 2, 3, 4, 5, 6, 7, 8]
middle    [9, 10, 11, 12, 13, 14, 15, 16, 17]
Name: index, dtype: object
``````
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.