## Most efficient way to convert numpy array to dict

Problem Description:

I have 2 numpy arrays:

```
import numpy as np
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
```

I need to create a list of dicts:

```
res =
[{"a": 1, "b": 10},
{"a": 2, "b": 20},
{"a": 3, "b": 30}]
```

in most optimal way, without iterating through the whole array.

The obvious solution

```
res = [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]
```

takes too much time if a and b has a lot of values inside

## Solution – 1

If you’re open to also importing pandas, you could do:

```
import pandas as pd
df = pd.DataFrame({"a": a, "b": b})
res = df.to_dict(orient='records')
```

which gives the desired `res`

:

```
[{'a': 1, 'b': 10}, {'a': 2, 'b': 20}, {'a': 3, 'b': 30}]
```

~~Depending on the size of your arrays, this might not be worth it.~~ It appears this isn’t worth it regardless of the size of your arrays, but I’m going to keep this answer for its educational value, and will update it to compare the runtimes of methods that other people suggest.

Timing both approaches, my computer shows the `zip`

approach is always faster than the pandas approach, so disregard the previous part of this answer.

Ranking (fastest to slowest)

- Plain old
`zip`

- 0x0fba’s
`np.col_stack`

- My approach — create a dataframe and
`df.to_dict`

= crashMOGWAI’s numba approach (time for first function call is skewed by compilation time)

```
import timeit
from numba import jit
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
def f_zip(a, b):
return [{"a": ai, "b": bi} for ai, bi in zip(a, b)]
def f_pd(a, b):
df = pd.DataFrame({"a": a, "b": b})
return df.to_dict(orient='records')
def f_col_stack(a, b):
return [{"a": a, "b": b} for a, b in np.column_stack((a,b))]
@jit
def f_numba(a, b):
return [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]
funcs = [f_zip, f_pd, f_col_stack, f_numba]
sizes = [5, 10, 50, 100, 500, 1000, 5000, 10_000, 50_000, 100_000]
times = np.zeros((len(sizes), len(funcs)))
N = 20
for i, s in enumerate(sizes):
a = np.random.random((s,))
b = np.random.random((s,))
for j, f in enumerate(funcs):
times[i, j] = timeit.timeit("f(a, b)", globals=globals(), number=N) / N
print(".", end="")
print(s)
fig, ax = plt.subplots()
for j, f in enumerate(funcs):
ax.plot(sizes, times[:, j], label=f.__name__)
ax.set_xlabel("Array size")
ax.set_ylabel("Time per function call (s)")
ax.set_xscale("log")
ax.set_yscale("log")
ax.legend()
ax.grid()
fig.tight_layout()
```

## Solution – 2

I would say your current approach is fairly efficient. Not knowing any other details, you may be able to precompile w/ numba and save a little execution time. Making some order of magnitude and memory availability assumptions, see below Jupyter cells.

```
# %%
import numpy as np
from numba import jit
# %%
x = np.array(range(1,1000000,1))
y = np.array(range(10,1000000,10))
test = [{"a": a_el, "b": b_el} for a_el, b_el in zip(x, y)]
# %%
@jit
def f():
a = np.array(range(1,1000000,1))
b = np.array(range(10,1000000,10))
return [{"a": a_el, "b": b_el} for a_el, b_el in zip(a, b)]
```

## Solution – 3

You can try this one-liner that uses a list-comprehension to build the list of dicts, as well as the numpy `column_stack()`

method.

```
res = [{"a": a, "b": b} for a, b in np.column_stack((a,b))]
```