Multiprocessing in standalone mode

I am looking for a toy example that uses multiprocessing with the standalone C++ mode.
I found some related questions in the group, but I could not find an example that runs without errors.
I pass an array of tau values to the run_sim function and use Pool.map for multiprocessing.

This is my try:

import os
import multiprocessing

from brian2 import *

set_device('cpp_standalone', build_on_run=False, directory=None)

def run_sim(tau):
    pid = os.getpid()
    print(f'RUNNING {pid}')
    
    G = NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1

    mon = StateMonitor(G, 'v', record=0)
    net = Network()
    net.add(G, mon)
    net.run(100 * ms)
    device.build(
        directory='standalone{}'.format(
        os.getpid()),
        # directory=None,
        compile=True, run=True)
    device.reinit()
    
    print(f'FINISHED {pid}')
    return (mon.t/ms, mon.v[0])


if __name__ == "__main__":
    num_proc = 4

    tau_values = np.arange(10)*ms + 5*ms
    with multiprocessing.Pool(num_proc) as p:
        results = p.map(run_sim, tau_values)

    for tau_value, (t, v) in zip(tau_values, results):
        plt.plot(t, v, label=str(tau_value))
    plt.legend()
    plt.show()

error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "01_example.py", line 24, in run_sim
    compile=True, run=True)
  File "/home/abolfazl/.local/lib/python3.6/site-packages/brian2/devices/cpp_standalone/device.py", line 1105, in build
    raise RuntimeError('The network has already been built and run '
RuntimeError: The network has already been built and run before. To build several simulations in the same script, call "device.reinit()" and "device.activate()". Note that you will have to set build options (e.g. the directory) and defaultclock.dt again.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "01_example.py", line 36, in <module>
    results = p.map(run_sim, tau_values)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
RuntimeError: The network has already been built and run before. To build several simulations in the same script, call "device.reinit()" and "device.activate()". Note that you will have to set build options (e.g. the directory) and defaultclock.dt again.

Thank you in advance for any guidance.

There are a few problems in your code:

  1. You need to move the set_device line into your run function. This line initialises the C++ Standalone Device that keeps track of whether your network has been built or run before. As it is right now, all processes work on the same C++ Standalone Device in parallel. The second process that tries to build the network throws your error, as the first process has already started running it (but hasn’t reached the device.reinit() line yet). By moving the set_device line into your parallelised function, you create one C++ Standalone Device per process.
  2. You don’t need the build_on_run=False and device.build calls since you have only one run call in your simulations.
  3. You are calling device.reinit() before you access the monitor values, which resets them before you use them. This will give you a NotImplementedError, saying that you can’t access variables in standalone mode before the network has been run. To fix that, first store the monitor values, then reinit the device. You still need to reinit the device because you are running multiple simulations per process (you have 10 tau values and num_proc = 4).

Here is a corrected version of your parallelised function:

def run_sim(tau):

    set_device('cpp_standalone', directory=None)
    pid = os.getpid()
    print(f'RUNNING {pid}')

    G = NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1

    mon = StateMonitor(G, 'v', record=0)
    net = Network()
    net.add(G, mon)
    net.run(100 * ms)

    res = (mon.t/ms, mon.v[0])
    device.reinit()

    print(f'FINISHED {pid}')
    return res

And just as a hint: If you haven’t come across it yet, there is also an option to run a single Brian2 simulation on multiple threads using OpenMP. The speedup you get is not linear in the number of threads you use (using 2 threads does not necessarily halve the runtime, since not all aspects of spiking neural network simulations can be parallelised). So when you want to run many simulations with different parameters (as in your example), it is more efficient to use each of your available threads for one Brian2 simulation, as you have done here. But if you just want to speed up a single simulation, check out Brian’s OpenMP option: https://brian2.readthedocs.io/en/2.4/user/computation.html#multi-threading-with-openmp
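
For reference, here is a minimal sketch of the OpenMP route for a single simulation. The function name run_single_openmp and its n_threads argument are made up for illustration; the only Brian-specific part is the devices.cpp_standalone.openmp_threads preference described on the linked page. The one-neuron model just mirrors your example, so don’t expect any speedup from it.

```python
def run_single_openmp(tau, n_threads=4):
    """Hypothetical helper: one standalone simulation using OpenMP threads."""
    # Imported inside the function so that defining it does not
    # itself require Brian 2 to be importable.
    from brian2 import (NeuronGroup, StateMonitor, Network,
                        set_device, prefs, ms)

    set_device('cpp_standalone', directory=None)
    # Number of OpenMP threads used by the generated C++ code
    prefs.devices.cpp_standalone.openmp_threads = n_threads

    G = NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1
    mon = StateMonitor(G, 'v', record=0)
    net = Network(G, mon)
    net.run(100 * ms)  # with build_on_run left at its default, this builds and runs

    return mon.t / ms, mon.v[0]
```

You would call it once from your main script, e.g. run_single_openmp(10*ms), rather than from inside a Pool.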

Thank you @denisalevi, that was very informative.
If I give each thread one simulation while changing a parameter, won’t we get a race condition on memory, since the threads use shared memory?
And yes, to speed up a single long run I think multithreading is the best choice.

The way your script is set up, each simulation uses its own code folder to generate the code for the simulation. This is controlled by the directory keyword to your set_device call. By setting directory=None, a temporary folder with a random name is created in your /tmp/ directory (at least on Linux) to generate your code. This way, each simulation uses a different folder for code generation and there is nothing shared between the parallel processes.

If you don’t set the directory argument, it defaults to directory="output". In that case each process would use the same files to try to generate and compile your simulation, which would lead to compile/execution errors.

Actually, the way you originally set directory=f"standalone{pid}" is even better than using directory=None in your case, since it gives each parallel process its own directory to work in. This way you avoid the problem of multiple processes working on the same code directory, but you also don’t need to recompile the entire project for each simulation. The generated code of two consecutive simulations in a single process will differ only slightly (in your case only in the tau parameter). The compiler will therefore only recompile the files that have changed and not the entire project (there might be some other caching going on as well; I don’t know the internals). Just make sure that you don’t set clean=True, otherwise everything is recompiled from scratch each time.
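
A rough sketch of that variant, based on the corrected function from my earlier reply. The helper standalone_dir is a name I made up for illustration, and I haven’t timed how much recompilation this actually saves.

```python
import os


def standalone_dir(prefix="standalone"):
    """Hypothetical helper: one code directory per worker process."""
    return f"{prefix}{os.getpid()}"


def run_sim(tau):
    # Imported inside the worker so that defining this function does not
    # itself require Brian 2 to be importable.
    from brian2 import (NeuronGroup, StateMonitor, Network,
                        set_device, device, ms)

    # Every simulation run by this process reuses the same directory,
    # so only files that changed between runs need recompiling.
    set_device('cpp_standalone', directory=standalone_dir())

    G = NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1
    mon = StateMonitor(G, 'v', record=0)
    net = Network(G, mon)
    net.run(100 * ms)

    # Store the results before resetting the device.
    res = (mon.t / ms, mon.v[0])
    device.reinit()
    return res
```

The Pool.map call in your main block stays exactly as it is; only the directory argument changes.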

A couple of questions about the directories:

What is the difference between the directory you set in set_device() and the one you provide in device.build() (in the case of multiple run() calls)? Should they be the same, or different? Or should only one of them be defined?

Thanks in advance!

We should indeed make this clearer. If you add additional options to set_device (e.g. directory or clean), then these will be passed on to the automatic call of device.build that gets triggered by run. If you call device.build yourself, i.e. use the build_on_run=False option, then the values provided to set_device will not be used.
To summarize, either use set_device(..., directory=...) together with a single run and no device.build call, or use device.build(..., directory=...).
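
The two patterns could look roughly like this. The function names and the toy model are placeholders for illustration; the directory, build_on_run, compile and run keywords are the ones already used earlier in this thread.

```python
def single_run(directory):
    """Pattern 1: options go to set_device, run triggers the build."""
    from brian2 import NeuronGroup, Network, set_device, ms

    # directory is passed on to the automatic device.build call
    set_device('cpp_standalone', directory=directory)
    G = NeuronGroup(1, 'dv/dt = -v/(10*ms) : 1', method='euler')
    G.v = 1
    net = Network(G)
    net.run(100 * ms)  # builds, compiles and runs
    return G.v[0]


def multiple_runs(directory):
    """Pattern 2: build_on_run=False, options go to device.build."""
    from brian2 import NeuronGroup, Network, set_device, device, ms

    set_device('cpp_standalone', build_on_run=False)
    G = NeuronGroup(1, 'dv/dt = -v/(10*ms) : 1', method='euler')
    G.v = 1
    net = Network(G)
    net.run(50 * ms)  # only recorded, nothing is built yet
    net.run(50 * ms)
    # Build options are given here, not in set_device
    device.build(directory=directory, compile=True, run=True)
    return G.v[0]
```

Each function is meant to be the only simulation in its script; to call both in the same script you would need device.reinit() and device.activate() in between, as the error message at the top of this thread says.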
