Description of problem
I am trying to simulate multiple networks in parallel in standalone mode, but the speedup I get is quite poor. Is there anything that can be done about it?
Minimal code to reproduce problem
Here is a toy example, with a setup similar to a previous issue:
I use Brian 2.4.1 on Ubuntu 18.04.5.
```python
import joblib
import brian2 as br
import numpy as np
import time
import shutil

def worker(params):
    core_id, tau_value = params  # worker index and its tau value
    directory = "standalone" + str(core_id)
    br.set_device('cpp_standalone', directory=directory)
    tau = tau_value * br.ms
    G = br.NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1
    mon = br.StateMonitor(G, 'v', record=0)
    net = br.Network()
    net.add(G, mon)
    net.run(1000 * br.second)
    res = (mon.t / br.ms, mon.v)
    br.device.reinit()
    return res

n_jobs = 2  # number of networks to simulate
tau_values_0 = np.arange(n_jobs) + 5  # create parameters for these networks

if __name__ == "__main__":
    # change n_jobs to 1 below for the sequential measurement
    with joblib.Parallel(n_jobs=n_jobs) as parallel:
        start = time.time()
        res = parallel(joblib.delayed(worker)([i, tau_values_0[i]])
                       for i in range(n_jobs))
        print(str(round(time.time() - start, 2)) + "s")
    # delete the directories created for the standalone builds
    for i in range(n_jobs):
        path = "standalone" + str(i)
        shutil.rmtree(path)
```
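One guess I have not verified: the builds themselves might oversubscribe the CPU, since (if I read the docs correctly) each standalone build invokes an unbounded `make -j`, so several workers compiling at the same time compete for all cores. Below is a sketch of a worker variant that caps each build at a single make job via the `devices.cpp_standalone.extra_make_args_unix` preference:

```python
import brian2 as br

def worker_single_make_job(params):
    """Like worker() above, but with per-worker compilation capped at
    one make job (unverified guess that the default unbounded "make -j"
    causes oversubscription when several workers compile at once)."""
    core_id, tau_value = params
    # Restrict this worker's build to a single make job:
    br.prefs.devices.cpp_standalone.extra_make_args_unix = ['-j1']
    br.set_device('cpp_standalone', directory="standalone" + str(core_id))
    tau = tau_value * br.ms
    G = br.NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
    G.v = 1
    mon = br.StateMonitor(G, 'v', record=0)
    net = br.Network(G, mon)
    net.run(1000 * br.second)
    res = (mon.t / br.ms, mon.v)
    br.device.reinit()
    return res
```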
Below is a plot of the execution time as a function of the number of networks, simulated either in parallel (the first snippet above with varying `n_jobs`) or sequentially.
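The timings behind the plot come from a sweep along these lines (a sketch, not the exact script; it reuses the `worker()` defined above, and the sequential curve is the same loop with `joblib.Parallel(n_jobs=1)`):

```python
import time
import joblib
import numpy as np

# Sweep the number of networks and time each run; reuses worker() from
# the snippet above. For the sequential curve, replace n_jobs=n with
# n_jobs=1 so that the n networks run one after another.
if __name__ == "__main__":
    for n in range(1, 9):
        tau_values = np.arange(n) + 5
        start = time.time()
        joblib.Parallel(n_jobs=n)(
            joblib.delayed(worker)([i, tau_values[i]])
            for i in range(n))
        print(f"{n} networks: {time.time() - start:.2f}s")
```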
The simulation time in sequential mode grows linearly with the number of networks, which is expected. The simulation time in parallel mode is sublinear, which is nice, but it still scales poorly. The computer used to generate the plot has 4 cores / 8 threads. I am especially surprised that it takes substantially longer to run e.g. 3 networks instead of 2, or 6 instead of 5, since in those cases there should be a free core for joblib to use without any increase in execution time.
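One thing I have not checked is how much of each worker's wall time goes into compilation rather than the simulation itself. Here is a sketch of how the two phases could be timed separately (my own diagnostic idea, not part of the script above; it assumes the `./main` binary that the cpp_standalone device generates on Linux):

```python
import subprocess
import time
import brian2 as br

# Build without running, so compile time and run time can be measured
# separately.
br.set_device('cpp_standalone', build_on_run=False)

tau = 5 * br.ms
G = br.NeuronGroup(1, 'dv/dt = -v/tau : 1', method='euler')
G.v = 1
net = br.Network(G)
net.run(1000 * br.second)  # with build_on_run=False, nothing runs yet

t0 = time.time()
br.device.build(directory='standalone_timing', compile=True, run=False)
print(f"compile: {time.time() - t0:.2f}s")

t0 = time.time()
# Invoke the generated executable directly (assumed './main' on Linux):
subprocess.run(['./main'], cwd='standalone_timing', check=True)
print(f"simulate: {time.time() - t0:.2f}s")
```

If compilation turns out to dominate, the unbounded parallel make jobs of several simultaneous builds would explain part of the poor scaling.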
Is there anything that I am doing wrong?
Thank you in advance for any help.