Running multiple standalone simulations [Request for Feedback]

Hi everyone,

I know many of you have been struggling to run multiple (serial or parallel) simulations in standalone mode efficiently. I’ve recently worked on a “pragmatic” solution that should cover most of the use cases, while not being as elegant as a more general solution that has been on our list for a long time. I’d be very grateful for some feedback.

The pull request for this solution is here: Pragmatic solution to parameter changes in standalone mode by mstimberg · Pull Request #1429 · brian-team/brian2 · GitHub
You can try it out by checking out brian2 from github and switching to the standalone_parameter_change branch. Alternatively, you can directly install it via pip:

$ pip install -U https://github.com/brian-team/brian2/archive/refs/heads/standalone_parameter_change.zip

The new features are documented here: Computational methods and efficiency — Brian 2 2.5.1.post0.dev30 documentation
If you are interested, some details about what happens in the background are available here: Standalone implementation — Brian 2 2.5.1.post0.dev30 documentation

I’d be grateful for any feedback :pray: Does this cover your use cases? Does the interface/syntax make sense to you? Does the documentation explain everything in sufficient detail? And of course, if you have code which you can adapt to this feature without too much hassle – does it work? Please don’t spend too much time on adapting code for now, since the details of the feature are not yet set in stone.

Pinging a few of you who have asked related questions in the past (but of course everybody is invited to give feedback): @mdepitta, @Ziaeemehr, @DavidKing2020, @rth, @basile6


@mstimberg it looks very handy! Thank you for this significant improvement. I’ll try to test it in a few days and report back.


Hi Marcel

Thanks for the follow-up. Is my understanding correct that with run_args, one can now efficiently switch the stimulus for each individual run?

For example,

input_neurons = NeuronGroup(100, 'rate : Hz', threshold='rand() < rate*dt')
for i in range(10):
    device.run(run_args={input_neurons.rate: image[i]})

where image[i] is the i-th image.

If that’s the case, how does it compare to using run_regularly in terms of efficiency?

Cheers
-Wei

Hi @DavidKing2020. Yes, indeed (assuming that image[i] is a size-100 array). The difference from run_regularly is that run_regularly does all of this as part of a single, continuous simulation. With run_args, each simulation starts again from scratch (although you could, e.g., set the weights at each run to the weights from the previous run to get some continuity). Efficiency-wise it shouldn’t make much of a difference: both approaches generate and compile the code only once (initialization and writing results to disk introduce some overhead in the run_args solution, but I don’t think it is significant). The advantage of run_args is that in between runs you can do things in Python (look at the previous results, etc.). You’ll also have fewer issues with running out of memory.

wait, @mstimberg, would it be possible to propagate synaptic weights through several runs?

pseudocode

w = initial_weights
for ...:
    device.run(run_args={xsyn.w: w})
    w = xsyn.w

Nice. Is there a rough estimate of when the new feature can be incorporated into brian2cuda?

There isn’t actually much to do to make it work with brian2cuda (that was part of the motivation behind the approach). But there is a bit of work necessary to get brian2cuda up-to-date with the latest Brian release. I hope all this should be done before the end of the year.

That’s the idea, yes. But I actually realized that this is not yet the case in the current PR (objects with dynamic sizes, such as synapses, are currently excluded from the mechanism). It should be quite easy to fix, though!


The branch now works for synaptic variables as well :tada:
Here’s a silly proof-of-concept example that has synaptic weights (but no neurons…) that decay over time. In between each run, we halve the strongest and double the weakest weight – this is the kind of thing (comparison across synapses) that is difficult to do in standalone mode at the moment.

Example code for weight reuse over runs
from brian2 import *
set_device('cpp_standalone', build_on_run=False)

group = NeuronGroup(3, '')

syn = Synapses(group, group, 'dw/dt = -w/(10*ms) : 1')
syn.connect()
w = np.random.rand(9)
syn.w = w

mon = StateMonitor(syn, 'w', record=np.arange(9), dt=1*ms)
run(5*ms)

device.build(run=False)
weight_traces = np.empty((9, 0))
for _ in range(5):
   device.run(run_args={syn.w: w})
   weight_traces = np.hstack([weight_traces, mon.w])
   w = syn.w[:]
   # Cut the strongest weight in half and double the weakest weight
   strongest, weakest = np.argmax(w), np.argmin(w)
   w[strongest] /= 2
   w[weakest] *= 2

plt.plot(weight_traces.T)
plt.show()

[Figure: the nine weight traces across the five runs]


@mstimberg that is remarkable! Will try it as soon as we send the paper out.
Congratulations and many thanks! :clap: :clap: :clap:

The paper is still in progress but … I couldn’t wait, and it’s weekend, right?!

@mstimberg I got some strange error message.

  1. Created a virtual environment: python -m venv x
  2. Switched to the new environment: source x/bin/activate
  3. Installed wheel: pip install wheel
  4. Installed Brian: pip install -U https://github.com/brian-team/brian2/archive/refs/heads/standalone_parameter_change.zip
  5. Created a file with the code you posted
  6. Ran it: python code.py
Traceback (most recent call last):
  File "/home/rth/x/code.py", line 1, in <module>
    from brian2 import *
  File "/home/rth/x/lib/python3.9/site-packages/brian2/__init__.py", line 34, in <module>
    _check_dependencies()
  File "/home/rth/x/lib/python3.9/site-packages/brian2/__init__.py", line 16, in _check_dependencies
    import sympy
  File "/home/rth/x/lib/python3.9/site-packages/sympy/__init__.py", line 51, in <module>
    from .core import (sympify, SympifyError, cacheit, Basic, Atom,
  File "/home/rth/x/lib/python3.9/site-packages/sympy/core/__init__.py", line 4, in <module>
    from .sympify import sympify, SympifyError
  File "/home/rth/x/lib/python3.9/site-packages/sympy/core/sympify.py", line 9, in <module>
    from sympy.core.random import choice
  File "/home/rth/x/lib/python3.9/site-packages/sympy/core/random.py", line 25, in <module>
    from sympy.utilities.iterables import is_sequence
  File "/home/rth/x/lib/python3.9/site-packages/sympy/utilities/__init__.py", line 4, in <module>
    from .iterables import (flatten, group, take, subsets,
  File "/home/rth/x/lib/python3.9/site-packages/sympy/utilities/iterables.py", line 18, in <module>
    from sympy.utilities.decorator import deprecated
  File "/home/rth/x/lib/python3.9/site-packages/sympy/utilities/decorator.py", line 8, in <module>
    from sympy.testing.runtests import DependencyError, SymPyDocTests, PyTestReporter
  File "/home/rth/x/lib/python3.9/site-packages/sympy/testing/__init__.py", line 3, in <module>
    from .runtests import test, doctest
  File "/home/rth/x/lib/python3.9/site-packages/sympy/testing/runtests.py", line 20, in <module>
    import pdb
  File "/home/rth/.local/lib/python3.9/pdb.py", line 77, in <module>
    import code
  File "/home/rth/x/code.py", line 2, in <module>
    set_device('cpp_standalone', build_on_run=False)
NameError: name 'set_device' is not defined

I probably did something stupid, didn’t I?
-rth

@rth This kind of error is super confusing. The name of your file clashes with an existing top-level module (code) - simply renaming it to something else should fix it!
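If you’re curious why this happens: Python searches sys.path in order, and a script’s own directory normally comes first, so a local code.py wins over the standard library’s code module (which pdb tries to import). Here’s a minimal, Brian-independent demonstration; the file content and MARKER name are made up for illustration:

```python
import os
import sys
import tempfile

# Create a directory containing a file called code.py, like the user's script.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "code.py"), "w") as f:
    f.write("MARKER = 'user script, not the stdlib module'\n")

# Put that directory at the front of the module search path, the way the
# directory of a script run with `python code.py` would be.
sys.path.insert(0, tmpdir)
sys.modules.pop("code", None)   # forget any cached stdlib copy

import code  # this now loads tmpdir/code.py -- exactly the clash in the traceback

shadowed = getattr(code, "MARKER", None) is not None
print("stdlib 'code' module shadowed:", shadowed)
```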


Hi @mstimberg , suppose the rate is a TimedArray, e.g.

  img_rates = TimedArray(img_array*Hz, dt=self.simulation_duration)
  input_neurons = NeuronGroup(10, 'rate = img_rates(t, i) : Hz', threshold='rand()<rate*dt')

The img_array is the input image. What would be the right way to use run_args to get inputs in this case?

Good question, I had not thought of this! The run_args method cannot currently change the value in a TimedArray – it only deals with variables of groups (NeuronGroup, Synapses, …). I’ll look into supporting it, there are a few technical subtleties but in principle it should be doable via the same mechanism.

Until then, you have two options:

  1. If you can store all the input images in memory at the same time, then you can use the basic approach that I described here: Changing stimuli programmatically in C++ standalone, i.e. instead of directly using the rate at time t from the array, you shift the time according to the stimulus you want (no need for the start_t part, since each new simulation always starts at 0). The run_args mechanism still makes your life much simpler, since you don’t have to use Brian’s built-in mechanisms (Synapses, run_regularly, etc.) to program the whole logic (spike count, deciding whether to switch the stimulus or to increase its strength, etc.). Instead, you can do this after each run in Python, and then set the stimulus (stimulus_idx and stimulus_strength in the post linked above) via run_args.
  2. If your input data is too big to be in memory all at once, you can also re-create a TimedArray using a NeuronGroup and a run_regularly operation, by making use of internal machinery that allows you to make a variable (e.g. your rate) a “view” on a subset of values in another group (storing a “flat” version of your input image). Here’s a little example that shows how this works:
    A simple 2D TimedArray like this:
values = np.array([[1, 2], [4, 5], [3, 4]])
ta = TimedArray(values, dt=1*ms)

G = NeuronGroup(2, 'x = ta(t, i) : 1')

can be replaced by this (exact same results!):

value_group = NeuronGroup(6, 'values : 1')
value_group.values = values.flatten()

G = NeuronGroup(2, 'index : integer')
# The variable x does not appear in the equations, but is a variable of
# the model, added manually via add_reference:
G.variables.add_reference('x' , value_group, 'values', index='index')

# a bit silly: indices have to start at negative values, since the run_regularly
# operation will already run once at t=0s
G.index = 'i - N'
G.run_regularly('index += N', dt=1*ms)  # Advance to next "row" of array

Now while this of course looks quite a bit more complex, it has the advantage that the values (i.e. your input image) are now stored in a simple NeuronGroup variable: value_group.values. You can therefore swap out the underlying values with

device.run(run_args={value_group.values: new_values.flatten()})
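If you want to convince yourself that the index arithmetic really reproduces ta(t, i), here is a plain-NumPy sketch of the lookup pattern (no Brian required; the variable names mirror the example above):

```python
import numpy as np

# Same data as in the example above
values = np.array([[1, 2], [4, 5], [3, 4]], dtype=float)
flat = values.flatten()   # what value_group.values stores
N = values.shape[1]       # number of neurons reading from the array

# The indices start at i - N, and the run_regularly operation adds N once
# per time step (including at t=0), so at step t neuron i reads flat[i + N*t].
for t in range(values.shape[0]):
    for i in range(N):
        index = (i - N) + N * (t + 1)        # i - N, advanced (t+1) times
        assert flat[index] == values[t, i]   # same result as ta(t, i)

print("index arithmetic reproduces ta(t, i) for all (t, i)")
```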

Hope that gets you going!

Thanks. I’m currently using option #1 together with MPI, as discussed in the chat group: each node does its own C++ compilation independently in a different directory. This seems to work fine for now and I will keep it that way until the new features arrive.

I appreciate your work, but given that full functionality support is out of scope for the mode discussed here, I switched back to Python.

I finally got time to play with Brian’s new feature.
The big problem is that I need to run simulations for a few days of model time! Of course, even if I record just a dozen neurons out of a thousand, for debugging purposes, I hit a memory overflow within a few hours of model time :dizzy_face:

So, the idea is to stop the simulation from time to time, save the data to a file, and continue the run.

Non-stop simulation

Here is a simple balanced network, just to illustrate an issue with this approach in Brian:

import os
from brian2 import *
set_device('cpp_standalone')
prefs.devices.cpp_standalone.openmp_threads = os.cpu_count()-1

El,gl,taum = -60, 0.005, 20*ms
Ee   ,taue =   0       ,  5*ms
Ei   ,taui = -80       , 50*ms
equ='''
dv/dt  = (gl*(El-v)+ge*(Ee-v)+gi*(Ei-v))/taum : 1 (unless refractory)
dge/dt = -ge/taue : 1
dgi/dt = -gi/taui : 1
'''

pop = NeuronGroup(1000,equ,method='rk4',threshold='v>-10', reset='v=El', refractory=5*ms)
pop.v = El
pie = PoissonInput(pop[:500], 'ge', 100, 100*Hz, weight=5e-4)
esyn = Synapses(pop,pop,'w : 1', on_pre='ge_post += w')
esyn.connect(p=0.2)
esyn.w = 0.1
esyn.delay = 200*ms
isyn = Synapses(pop,pop,'w : 1', on_pre='gi_post += w')
isyn.connect(p=0.2)
isyn.w = 0.002
isyn.delay = 200*ms

spm = SpikeMonitor(pop)
stm = StateMonitor(pop,['v','ge','gi'],record=[499,501])

run(200*second)
ax = subplot(411)
plot(spm.t/ms, spm.i, 'k.')
subplot(412,sharex=ax)
plot(stm.t/ms,stm.v[0],'k-')
plot(stm.t/ms,stm.v[1],'r-')
subplot(413,sharex=ax)
plot(stm.t/ms,stm.ge[0],'k-')
plot(stm.t/ms,stm.ge[1],'r-')
subplot(414,sharex=ax)
plot(stm.t/ms,stm.gi[0],'k-')
plot(stm.t/ms,stm.gi[1],'r-')
show()

Simulation with multiple stops

If I use the new method, something is missing

import os
from brian2 import *
set_device('cpp_standalone', build_on_run=False)
prefs.devices.cpp_standalone.openmp_threads = os.cpu_count()-1

El,gl,taum = -60, 0.005, 20*ms
Ee   ,taue =   0       ,  5*ms
Ei   ,taui = -80       , 50*ms
equ='''
dv/dt  = (gl*(El-v)+ge*(Ee-v)+gi*(Ei-v))/taum : 1 (unless refractory)
dge/dt = -ge/taue : 1
dgi/dt = -gi/taui : 1
'''

pop = NeuronGroup(1000,equ,method='rk4',threshold='v>-10', reset='v=El', refractory=5*ms)
pie = PoissonInput(pop[:500], 'ge', 100, 100*Hz, weight=5e-4)
esyn = Synapses(pop,pop,'w : 1', on_pre='ge_post += w')
esyn.connect(p=0.2)
esyn.w = 0.1
esyn.delay = 200*ms
isyn = Synapses(pop,pop,'w : 1', on_pre='gi_post += w')
isyn.connect(p=0.2)
isyn.w = 0.002
isyn.delay = 200*ms

spm = SpikeMonitor(pop)
stm = StateMonitor(pop,['v','ge','gi'],record=[499,501])

run(5*second)
device.build(run=False)
ax1 = subplot(411)
ax2 = subplot(412,sharex=ax1)
ax3 = subplot(413,sharex=ax1)
ax4 = subplot(414,sharex=ax1)
vm  = ones(1000)*El
ge  = zeros(1000)
gi  = zeros(1000)
for _x in range(40):
    device.run(run_args={pop.v: vm, pop.ge: ge, pop.gi: gi})
    ax1.plot(spm.t/ms+5000*_x, spm.i, 'k.')
    ax2.plot(stm.t/ms+5000*_x,stm.v[0],'k-')
    ax2.plot(stm.t/ms+5000*_x,stm.v[1],'r-')
    ax3.plot(stm.t/ms+5000*_x,stm.ge[0],'k-')
    ax3.plot(stm.t/ms+5000*_x,stm.ge[1],'r-')
    ax4.plot(stm.t/ms+5000*_x,stm.gi[0],'k-')
    ax4.plot(stm.t/ms+5000*_x,stm.gi[1],'r-')
    vm = pop.v[:]
    ge = pop.ge[:]
    gi = pop.gi[:]
show()

and results aren’t pretty

An Issue

Well, it is obviously the spike queue! Is it possible to save the spike queue somewhere and then restore it from the file? That would solve any problem of a long run with multiple stops.

Note: This method works perfectly if there are no delays in the system. But as soon as spikes have to wait in the spike queue, it has an issue.
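The effect can be illustrated with a toy delay line in plain Python (this is not Brian’s actual spike queue implementation, just a sketch of the principle): in-flight spikes vanish when each run starts with a fresh queue, but carrying the queue over restores the continuous result:

```python
from collections import deque

def run_chunk(n_steps, delay, queue=None):
    """Toy delay line: every step emits one spike that arrives `delay`
    steps later. `queue` holds countdowns, standing in for the spike queue."""
    if queue is None:
        queue = deque()  # a fresh run starts with an empty queue
    delivered = 0
    for _ in range(n_steps):
        queue.append(delay)                           # emit one spike
        queue = deque(c - 1 for c in queue)           # advance time one step
        delivered += sum(1 for c in queue if c <= 0)  # spikes arriving now
        queue = deque(c for c in queue if c > 0)      # keep in-flight spikes
    return delivered, queue

cont, _ = run_chunk(10, delay=3)           # one continuous run
a, carry = run_chunk(5, delay=3)           # first half of a split run
b, _ = run_chunk(5, delay=3)               # fresh queue: in-flight spikes lost
c, _ = run_chunk(5, delay=3, queue=carry)  # queue carried over explicitly

print(cont, a + b, a + c)  # dropped queue loses spikes; carried queue matches
```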

@ikitayama I am a bit confused by your post (given that it is your first :blush: ) – what “full functionality” are you missing here?

They’re store() and restore().

Unfortunately, this is currently not possible in any clean way. This is one of the things that the store/restore mechanism deals with (Running a simulation — Brian 2 2.5.1 documentation), and it is not available for C++ standalone.

But if your issue is only the memory usage of the monitors, then I think an easier solution would be to use C++ code to write your recordings directly to disk instead of keeping them in memory. For a SpikeMonitor, you can find a solution here: Real Time Monitor - #2 by mstimberg
For a StateMonitor, you can use a function like this:

def disk_writer(filename, size, name):
    code = '''
    double %name%(int index, double value) {
        static std::ofstream outfile("%filename%", std::ios::binary | std::ios::out);
        static double data_container[%size%];  // stores data for one timestep
        data_container[index] = value;
        if (index == %size% - 1) { //write to disk when last value has been written
            outfile.write(reinterpret_cast<char*>(data_container), %size%*sizeof(double));
        }
        return 0.0;
    }
    '''.replace('%name%', name).replace('%filename%', filename).replace('%size%', str(size))

    @implementation('cpp', code)
    @check_units(index=1, value=1, result=1)
    def disk_writer(index, value):
        raise NotImplementedError('C++ only')
    return disk_writer

This can then be used in a run_regularly operation to replace a StateMonitor:

store_v = disk_writer(file_name, len(group), 'store_v')

group.run_regularly('dummy = store_v(i, v)')

After the run, you can reload the data like this:

v_values = np.fromfile(file_name)
v_values = v_values.reshape((-1, len(group))).T
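The reshape works because the C++ function writes one row of `size` doubles per timestep, so the file is a row-major (n_timesteps × size) block of float64 values. A plain-NumPy round-trip (with a made-up file path and data) illustrates the layout:

```python
import os
import tempfile

import numpy as np

# Fake data: 4 timesteps of 3 "neuron" values each (made up for illustration).
size, n_timesteps = 3, 4
data = np.arange(size * n_timesteps, dtype=np.float64).reshape(n_timesteps, size)

file_name = os.path.join(tempfile.mkdtemp(), 'v_values.bin')  # hypothetical path
with open(file_name, 'wb') as f:
    for row in data:          # mimic one outfile.write(...) call per timestep
        f.write(row.tobytes())

v_values = np.fromfile(file_name)            # flat float64 array
v_values = v_values.reshape((-1, size)).T    # -> (neuron, time), like mon.v
print(v_values.shape)
```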

Here’s a full example demonstrating that it records the same thing as the StateMonitor:

Full code example
import os
from brian2 import *
set_device('cpp_standalone')
group = NeuronGroup(2, 'dv/dt = -v/(10*ms) + 0.1/sqrt(5*ms)*xi : 1', method='euler')
mon = StateMonitor(group, 'v', record=True)

def disk_writer(filename, size, name):
    code = '''
    double %name%(int index, double value) {
        static std::ofstream outfile("%filename%", std::ios::binary | std::ios::out);
        static double data_container[%size%];  // stores data for one timestep
        data_container[index] = value;
        if (index == %size% - 1) { //write to disk when last value has been written
            outfile.write(reinterpret_cast<char*>(data_container), %size%*sizeof(double));
        }
        return 0.0;
    }
    '''.replace('%name%', name).replace('%filename%', filename).replace('%size%', str(size))

    @implementation('cpp', code)
    @check_units(index=1, value=1, result=1)
    def disk_writer(index, value):
        raise NotImplementedError('C++ only')
    return disk_writer

file_name = '/tmp/v_values.npy'
store_v = disk_writer(file_name, len(group), 'store_v')

group.run_regularly('dummy = store_v(i, v)')

run(100*ms, report='text')

v_values = np.fromfile(file_name)
v_values = v_values.reshape((-1, len(group))).T
fig, axs = plt.subplots(2, 1, sharex=True, sharey=True)
axs[0].plot(mon.t/ms, mon.v.T)
axs[1].plot(mon.t/ms, v_values.T)
plt.show()

[Figure: the StateMonitor recording (top) and the disk-written recording (bottom) show identical traces]
