Multiple runs in standalone mode

For multiple runs in standalone mode, I followed tutorial part 3 (multiple runs) and an example from issue #1189:

In each run the time constant tau is changed and the simulation runs for 1 second.

import numpy as np
from time import time

import brian2 as b2

# Assumed setup (not shown in the original snippet):
standalone_mode = True
directory_name = 'standalone'
if standalone_mode:
    b2.set_device('cpp_standalone', build_on_run=False)

def simulate(tau):
    b2.start_scope()

    if standalone_mode:
        b2.get_device().reinit()
        b2.get_device().activate(build_on_run=False,
                                 directory=directory_name)

    net = b2.Network()
    P = b2.PoissonGroup(num_inputs, rates=input_rate)
    G = b2.NeuronGroup(1, eqs, threshold='v>1', reset='v=0', method='euler')
    S = b2.Synapses(P, G, on_pre='v += weight')
    S.connect()
    M = b2.SpikeMonitor(G)
    net.add(P)
    net.add(G)
    net.add(S)
    net.add(M)

    net.run(1000 * b2.ms)
    
    if standalone_mode:
        b2.get_device().build(directory=directory_name,
                              compile=True,
                              run=True,
                              debug=False)
    return M

My question is: why am I not getting a sensible speedup compared to not using standalone mode?
Am I doing something wrong?

To run the simulation:

    start_time = time()
    num_inputs = 100
    input_rate = 10 * b2.Hz
    weight = 0.1
    npoints = 15
    tau_range = np.linspace(1, 15, npoints) * b2.ms
    eqs = '''
    dv/dt = -v/tau : 1
    '''

    output_rates = np.zeros(npoints)
    for ii in range(npoints):

        tau_i = tau_range[ii]
        M = simulate(tau_i)
        output_rates[ii] = M.num_spikes / b2.second

    print("Done in {}".format(time()-start_time))

The total simulation time for a simulation that uses code generation consists of the time for code generation + compilation, plus the time of the actual simulation run. Standalone mode accelerates the simulation run, but its code generation + compilation time is the same or even longer than in runtime mode, because runtime mode does some more advanced caching (and in runtime mode you can avoid the compilation overhead completely by using the store/restore mechanism).
In your example, the simulation is very simple and the simulation run takes very little time, much less than the time to generate and compile the code. Here, the fastest solution (ignoring the store/restore approach) would actually be to switch off code compilation completely by setting b2.prefs.codegen.target = 'numpy' – the gain from not having to compile any code will outweigh the slower simulation time. This would change if the simulation itself were more complicated, or simply longer.

There is a way to do this kind of simulation where only a parameter changes between runs, without rebuilding the network and recompiling the code in standalone. Unfortunately, it is not really exposed by Brian’s API at the moment and needs some knowledge about internals. If you want to dive deeper into this, you can have a look at an example in a gist where I demonstrate that technique.

In general, if you want maximal performance you should avoid using multiple runs. One solution is to make a single long run, where you change a parameter (and e.g. reset the membrane potential) periodically with mechanisms such as run_regularly and/or TimedArray. In this case, it would be even more efficient to “vectorize” your example: instead of using a single neuron in the NeuronGroup, you’d have 15 neurons, each with a different tau parameter. You can then run a single simulation and get the exact same results.


> In this case, it would be even more efficient to “vectorize” your example: instead of using a single neuron in the NeuronGroup, you’d have 15 neurons, each with a different tau parameter. You can then run a single simulation and get the exact same results.

I just want to add to Marcel’s answer: this is a pretty standard approach to a parameter search and it works great! In your case the code should look like this:

num_inputs = 100
input_rate = 10 * b2.Hz
weight = 0.1
npoints = 15
tau_range = np.linspace(1, 15, npoints) * b2.ms
eqs = '''
    dv/dt = -v/tau : 1
    tau : second
'''
P = b2.PoissonGroup(num_inputs, rates=input_rate)
G = b2.NeuronGroup(npoints, eqs, threshold='v>1', reset='v=0', method='euler')
G.tau = tau_range
S = b2.Synapses(P, G, on_pre='v += weight')
S.connect()
M = b2.SpikeMonitor(G)
b2.run(1000 * b2.ms)
for ii in range(npoints):
    print(ii, M.count[ii] / b2.second)

That is all.


Thank you @mstimberg and @rth. The answer is quite clear.
I am treating this as a toy example before applying the same method to real problems, where vectorization probably will not work.

What I got from the example at the provided link is that it changes a parameter in the C++ code without needing to compile again. Something like passing a parameter to the program from the terminal without recompiling for every run:

$ ./executable tau

int main(int argc, char **argv) {
    double tau = atof(argv[1]);
    ...
}
  • Does it also work for more than one parameter?

  • I know a bit of C++, but I am not familiar with Brian’s internals.
    Is it straightforward to track some parameters in the C++ code, feed their values in from the terminal, and just use a script to run the code many times?
    I suppose the network and everything else remain the same as before and only a few parameters change in each run.

I don’t see any problems here. I understand your confusion; I have coded in C/C++ for many years, and it was a bit hard to adjust my way of thinking.

You are completely right. In C/C++ we usually assume one process, one parameter set. We use bash (or any other shell) to run many instances of the same program and supply parameters as command-line arguments. C code is compiled once and then used many times, but Python is an interpreter: pretty clever, but still an interpreter. So it builds the code along the way, and this adds to the overall run time. As Marcel said, if your calculations are heavy enough to keep the CPU busy longer than the compilation, you can ignore the compilation time. In this case, you can build an application which runs only one parameter set:

import sys

if len(sys.argv) < 2:
    print("USAGE: program tau")
    exit(1)
try:
    tau = float(sys.argv[1])
except ValueError:
    raise ValueError("Cannot convert {} into float".format(sys.argv[1]))

and launch this code the same way as your C code: python single-run.py tau
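A small Python driver can play the role of the bash loop that launches one process per parameter value. This is only a sketch; single-run.py is a hypothetical name for the single-parameter script above.

```python
# Driver sketch: one process per parameter value, replacing a bash loop.
# 'single-run.py' is a hypothetical name for the script above.
import subprocess
import sys

import numpy as np

tau_values = np.linspace(1, 15, 15)  # in ms; the script converts units itself
commands = [[sys.executable, 'single-run.py', str(tau)] for tau in tau_values]

for cmd in commands:
    print(' '.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once single-run.py exists
```

Each launched process compiles and runs independently, so this only pays off when a single run is long enough to dominate its own compilation time.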

If a single parameter evaluation is light (like in your example), the compilation will take longer than the computations themselves. So if you want to minimize overall time, you need to pack more parameter evaluations into one run. This is quite easy to implement:

import sys

import numpy as np
import brian2 as b2

if len(sys.argv) < 4:
    print("USAGE: program tau-min tau-max number")
    exit(1)
taumin, taumax, npoints = sys.argv[1:4]
try:
    taumin = float(taumin)
except ValueError:
    raise ValueError("Cannot convert {} into float".format(taumin))
try:
    taumax = float(taumax)
except ValueError:
    raise ValueError("Cannot convert {} into float".format(taumax))
try:
    npoints = int(npoints)
except ValueError:
    raise ValueError("Cannot convert {} into int".format(npoints))
tau_range = np.linspace(taumin, taumax, npoints) * b2.ms

Now you can run this program as python example.py 1. 15. 15
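The same command-line parsing can also be done with the standard-library argparse module. This is a sketch equivalent to the sys.argv version above; the argument names are illustrative.

```python
# argparse sketch: equivalent to the manual sys.argv parsing above.
import argparse

import numpy as np

parser = argparse.ArgumentParser(description='Sweep tau over a range')
parser.add_argument('tau_min', type=float)
parser.add_argument('tau_max', type=float)
parser.add_argument('npoints', type=int)

# On the command line this would be: python example.py 1. 15. 15
args = parser.parse_args(['1.', '15.', '15'])

tau_values = np.linspace(args.tau_min, args.tau_max, args.npoints)
print(len(tau_values), tau_values[0], tau_values[-1])
# multiply by b2.ms before assigning to G.tau
```

argparse also produces the usage message and the type-conversion errors for free, so the explicit try/except blocks are no longer needed.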


Adding to @rth’s comments. If you have many short simulations that you cannot vectorize then it would indeed be useful to run the same standalone simulation repeatedly and only change one or several parameters. With the code I linked to you can do this from a single script (as shown in the code), but the same approach would in principle also work if you run every new parameter in its own script. The details are a bit more complicated unfortunately and I am not sure it is worth it.

We plan to make this kind of thing easier in the future (see e.g. this GitHub project), but progress is unfortunately quite slow.


Awesome, thank you! That’s exactly what I wanted. 👍