I have a layer of input neurons where each neuron generates Poisson spikes based on the intensity of a specific pixel of an image. The profiling with running in cpp_standalone mode seems to show this procedure takes most of the running time, i.e.
Profiling summary
=================
input_neurons_spike_thresholder_codeobject_2 15.25 ms 66.87 %
Is it possible to speed this up in some way?
The python code is as follows:
self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron,
'''x: 1
y: 1
switch_stimulus: boolean
correct_identification: boolean
higher_than_min_spike_rate: boolean
rate: Hz
label: integer
start_t : second # Start time of the current trial
stimulus_idx : integer # Index of the stimulus to show
stimulus_strength : 1 # Factor to scale up stimulus
repetitions : integer # Number of times the stimulus has been presented
''', threshold='rand()<rate*dt', name='input_neurons',
namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
self.input_neurons.x = 'i%cols'
self.input_neurons.y = 'rows - i//rows' # // is a floored division
The corresponding c++ code is below:
void _run_input_neurons_spike_thresholder_codeobject_2()
{
using namespace brian;
const std::clock_t _start_time = std::clock();
///// CONSTANTS ///////////
const int64_t N = 784;
const size_t _num_spikespace = 785;
const size_t _numdt = 1;
const size_t _numrate = 784;
///// POINTERS ////////////
int32_t* __restrict _ptr_array_input_neurons__spikespace = _array_input_neurons__spikespace;
double* _ptr_array_defaultclock_dt = _array_defaultclock_dt;
double* __restrict _ptr_array_input_neurons_rate = _array_input_neurons_rate;
//// MAIN CODE ////////////
// scalar code
const int _vectorisation_idx = -1;
const double dt = _ptr_array_defaultclock_dt[0];
long _count = 0;
for(size_t _idx=0; _idx<N; _idx++)
{
const size_t _vectorisation_idx = _idx;
const double rate = _ptr_array_input_neurons_rate[_idx];
const char _cond = _rand(_vectorisation_idx) < (dt * rate);
if(_cond) {
_ptr_array_input_neurons__spikespace[_count++] = _idx;
}
}
_ptr_array_input_neurons__spikespace[N] = _count;
const double _run_time = (double)(std::clock() -_start_time)/CLOCKS_PER_SEC;
input_neurons_spike_thresholder_codeobject_2_profiling_info += _run_time;
}
Hi @DavidKing2020. Iāve recently pushed a big improvement for the C++ random number generation into the development version of Brian2 (see Use Mersenne Twister from C++ standard library by mstimberg Ā· Pull Request #1559 Ā· brian-team/brian2 Ā· GitHub), this should accelerate PoissonGroup
quite a bit. Maybe you could try testing with this version (see Installation ā Brian 2 2.7.1 documentation)? I hope to do a new release before the end of the year in either case.
Thanks @mstimberg. It speeds it up by almost a factor of 2.
input_neurons_spike_thresholder_codeobject 9.04 ms 51.79 %
It still takes most of the running time in my case. Could there be other waysto further speed it up?
I donāt see any easy way to speed things up more than that for now. But if your whole simulation already takes 20ms, how much more performance do you need ?
Hi @mstimberg, it could be significant when running through large sample of images with many epochs, but a factor two reduction is already a huge help. Looking forward to the next release
I understand. Actually, if the rates are constant during the run, there is potentially a way to speed up the Poisson spike generation further. Iāll look into this on Monday and report back here.
Hi again @DavidKing2020. So, hereās what I had in mind: our standard rand() < rates*dt
formulation is very flexible, since it applies even when the rates are changing over time, but it needs to generate a random number for each neuron at every timestep. If your rates are constant (within a single run), then you can use the following formulation instead, which makes use of the fact that the times between the spikes of a Poisson process follow an exponential distribution:
g = NeuronGroup(1000,
'''
rates : Hz (constant)
next_spike_timestep : integer
''',
threshold='t_in_timesteps >= next_spike_timestep',
reset='''
next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rates, dt)
''')
g.rates = 10*Hz
g.next_spike_timestep = "timestep(-log(rand())/rates, dt)"
M = SpikeMonitor(g)
For low rates, this should be much more efficient since it only needs to generate a single random number per spike.
Hope that works for you!
Thanks @mstimberg. Just to make sure I got it right. In my case, the spike rate for each neuron in the input layer is proportional to the brightness of the corresponding pixel in an image. That means that for each neuron, the spike rate is unique but is constant in each run. Will the new method apply to this scenario?
Hi @DavidKing2020. Yes, this is the situation it was meant for: the rate can be different for each neuron, but it needs to stay constant (for the run).
In your use case, g.rates = 10*Hz
would instead set a different rate for each neuron, but nothing else would need to change.
Hi @mstimberg, I tried to implement the new method, The āinput_neurons_spike_thresholder_codeobjectā now cost much less time, but it somehow leads to different processes that consume more CPUs:
s_input2e0_pre_plastic_codeobject 244.53 ms 59.08 %
s_input2e0_pre_plastic_push_spikes 115.95 ms 28.01 %
input_neurons_spike_resetter_codeobject 29.09 ms 7.03 %
s_input2e0_post_codeobject 15.54 ms 3.75 %
input_neurons_spike_thresholder_codeobject 2.43 ms 0.59 %
The following is the python code. I also attached the c++ code for the top 4 codes that consume the most CPUs. These codes did not rank high when using the original ārand()<rate*dtā method.
self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron,
'''
x: 1
y: 1
switch_stimulus: boolean
correct_identification: boolean
higher_than_min_spike_rate: boolean
rate: Hz (constant)
next_spike_timestep : integer
label: integer
start_t : second # Start time of the current trial
stimulus_idx : integer # Index of the stimulus to show
stimulus_strength : 1 # Factor to scale up stimulus
repetitions : integer # Number of times the stimulus has been presented
''',
threshold='t_in_timesteps >= next_spike_timestep',
reset='''
next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rate, dt)
''', name='input_neurons',
namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
self.input_neurons.x = 'i%cols'
self.input_neurons.y = 'rows - i//rows' # // is a floored division
self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"
arguments = {self.input_neurons.rate:scaled_img*brian2.Hz}
This argument was feeded into ārun_args=argumentsā
input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
s_input2e0_post_codeobject.cpp (12.3 KB)
s_input2e0_pre_plastic_codeobject.cpp (10.3 KB)
s_input2e0_pre_plastic_push_spikes.cpp (5.9 KB)
Hi @DavidKing2020. The slow times for the synapse-related code objects indicate that you have many more spikes in your network. I think the problem is the initialization of next_spike_timestep
, since it refers to the value of rate
that will only be set later with the run_args
mechanism. Since the value if 0 by default, it does something wrong for the first spike ā I think all your neurons spike in the first time step.
We discuss this situation in the documentation:
a more general solution is to move the point where the run_args
are applied. You can do this by calling the deviceās apply_run_args
function
Could you try adding device.apply_run_args()
before the initialization of next_spike_timestep
? It should look like similar to this after the change:
self.input_neurons.x = 'i%cols'
self.input_neurons.y = 'rows - i//rows' # // is a floored division
brian2.device.apply_run_args()
self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"
Hi @mstimberg , the source of the slow timing is indeed because all input neuron spikes. I tried to implement ādevice.apply_run_args()ā but it didnāt help. All input neurons somehow still spiks. Iāve attached the input_neuron code , there are two of them and essentially the same:
input_neurons_spike_resetter_codeobject_1.cpp
input_neurons_spike_resetter_codeobject.cpp
input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
input_neurons_spike_resetter_codeobject_1.cpp (6.7 KB)
Hi @DavidKing2020, could you please share the generated main.cpp
file as well? This is where the apply_run_args
call would make a difference.
Hi @mstimberg , here it is attached. Thanks
main.cpp (15.1 KB)
Hmm, that looks correct, assuming that code_objects/input_neurons_group_variable_set_conditional_codeobject_2.h
sets the next_spike_timestep
variable as expected.
Could you maybe present a single sample for 0*ms
(i.e., it will only do the initializations and nothing else), and then print out the values of input_neurons.rate
and input_neurons.next_spike_timestep
to see whether they are as expected?
I tried to run 0ms and 10ms. The outputs are attached. The āinput_neurons_group_variable_set_conditional_codeobject_2ā¦cppā are also attached. When running time is 0ms, the next_spike_time look correct but when runtime = 10ms, the neurons which should not spike has lots of 99 values. These 99 value change to larger value with longer running time. Looks like a initialization issue. Thanks
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)
out_runTime_10ms.txt (12.5 KB)
out_runTime_0ms.txt (12.0 KB)
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)
Oh sorry, I just realize something: in your case you have neurons that have rate 0, right? The current formulation does not work for these neuronsā¦
Could you try replacing
brian2.device.apply_run_args()
self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"
by
self.input_neurons.next_spike_timestep = 1_000_000
brian2.device.apply_run_args()
self.input_neurons.next_spike_timestep['rate>0*Hz'] = "timestep(-log(rand())/rate, dt)"
The 1_000_000
value for the neurons that have rate 0 is arbitrary ā it should just be a value bigger than the number of timesteps in a single run, so it never gets triggered.
1 Like
I think with the previous formulation, neurons that should never spike (rate=0Hz), spiked at every timestepā¦
The issue for āspike at every timestepā is gone :smiley, but the overall running time is similar as the the orignal ārand()<rate*dtā method. The reason is from ās_input2e0_post_codeobjectā as shown in the following profile:
s_input2e0_post_codeobject 10.75 ms 45.14 %
input_neurons_spike_thresholder_codeobject 2.64 ms 11.10 %
s_input2e0_pre_plastic_codeobject 1.61 ms 6.77 %
exci_neurons0_stateupdater_codeobject 1.37 ms 5.77 %
s_input2e0_pre_plastic_push_spikes 1.28 ms 5.37 %
The ās_input2e0_post_codeobject .cppā is attached. With the ārand()<rate*dtā method, ās_input2e0_post_codeobject .cppā didnāt take as much time. The code is attached. Thanks
s_input2e0_post_codeobject.cpp (12.3 KB)