Possibility of speeding up Poisson spike generation

I have a layer of input neurons where each neuron generates Poisson spikes based on the intensity of a specific pixel of an image. The profiling with running in cpp_standalone mode seems to show this procedure takes most of the running time, i.e.

  Profiling summary
  =================
  input_neurons_spike_thresholder_codeobject_2    15.25 ms    66.87 %

Is it possible to speed this up in some way?

The python code is as follows:

    self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron, 
                                     '''x: 1
                                        y: 1
                                        switch_stimulus: boolean
                                        correct_identification: boolean
                                        higher_than_min_spike_rate: boolean
                                        rate: Hz
                                        label: integer
                                        start_t : second  # Start time of the current trial
                                        stimulus_idx : integer  # Index of the stimulus to show
                                        stimulus_strength : 1  # Factor to scale up stimulus
                                        repetitions : integer  # Number of times the stimulus has been presented
                                        ''', threshold='rand()<rate*dt', name='input_neurons',  
                                        namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
    
    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division

The corresponding c++ code is below:

void _run_input_neurons_spike_thresholder_codeobject_2()
{
    using namespace brian;

    const std::clock_t _start_time = std::clock();

    ///// CONSTANTS ///////////
    const int64_t N = 784;
    const size_t _num_spikespace = 785;
    const size_t _numdt = 1;
    const size_t _numrate = 784;
    ///// POINTERS ////////////

    int32_t* __restrict  _ptr_array_input_neurons__spikespace = _array_input_neurons__spikespace;
    double*   _ptr_array_defaultclock_dt = _array_defaultclock_dt;
    double* __restrict  _ptr_array_input_neurons_rate = _array_input_neurons_rate;



    //// MAIN CODE ////////////
    // scalar code
    const int _vectorisation_idx = -1;

    const double dt = _ptr_array_defaultclock_dt[0];



    long _count = 0;
    for(size_t _idx=0; _idx<N; _idx++)
    {
        const size_t _vectorisation_idx = _idx;

        const double rate = _ptr_array_input_neurons_rate[_idx];
        const char _cond = _rand(_vectorisation_idx) < (dt * rate);

        if(_cond) {
            _ptr_array_input_neurons__spikespace[_count++] = _idx;
        }
    }
    _ptr_array_input_neurons__spikespace[N] = _count;

    const double _run_time = (double)(std::clock() -_start_time)/CLOCKS_PER_SEC;
    input_neurons_spike_thresholder_codeobject_2_profiling_info += _run_time;
}

Hi @DavidKing2020. Iā€™ve recently pushed a big improvement for the C++ random number generation into the development version of Brian2 (see Use Mersenne Twister from C++ standard library by mstimberg Ā· Pull Request #1559 Ā· brian-team/brian2 Ā· GitHub), this should accelerate PoissonGroup quite a bit. Maybe you could try testing with this version (see Installation ā€” Brian 2 2.7.1 documentation)? I hope to do a new release before the end of the year in either case.

Thanks @mstimberg. It speeds it up by almost a factor of 2.

 input_neurons_spike_thresholder_codeobject    9.04 ms    51.79 %

It still takes most of the running time in my case. Could there be other waysto further speed it up?

I donā€™t see any easy way to speed things up more than that for now. But if your whole simulation already takes 20ms, how much more performance do you need :slightly_smiling_face: ?

Hi @mstimberg, it could be significant when running through large sample of images with many epochs, but a factor two reduction is already a huge help. Looking forward to the next release :+1:

I understand. Actually, if the rates are constant during the run, there is potentially a way to speed up the Poisson spike generation further. Iā€™ll look into this on Monday and report back here.

Hi again @DavidKing2020. So, hereā€™s what I had in mind: our standard rand() < rates*dt formulation is very flexible, since it applies even when the rates are changing over time, but it needs to generate a random number for each neuron at every timestep. If your rates are constant (within a single run), then you can use the following formulation instead, which makes use of the fact that the times between the spikes of a Poisson process follow an exponential distribution:

g = NeuronGroup(1000,
                '''
                rates : Hz (constant)
                next_spike_timestep : integer
                ''',
                threshold='t_in_timesteps >= next_spike_timestep',
                reset='''
                next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rates, dt)
                ''')
g.rates = 10*Hz
g.next_spike_timestep = "timestep(-log(rand())/rates, dt)"
M = SpikeMonitor(g)

For low rates, this should be much more efficient since it only needs to generate a single random number per spike.
Hope that works for you!

Thanks @mstimberg. Just to make sure I got it right. In my case, the spike rate for each neuron in the input layer is proportional to the brightness of the corresponding pixel in an image. That means that for each neuron, the spike rate is unique but is constant in each run. Will the new method apply to this scenario?

Hi @DavidKing2020. Yes, this is the situation it was meant for: the rate can be different for each neuron, but it needs to stay constant (for the run).

In your use case, g.rates = 10*Hz would instead set a different rate for each neuron, but nothing else would need to change.

Hi @mstimberg, I tried to implement the new method, The ā€œinput_neurons_spike_thresholder_codeobjectā€ now cost much less time, but it somehow leads to different processes that consume more CPUs:

  s_input2e0_pre_plastic_codeobject             244.53 ms    59.08 %
  s_input2e0_pre_plastic_push_spikes            115.95 ms    28.01 %
  input_neurons_spike_resetter_codeobject        29.09 ms     7.03 %
  s_input2e0_post_codeobject                     15.54 ms     3.75 %
  input_neurons_spike_thresholder_codeobject      2.43 ms     0.59 %

The following is the python code. I also attached the c++ code for the top 4 codes that consume the most CPUs. These codes did not rank high when using the original ā€˜rand()<rate*dtā€™ method.

    self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron, 
                                            '''
                                            x: 1
                                            y: 1
                                            switch_stimulus: boolean
                                            correct_identification: boolean
                                            higher_than_min_spike_rate: boolean
                                            rate: Hz (constant)
                                            next_spike_timestep : integer
                                            label: integer
                                            start_t : second  # Start time of the current trial
                                            stimulus_idx : integer  # Index of the stimulus to show
                                            stimulus_strength : 1  # Factor to scale up stimulus
                                            repetitions : integer  # Number of times the stimulus has been presented
                                            ''', 
                                            threshold='t_in_timesteps >= next_spike_timestep',
                                            reset='''
                                                  next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rate, dt)
                                                  ''', name='input_neurons',  
                                            namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
    
    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"
   arguments = {self.input_neurons.rate:scaled_img*brian2.Hz} 

This argument was feeded into ā€˜run_args=argumentsā€™

input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
s_input2e0_post_codeobject.cpp (12.3 KB)
s_input2e0_pre_plastic_codeobject.cpp (10.3 KB)
s_input2e0_pre_plastic_push_spikes.cpp (5.9 KB)

Hi @DavidKing2020. The slow times for the synapse-related code objects indicate that you have many more spikes in your network. I think the problem is the initialization of next_spike_timestep, since it refers to the value of rate that will only be set later with the run_args mechanism. Since the value if 0 by default, it does something wrong for the first spike ā€“ I think all your neurons spike in the first time step.
We discuss this situation in the documentation:

a more general solution is to move the point where the run_args are applied. You can do this by calling the deviceā€™s apply_run_args function

Could you try adding device.apply_run_args() before the initialization of next_spike_timestep? It should look like similar to this after the change:

    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division
    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"

Hi @mstimberg , the source of the slow timing is indeed because all input neuron spikes. I tried to implement ā€œdevice.apply_run_args()ā€ but it didnā€™t help. All input neurons somehow still spiks. Iā€™ve attached the input_neuron code , there are two of them and essentially the same:

 input_neurons_spike_resetter_codeobject_1.cpp
 input_neurons_spike_resetter_codeobject.cpp

input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
input_neurons_spike_resetter_codeobject_1.cpp (6.7 KB)

Hi @DavidKing2020, could you please share the generated main.cpp file as well? This is where the apply_run_args call would make a difference.

Hi @mstimberg , here it is attached. Thanks
main.cpp (15.1 KB)

Hmm, that looks correct, assuming that code_objects/input_neurons_group_variable_set_conditional_codeobject_2.h sets the next_spike_timestep variable as expected.

Could you maybe present a single sample for 0*ms (i.e., it will only do the initializations and nothing else), and then print out the values of input_neurons.rate and input_neurons.next_spike_timestep to see whether they are as expected?

I tried to run 0ms and 10ms. The outputs are attached. The ā€œinput_neurons_group_variable_set_conditional_codeobject_2ā€¦cppā€ are also attached. When running time is 0ms, the next_spike_time look correct but when runtime = 10ms, the neurons which should not spike has lots of 99 values. These 99 value change to larger value with longer running time. Looks like a initialization issue. Thanks
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)
out_runTime_10ms.txt (12.5 KB)
out_runTime_0ms.txt (12.0 KB)
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)

Oh sorry, I just realize something: in your case you have neurons that have rate 0, right? The current formulation does not work for these neuronsā€¦
Could you try replacing

    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"

by

    self.input_neurons.next_spike_timestep = 1_000_000
    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep['rate>0*Hz'] = "timestep(-log(rand())/rate, dt)"

The 1_000_000 value for the neurons that have rate 0 is arbitrary ā€“ it should just be a value bigger than the number of timesteps in a single run, so it never gets triggered.

1 Like

I think with the previous formulation, neurons that should never spike (rate=0Hz), spiked at every timestepā€¦

The issue for ā€œspike at every timestepā€ is gone :smiley, but the overall running time is similar as the the orignal ā€˜rand()<rate*dtā€™ method. The reason is from ā€œs_input2e0_post_codeobjectā€ as shown in the following profile:

 s_input2e0_post_codeobject                    10.75 ms    45.14 %
 input_neurons_spike_thresholder_codeobject     2.64 ms    11.10 %
 s_input2e0_pre_plastic_codeobject              1.61 ms     6.77 %
 exci_neurons0_stateupdater_codeobject          1.37 ms     5.77 %
 s_input2e0_pre_plastic_push_spikes             1.28 ms     5.37 %

The ā€œs_input2e0_post_codeobject .cppā€ is attached. With the ā€˜rand()<rate*dtā€™ method, ā€œs_input2e0_post_codeobject .cppā€ didnā€™t take as much time. The code is attached. Thanks
s_input2e0_post_codeobject.cpp (12.3 KB)