Possibility of speeding up Poisson spike generation

DavidKing2020 · 5 December 2024 16:52

I have a layer of input neurons where each neuron generates Poisson spikes based on the intensity of a specific pixel of an image. The profiling with running in cpp_standalone mode seems to show this procedure takes most of the running time, i.e.

  Profiling summary
  =================
  input_neurons_spike_thresholder_codeobject_2    15.25 ms    66.87 %

Is it possible to speed this up in some way?

The python code is as follows:

    self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron, 
                                     '''x: 1
                                        y: 1
                                        switch_stimulus: boolean
                                        correct_identification: boolean
                                        higher_than_min_spike_rate: boolean
                                        rate: Hz
                                        label: integer
                                        start_t : second  # Start time of the current trial
                                        stimulus_idx : integer  # Index of the stimulus to show
                                        stimulus_strength : 1  # Factor to scale up stimulus
                                        repetitions : integer  # Number of times the stimulus has been presented
                                        ''', threshold='rand()<rate*dt', name='input_neurons',  
                                        namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
    
    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division

The corresponding c++ code is below:

void _run_input_neurons_spike_thresholder_codeobject_2()
{
    using namespace brian;

    const std::clock_t _start_time = std::clock();

    ///// CONSTANTS ///////////
    const int64_t N = 784;
    const size_t _num_spikespace = 785;
    const size_t _numdt = 1;
    const size_t _numrate = 784;
    ///// POINTERS ////////////

    int32_t* __restrict  _ptr_array_input_neurons__spikespace = _array_input_neurons__spikespace;
    double*   _ptr_array_defaultclock_dt = _array_defaultclock_dt;
    double* __restrict  _ptr_array_input_neurons_rate = _array_input_neurons_rate;



    //// MAIN CODE ////////////
    // scalar code
    const int _vectorisation_idx = -1;

    const double dt = _ptr_array_defaultclock_dt[0];



    long _count = 0;
    for(size_t _idx=0; _idx<N; _idx++)
    {
        const size_t _vectorisation_idx = _idx;

        const double rate = _ptr_array_input_neurons_rate[_idx];
        const char _cond = _rand(_vectorisation_idx) < (dt * rate);

        if(_cond) {
            _ptr_array_input_neurons__spikespace[_count++] = _idx;
        }
    }
    _ptr_array_input_neurons__spikespace[N] = _count;

    const double _run_time = (double)(std::clock() -_start_time)/CLOCKS_PER_SEC;
    input_neurons_spike_thresholder_codeobject_2_profiling_info += _run_time;
}

mstimberg · 5 December 2024 17:11

Hi @DavidKing2020. I’ve recently pushed a big improvement for the C++ random number generation into the development version of Brian2 (see Use Mersenne Twister from C++ standard library by mstimberg · Pull Request #1559 · brian-team/brian2 · GitHub), this should accelerate PoissonGroup quite a bit. Maybe you could try testing with this version (see Installation — Brian 2 2.7.1 documentation)? I hope to do a new release before the end of the year in either case.

DavidKing2020 · 5 December 2024 20:21

Thanks @mstimberg. It speeds it up by almost a factor of 2.

 input_neurons_spike_thresholder_codeobject    9.04 ms    51.79 %

It still takes most of the running time in my case. Could there be other waysto further speed it up?

mstimberg · 6 December 2024 21:01

I don’t see any easy way to speed things up more than that for now. But if your whole simulation already takes 20ms, how much more performance do you need ?

DavidKing2020 · 6 December 2024 21:45

Hi @mstimberg, it could be significant when running through large sample of images with many epochs, but a factor two reduction is already a huge help. Looking forward to the next release

mstimberg · 7 December 2024 11:34

I understand. Actually, if the rates are constant during the run, there is potentially a way to speed up the Poisson spike generation further. I’ll look into this on Monday and report back here.

mstimberg · 9 December 2024 14:04

Hi again @DavidKing2020. So, here’s what I had in mind: our standard rand() < rates*dt formulation is very flexible, since it applies even when the rates are changing over time, but it needs to generate a random number for each neuron at every timestep. If your rates are constant (within a single run), then you can use the following formulation instead, which makes use of the fact that the times between the spikes of a Poisson process follow an exponential distribution:

g = NeuronGroup(1000,
                '''
                rates : Hz (constant)
                next_spike_timestep : integer
                ''',
                threshold='t_in_timesteps >= next_spike_timestep',
                reset='''
                next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rates, dt)
                ''')
g.rates = 10*Hz
g.next_spike_timestep = "timestep(-log(rand())/rates, dt)"
M = SpikeMonitor(g)

For low rates, this should be much more efficient since it only needs to generate a single random number per spike.
Hope that works for you!

DavidKing2020 · 9 December 2024 14:31

Thanks @mstimberg. Just to make sure I got it right. In my case, the spike rate for each neuron in the input layer is proportional to the brightness of the corresponding pixel in an image. That means that for each neuron, the spike rate is unique but is constant in each run. Will the new method apply to this scenario?

mstimberg · 9 December 2024 14:39

Hi @DavidKing2020. Yes, this is the situation it was meant for: the rate can be different for each neuron, but it needs to stay constant (for the run).

mstimberg · 9 December 2024 14:40

In your use case, g.rates = 10*Hz would instead set a different rate for each neuron, but nothing else would need to change.

DavidKing2020 · 9 December 2024 21:01

Hi @mstimberg, I tried to implement the new method, The “input_neurons_spike_thresholder_codeobject” now cost much less time, but it somehow leads to different processes that consume more CPUs:

  s_input2e0_pre_plastic_codeobject             244.53 ms    59.08 %
  s_input2e0_pre_plastic_push_spikes            115.95 ms    28.01 %
  input_neurons_spike_resetter_codeobject        29.09 ms     7.03 %
  s_input2e0_post_codeobject                     15.54 ms     3.75 %
  input_neurons_spike_thresholder_codeobject      2.43 ms     0.59 %

The following is the python code. I also attached the c++ code for the top 4 codes that consume the most CPUs. These codes did not rank high when using the original ‘rand()<rate*dt’ method.

    self.input_neurons = brian2.NeuronGroup(input_param.Num_input_neuron, 
                                            '''
                                            x: 1
                                            y: 1
                                            switch_stimulus: boolean
                                            correct_identification: boolean
                                            higher_than_min_spike_rate: boolean
                                            rate: Hz (constant)
                                            next_spike_timestep : integer
                                            label: integer
                                            start_t : second  # Start time of the current trial
                                            stimulus_idx : integer  # Index of the stimulus to show
                                            stimulus_strength : 1  # Factor to scale up stimulus
                                            repetitions : integer  # Number of times the stimulus has been presented
                                            ''', 
                                            threshold='t_in_timesteps >= next_spike_timestep',
                                            reset='''
                                                  next_spike_timestep = t_in_timesteps + timestep(-log(rand())/rate, dt)
                                                  ''', name='input_neurons',  
                                            namespace = {'rows':input_param.input_rows, 'cols':input_param.input_cols})
    
    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"
   arguments = {self.input_neurons.rate:scaled_img*brian2.Hz}

This argument was feeded into ‘run_args=arguments’

input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
s_input2e0_post_codeobject.cpp (12.3 KB)
s_input2e0_pre_plastic_codeobject.cpp (10.3 KB)
s_input2e0_pre_plastic_push_spikes.cpp (5.9 KB)

mstimberg · 10 December 2024 12:27

Hi @DavidKing2020. The slow times for the synapse-related code objects indicate that you have many more spikes in your network. I think the problem is the initialization of next_spike_timestep, since it refers to the value of rate that will only be set later with the run_args mechanism. Since the value if 0 by default, it does something wrong for the first spike – I think all your neurons spike in the first time step.
We discuss this situation in the documentation:

a more general solution is to move the point where the run_args are applied. You can do this by calling the device’s apply_run_args function

Could you try adding device.apply_run_args() before the initialization of next_spike_timestep? It should look like similar to this after the change:

    self.input_neurons.x = 'i%cols'  
    self.input_neurons.y = 'rows - i//rows' # // is a floored division
    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"

DavidKing2020 · 10 December 2024 14:16

Hi @mstimberg , the source of the slow timing is indeed because all input neuron spikes. I tried to implement “device.apply_run_args()” but it didn’t help. All input neurons somehow still spiks. I’ve attached the input_neuron code , there are two of them and essentially the same:

 input_neurons_spike_resetter_codeobject_1.cpp
 input_neurons_spike_resetter_codeobject.cpp

input_neurons_spike_resetter_codeobject.cpp (6.7 KB)
input_neurons_spike_resetter_codeobject_1.cpp (6.7 KB)

mstimberg · 10 December 2024 14:19

Hi @DavidKing2020, could you please share the generated main.cpp file as well? This is where the apply_run_args call would make a difference.

DavidKing2020 · 10 December 2024 14:46

Hi @mstimberg , here it is attached. Thanks
main.cpp (15.1 KB)

mstimberg · 10 December 2024 15:02

Hmm, that looks correct, assuming that code_objects/input_neurons_group_variable_set_conditional_codeobject_2.h sets the next_spike_timestep variable as expected.

Could you maybe present a single sample for 0*ms (i.e., it will only do the initializations and nothing else), and then print out the values of input_neurons.rate and input_neurons.next_spike_timestep to see whether they are as expected?

DavidKing2020 · 10 December 2024 15:28

I tried to run 0ms and 10ms. The outputs are attached. The “input_neurons_group_variable_set_conditional_codeobject_2…cpp” are also attached. When running time is 0ms, the next_spike_time look correct but when runtime = 10ms, the neurons which should not spike has lots of 99 values. These 99 value change to larger value with longer running time. Looks like a initialization issue. Thanks
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)
out_runTime_10ms.txt (12.5 KB)
out_runTime_0ms.txt (12.0 KB)
input_neurons_group_variable_set_conditional_codeobject_2.cpp (6.0 KB)

mstimberg · 10 December 2024 15:36

Oh sorry, I just realize something: in your case you have neurons that have rate 0, right? The current formulation does not work for these neurons…
Could you try replacing

    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep = "timestep(-log(rand())/rate, dt)"

by

    self.input_neurons.next_spike_timestep = 1_000_000
    brian2.device.apply_run_args()
    self.input_neurons.next_spike_timestep['rate>0*Hz'] = "timestep(-log(rand())/rate, dt)"

The 1_000_000 value for the neurons that have rate 0 is arbitrary – it should just be a value bigger than the number of timesteps in a single run, so it never gets triggered.

mstimberg · 10 December 2024 15:37

I think with the previous formulation, neurons that should never spike (rate=0Hz), spiked at every timestep…

DavidKing2020 · 10 December 2024 15:54

The issue for “spike at every timestep” is gone :smiley, but the overall running time is similar as the the orignal ‘rand()<rate*dt’ method. The reason is from “s_input2e0_post_codeobject” as shown in the following profile:

 s_input2e0_post_codeobject                    10.75 ms    45.14 %
 input_neurons_spike_thresholder_codeobject     2.64 ms    11.10 %
 s_input2e0_pre_plastic_codeobject              1.61 ms     6.77 %
 exci_neurons0_stateupdater_codeobject          1.37 ms     5.77 %
 s_input2e0_pre_plastic_push_spikes             1.28 ms     5.37 %

The “s_input2e0_post_codeobject .cpp” is attached. With the ‘rand()<rate*dt’ method, “s_input2e0_post_codeobject .cpp” didn’t take as much time. The code is attached. Thanks
s_input2e0_post_codeobject.cpp (12.3 KB)

Topic		Replies	Views
Time dependent rate for PoissinGroup Support	2	331	17 December 2020
Changing stimuli programmatically in C++ standalone Showcase	17	577	27 June 2023
`PoissonGroup` vs. `PoissonInput` and their parameters Support synapses , poisson	3	222	15 April 2024
How to change poisson rates every 30ms for example? Support brian2genn	3	524	15 January 2021
Re-run the model in a previous time segment during a long run Support	4	552	18 February 2021

Possibility of speeding up Poisson spike generation

Related topics