Delayed summed variable: advice for performance in runtime mode

Continuing the discussion from Delay for 'summed' variables in synapses:

Dear all,

I am a new Brian2 (version 2.5.0.1) user, currently working on a model using delayed summed variables.
Indeed, at each time step t of the simulation, to compute the summed variable, the Synapses object needs to access the value of a specific variable of the input NeuronGroup at time t - d_ij, where d_ij is the delay of the synapse between input neuron i and target neuron j.
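In equation form, the summed input each target neuron j receives is

    I_j(t) = \sum_i w_{ij} \, v_i(t - d_{ij})

so every synapse needs its own lagged read of the presynaptic variable.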

I implemented a NumPy version, using a buffer to keep values in memory for several time steps, and using .run_regularly both to update this buffer and to fetch the correct delayed values for the synapses. My implementation was inspired by this earlier solution proposed by @mstimberg : Example of implementing a buffer for a delayed variable in Brian2 · GitHub
Below are the most relevant parts of it (simplified to keep only the elements related to the issue):

  1. The part where the input NeuronGroup is defined, with a function to update the buffer of variable v at each time step:
    G = NeuronGroup(nb_neurons, '''dv/dt = 0.4/ms : 1               
                           buffer_pointer : integer (shared)
                       ''', method='euler',   threshold='v > 4.0', reset='v = 0.0')

    buffer_size = 1 + max(delays)
    G.variables.add_array('v_buffer', size=(buffer_size, len(G)))
    update_code = '''buffer_pointer = (buffer_pointer + 1) % buffer_size
                     return_ = update_v_buffer(v, v_buffer, buffer_pointer)'''
    buffer_updater = G.run_regularly(update_code, codeobj_class=NumpyCodeObject)

    @implementation('numpy', discard_units=True)
    @check_units(v=1, v_buffer=1, buffer_pointer=1, result = 1) # the units
    def update_v_buffer(v, v_buffer, buffer_pointer):
        '''
        A buffer updated at each time step keeps v(t) for all input neurons of G over a time window defined by the largest delay
        :param v:
        :param v_buffer:
        :param buffer_pointer:
        :return:
        '''
        v_buffer[buffer_pointer, :] = v
        return 0.0
  2. The part where the Synapses object is defined, with a function to fetch the delayed values of variable v at each time step:
    S = Synapses(G,H, '''
    w : 1
    v_delayed : 1 
    delay_step : integer
    I_post = v_delayed*w : 1 (summed)
    ''', method='euler')
    S.connect()
    S.delay_step = delays

    S.variables.add_reference('v_buffer_from_synapse', G, varname='v_buffer', index=None)
    S.variables.add_reference('buffer_pointer_from_synapse', G, varname='buffer_pointer', index=None)
    update_code_syn ='''
                     v_delayed = get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step)
                     '''
    get_new_v_delayed = S.run_regularly(update_code_syn, codeobj_class=NumpyCodeObject)

    
    @implementation('numpy', discard_units=True)
    @check_units(v_buffer_from_synapse=1, buffer_pointer_from_synapse=1, buffer_size=1, i=1, delay_step=1, result=1)
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
        return v_buffer_from_synapse[(buffer_pointer_from_synapse - delay_step) % buffer_size, i]

This works as expected, but I was wondering if anyone has tips to make it faster.
It is quite time-consuming, as can be seen from the profiling output:

Profiling summary
=================
synapses_run_regularly             92.75 s    91.06 %
synapses_summed_variable_I_post     7.85 s     7.70 %
neurongroup_run_regularly           0.59 s     0.58 %
neurongroup_stateupdater            0.25 s     0.24 %
neurongroup_thresholder             0.15 s     0.15 %

The update of the buffer (neurongroup_run_regularly) is fine, but the time spent fetching the delayed variables for the summed variables (synapses_run_regularly) is quite high (more than 90% of the overall simulation time).
One puzzling fact about synapses_run_regularly is that it remains time-consuming even when the number of synapses is reduced: I launched the simulation with only autapses (S.connect(j='i')), thus with as many synapses as neurons in the input NeuronGroup, but the computation time was still far higher than that of neurongroup_run_regularly.

synapses_run_regularly is currently my bottleneck, so I would like to improve it as much as possible in order to use this model with a higher number of neurons.

The question of this topic: how can this bottleneck be reduced as efficiently as possible (from the software side)?

I have already thought about and tested a few things, but without success so far (and I am open to new ideas!):

  • using the @network_operation() variant proposed in the linked topic instead of .run_regularly: I got equivalent or slightly worse results

  • trying a Cython implementation instead of the NumPy version: it did not work, as the 2D array (the buffer) does not appear to be accepted as an input…

  • trying a 'cpp' implementation: the possible problem is that I will use this model in a larger simulation where @network_operation() is needed for interaction with an environment, so the standalone mode does not appear usable in my case. Is a 'cpp' implementation only usable when the device is set to 'cpp_standalone', or is it possible to use a cpp implementation just for synapses_run_regularly while staying in runtime mode?

  • trying Brian2GeNN: I would rather not use it, because of the unsupported features (which my simulation relies on):
    Unsupported features in Brian2GeNN — Brian2GeNN 1.5 documentation

  • trying frameworks that speed up NumPy code, such as Numba:

It appears there was a project to make Numba compatible with Brian several years ago (GitHub - brian-team/brian2numba: Numba code generation backend for Brian 2), but in my current Brian2 version NumbaCodeObject is not defined, so it cannot be used in .run_regularly().

Another possibility would be to import Numba directly in a script that also uses Brian2, like below (but I am not sure this is a good idea, as it might lead to issues or conflicts?):

    from brian2 import *
    from numba import njit

    @njit
    @implementation('numpy', discard_units=True)
    @check_units(v_buffer_from_synapse=1, buffer_pointer_from_synapse=1, buffer_size=1, i=1, delay_step=1, result=1) 
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
        return v_buffer_from_synapse[(buffer_pointer_from_synapse - delay_step) % buffer_size, i]

Anyway, I get the following error:

Traceback (most recent call last):
  File "/home/.../.py", line 128, in Simulation
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
  File "/home/.../.local/lib/python3.8/site-packages/numba/core/decorators.py", line 258, in njit
    return jit(*args, **kws)
  File "/home/.../.local/lib/python3.8/site-packages/numba/core/decorators.py", line 179, in jit
    return wrapper(pyfunc)
  File "/home/.../.local/lib/python3.8/site-packages/numba/core/decorators.py", line 198, in wrapper
    raise TypeError(
TypeError: The decorated object is not a function (got type <class 'brian2.core.functions.Function'>).

Process finished with exit code 1

Applying the decorators in the other order produces a different error:

    @implementation('numpy', discard_units=True)
    @check_units(v_buffer_from_synapse=1, buffer_pointer_from_synapse=1, buffer_size=1, i=1, delay_step=1, result=1) 
    @njit
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
 File "/home/.../.py", line 128, in Simulation
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
  File "/home/.../.local/lib/python3.8/site-packages/brian2/core/functions.py", line 559, in do_user_implementation
    function.implementations.add_numpy_implementation(wrapped_func=func,
  File "/home/.../.local/lib/python3.8/site-packages/brian2/core/functions.py", line 376, in add_numpy_implementation
    new_globals = dict(orig_func.__globals__)
AttributeError: 'CPUDispatcher' object has no attribute '__globals__'

Process finished with exit code 1

Is there a way to use decorators from both Brian2 & Numba on the same function? Or is it impossible / too prone to errors?
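One untested idea (just a sketch): keep the Brian decorators on a thin plain-Python wrapper and let Numba compile only an inner helper (here hypothetically named _get_v_delayed_numba), so that neither library has to decorate the other's object. This assumes i and delay_step arrive as integer arrays and that the buffer pointer and size can be cast to plain ints:

    import numpy as np
    from numba import njit
    from brian2 import *

    @njit(cache=True)
    def _get_v_delayed_numba(v_buffer, buffer_pointer, buffer_size, i, delay_step):
        # explicit loop, since Numba does not support indexing with two index arrays
        out = np.empty(len(i))
        for k in range(len(i)):
            out[k] = v_buffer[(buffer_pointer - delay_step[k]) % buffer_size, i[k]]
        return out

    @implementation('numpy', discard_units=True)
    @check_units(v_buffer_from_synapse=1, buffer_pointer_from_synapse=1, buffer_size=1, i=1, delay_step=1, result=1)
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
        # plain Python wrapper keeps Brian happy; the hot loop runs inside Numba
        return _get_v_delayed_numba(v_buffer_from_synapse, int(buffer_pointer_from_synapse),
                                    int(buffer_size), i, delay_step)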

Thank you very much for your attention!

Here is the link of my implementation of the delayed summed variable, in case it can be of any use : GitHub - Gaerdil/Brian2_DelayedSummedVariable

P.S.: This topic asks for advice on improving an implementation of a feature that was requested in the linked topic. I am not sure it entirely belongs to the “Support” category, but I don’t know which other category it could belong to…

Hi @Gaerdil,

Here is a crazy idea. Before the simulation, we can create a C file which defines a static table for delays (just static double delay_table[n_steps][n_variables], where n_steps is the maximal delay in the system divided by dt, and n_variables is the total number of delayed variables) and a few other variables. We can declare a procedure to write/update the current state of all variables, and a function to read lagged variables.

After that, we can import these functions into Cython or use them in cpp; it does not matter anymore. This should be a pretty dirty but fast implementation. The code may be very similar to lookup tables for brian2.

Hi @Gaerdil, hi @rth. I still hope that one day we’ll have a simpler implementation, but for now user-defined functions are all we have :grimacing:

Your code looks right to me; you cannot do much better in the general case with a Python implementation. You could improve the implementation for special cases like full connectivity or one-to-one connectivity, but if I understood correctly, this is not what you are after. Regarding your observation that synapses_run_regularly remains slow for the one-to-one connectivity: the reason is that the code still indexes with the array of indices i (in numpy-speak: fancy indexing), which is much slower than using : for the second dimension as you do in the neurongroup_run_regularly operation – in this special case the two are equivalent, though. For your Python implementation I only see one optimization: you do not actually need the ... % buffer_size, because the negative indices that appear after subtracting delay_step do the same thing; numpy takes care of this.
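A sketch of this simplification (same signature as before, so it stays a drop-in replacement):

    @implementation('numpy', discard_units=True)
    @check_units(v_buffer_from_synapse=1, buffer_pointer_from_synapse=1, buffer_size=1, i=1, delay_step=1, result=1)
    def get_v_delayed(v_buffer_from_synapse, buffer_pointer_from_synapse, buffer_size, i, delay_step):
        # buffer_pointer - delay_step lies in [-(buffer_size-1), buffer_size-1],
        # and numpy wraps negative row indices around automatically
        return v_buffer_from_synapse[buffer_pointer_from_synapse - delay_step, i]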

But more interestingly, it is actually possible to do this with Cython as well. You are right that it cannot work with the 2D array directly, but one workaround is to use a 1D array instead (in memory, this is how a 2D array is represented anyhow). To juggle around a bit less with references and so on, we can also define the buffer outside of the NeuronGroup, and refer to it as a kind of global variable from the functions using the namespace feature (see Functions — Brian 2 2.5.4 documentation). If they are declared in the namespace dictionary, the code can refer to them as _namespace<varname> (which unfortunately leads to ugly names, e.g. v_buffer becomes _namespacev_buffer). For the update of the buffer, we use the same function to update the buffer pointer, even though this is again a bit hacky, since the Cython function gets called for every neuron and not once for the whole array of neurons as in numpy, so we have to avoid that the pointer gets updated repeatedly (we could also keep the previous approach with the buffer pointer as a model variable and an argument to the function, but I found it slightly clearer to have both the buffer and the buffer pointer as “global” variables).
So the NeuronGroup code to update the buffer would look like this (we could make the function work for Python as well, but it did not seem to be worth the effort):

    # External variables accessed by the two functions
    buffer_pointer = np.zeros(1, dtype=int)
    v_buffer = np.empty(buffer_size * len(G))  # 2D as 1D
    
    update_code = 'return_ = update_v_buffer(v, i)'  # needs the index of each neuron
    buffer_updater = G.run_regularly(update_code)

    @implementation('cython', f'''
    cdef double update_v_buffer(double v, int i):
        if i == 0:
            _namespacebuffer_pointer[0] = (_namespacebuffer_pointer[0] + 1) % {buffer_size}
        _namespacev_buffer[_namespacebuffer_pointer[0] * {len(G)} + i] = v
        return 0.0  # dummy return
    ''', namespace={'v_buffer': v_buffer,
                    'buffer_pointer': buffer_pointer})
    @check_units(v=1, i=1, result = 1) # the units
    def update_v_buffer(v, i):
        raise NotImplementedError('Use Cython')

Two more remarks here:

  1. The buffer_pointer has to be a single-element array, so that its value can easily be overwritten from within Cython.
  2. The size of the group and the size of the buffer are hardcoded into the Cython code using f-string formatting.

Implementing the get_v_delayed function for the synapse is quite straightforward and follows the same approach:

    @implementation('cython', 
    f'''
    cdef double get_v_delayed(int i, int delay_step):
        return _namespacev_buffer[((_namespacebuffer_pointer[0] - delay_step) % {buffer_size})*{len(G)} + i]
    ''', namespace={'v_buffer': v_buffer, 'buffer_pointer': buffer_pointer})
    @check_units(i=1, delay_step=1, result=1)
    def get_v_delayed(i, delay_step):
        raise NotImplementedError('Use Cython')

We actually do not need to store the delayed values in a v_delayed variable that gets updated via a run_regularly operation; instead, we can refer to the function directly as part of the summed variable:

    S = Synapses(G,H, '''
    w : 1 (constant)
    delay_step : integer (constant)
    I_post = get_v_delayed(i, delay_step)*w : 1 (summed)
    ''', method='euler')

From my quick and dirty testing, this all seems to work and gives the same result as the Python implementation. Full code below:

Full code example
# imports needed to run this example
import time
import datetime
from random import randint  # assuming Python's randint; numpy.random.randint would work similarly

import numpy as np
from brian2 import *


def Simulation(nb_neurons, simulation_time):
    print("======  simulation ====== ")
    defaultclock.dt = 1*ms
    start = time.time()
    print(">>> number of neurons: "+str(nb_neurons)+", simulation time: "+str(simulation_time)+", time resolution: "+str(defaultclock.dt))

    #===============> 1)  prepare the synapse delays
    delay_max= 10
    delays = [randint(1,delay_max) for i in range(nb_neurons**2)]


    # ===============> 2)  prepare the Input group & the buffer to keep variable v at previous time steps (up to the highest delay)
    G = NeuronGroup(nb_neurons, '''dv/dt = 0.4/ms : 1               
                       ''', method='euler',   threshold='v > 4.0', reset='v = 0.0')
    buffer_size = 1 + max(delays)
    # External variables accessed by the two functions
    buffer_pointer = np.zeros(1, dtype=int)
    v_buffer = np.empty(buffer_size * len(G))
    
    update_code = 'return_ = update_v_buffer(v, i)'
    buffer_updater = G.run_regularly(update_code)

    @implementation('cython', f'''
    cdef double update_v_buffer(double v, int i):
        if i == 0:
            _namespacebuffer_pointer[0] = (_namespacebuffer_pointer[0] + 1) % {buffer_size}  # hardcoded buffer size
        _namespacev_buffer[_namespacebuffer_pointer[0] * {len(G)} + i] = v
        return 0.0  # dummy return
    ''', namespace={'v_buffer': v_buffer, 'buffer_pointer': buffer_pointer})
    @check_units(v=1, i=1, result = 1) # the units
    def update_v_buffer(v, i):
        raise NotImplementedError('Use Cython')


    # ===============> 3)  prepare the Output group
    H = NeuronGroup(nb_neurons, '''I : 1
                           ''', method='euler')


    # ===============> 4)  prepare the Synapses & how to fetch delayed values of v from input group
    S = Synapses(G,H, '''
    w : 1 (constant)
    delay_step : integer (constant)
    I_post = get_v_delayed(i, delay_step)*w : 1 (summed)
    ''', method='euler')
    S.connect()
    S.w = 1
    S.delay_step = delays


    @implementation('cython', 
    f'''
    cdef double get_v_delayed(int i, int delay_step):
        return _namespacev_buffer[((_namespacebuffer_pointer[0] - delay_step) % {buffer_size})*{len(G)} + i]
    ''', namespace={'v_buffer': v_buffer, 'buffer_pointer': buffer_pointer})
    @check_units(i=1, delay_step=1, result=1)
    def get_v_delayed(i, delay_step):
        raise NotImplementedError('Use Cython')


    # ===============> 5)  Run the simulation
    build_time = time.time()
    run(simulation_time, report_period=1 * second, profile=True)

    end = time.time()
    print("\n => Total elapsed time: ", datetime.timedelta(seconds=end - start))
    print(" ==> DETAILS: network build time: ", datetime.timedelta(seconds=build_time - start),
          " ; simulation time : ", datetime.timedelta(seconds=end - build_time))
    print(profiling_summary(show=5))


if __name__ == '__main__':
    nb_neurons, simulation_time = 600, 1000*ms
    Simulation(nb_neurons, simulation_time)

Yes, currently C++ code can only be provided for C++ standalone mode. In the long run, we will probably use C++ directly for both runtime and standalone mode and get rid of Cython completely…

Brian2GeNN has a subset of the functionality of C++ standalone mode. If you can’t use C++ standalone mode (e.g. because you need network_operation), you definitely cannot use Brian2GeNN.

Hi @mstimberg, hi @rth,

Thank you very much for your answers. It helps a lot!

@rth this is a very interesting idea! I will look into it.

@mstimberg your Cython version is very efficient in comparison with the NumPy version; it will save me a lot of time!

Thank you :slight_smile:


Hi @mstimberg, hi @Gaerdil, (and pretty sure) hi @adam

I had a bit of time today, while my big project is running a very long simulation, to implement what we have been discussing for almost a month here.

The code is on GitHub and it does NOT work. It’s just a “first draft”, a “template”, a “starting point”. It needs to be heavily debugged in all 3 computational modes: numpy, cython, and cpp. I will try to sketch test suites for cython and cpp tonight, but I probably won’t have time for debugging this week (the simulation should finish tonight).

Please feel free to fork it, play with it, and submit patches if you can make it work.

Have fun :nerd_face: !

Although the numpy version seems to work, and all variables are lagged in the right order and with the right delay, a better test is needed for sure.

I also implemented the same silly test for cython and cpp, but both return an error:

cc1plus: error: bad value (‘tigerlake’) for ‘-march=’ switch
cc1plus: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 native
cc1plus: error: bad value (‘tigerlake’) for ‘-mtune=’ switch
cc1plus: note: valid arguments to ‘-mtune=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native
WARNING    Cannot use Cython, a test compilation failed: command '/usr/bin/gcc' failed with exit code 1 (CompileError) [brian2.codegen.runtime.cython_rt.cython_rt.failed_compile_test]
INFO       Cannot use compiled code, falling back to the numpy code generation target. Note that this will likely be slower than using compiled code. Set the code generation to numpy manually to avoid this message:
prefs.codegen.target = "numpy" [brian2.devices.device.codegen_fallback]

Any ideas?

This seems to be the same issue as in Problem with cython using Ubuntu 20.04, not quite sure what is going on. Some mismatch between the compiler that is used and the compiler flags that are set, e.g. via a conda environment?

I ran it with debug on. It doesn’t seem to generate any clearer output.

DISTUTILS_DEBUG=1 python simple-c-standalone.py 
cc1plus: error: bad value (‘tigerlake’) for ‘-march=’ switch
cc1plus: error: bad value (‘tigerlake’) for ‘-mtune=’ switch
[... the same two errors, with their “valid arguments to ... switch” notes, repeat for every compiled file ...]
make: *** [<builtin>: brianlib/randomkit/randomkit.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [makefile:31: code_objects/neurongroup_run_regularly_codeobject.o] Error 1
make: *** [makefile:31: code_objects/neurongroup_thresholder_codeobject.o] Error 1
make: *** [makefile:31: code_objects/neurongroup_stateupdater_codeobject.o] Error 1
make: *** [makefile:31: code_objects/statemonitor_codeobject.o] Error 1
make: *** [makefile:31: code_objects/neurongroup_resetter_codeobject.o] Error 1
make: *** [makefile:31: code_objects/neurongroup_group_variable_set_conditional_codeobject.o] Error 1
make: *** [makefile:31: code_objects/synapses_summed_variable_S_post_codeobject.o] Error 1
make: *** [makefile:31: code_objects/synapses_synapses_create_array_codeobject.o] Error 1
make: *** [makefile:31: main.o] Error 1
ERROR      Brian 2 encountered an unexpected error. If you think this is a bug in Brian 2, please report this issue either to the discourse forum at <http://brian.discourse.group/>, or to the issue tracker at <https://github.com/brian-team/brian2/issues>. Please include this file with debug information in your report: /tmp/brian_debug_nzlokgxp.log  Additionally, you can also include a copy of the script that was run, available at: /tmp/brian_script_fcveirb6.py You can also include a copy of the redirected std stream outputs, available at /tmp/brian_stdout__l6utyaj.log and /tmp/brian_stderr_ncllupn6.log Thanks! [brian2]
Traceback (most recent call last):
  File "/media/rth/rth-master/rth/projects/dcv4brian/tests/simple-c-standalone.py", line 44, in <module>
    run(2000*ms)
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/units/fundamentalunits.py", line 2434, in new_f
    result = f(*args, **kwds)
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/core/magic.py", line 373, in run
    return magic_network.run(duration, report=report, report_period=report_period,
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/core/magic.py", line 231, in run
    Network.run(self, duration, report=report, report_period=report_period,
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/core/base.py", line 276, in device_override_decorated_function
    return getattr(curdev, name)(*args, **kwds)
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/devices/cpp_standalone/device.py", line 1511, in network_run
    self.build(direct_call=False, **self.build_options)
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/devices/cpp_standalone/device.py", line 1254, in build
    self.compile_source(directory, compiler, debug, clean)
  File "/home/rth/.local/lib/python3.9/site-packages/brian2/devices/cpp_standalone/device.py", line 990, in compile_source
    raise RuntimeError(error_message)
RuntimeError: Project compilation failed (error code: 512). Consider running with "clean=True" to force a complete rebuild.

The DISTUTILS_DEBUG option is meant for Cython; for C++ standalone you can use set_device('cpp_standalone', debug=True). Either way, I strongly doubt this is a Brian issue, since the -mtune=tigerlake option comes from somewhere else. Can you check your CFLAGS and CXXFLAGS? These are for example set in conda environments with gcc installed (but you do not seem to be using conda). What I suspect is happening: something sets the CFLAGS/CXXFLAGS environment variables to values appropriate for gcc versions >= 10 (i.e. with stuff like -mtune=tigerlake), but your compilation uses a gcc version <= 9 which does not support these options.
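A quick way to check (a minimal sketch, assuming gcc is the compiler on your PATH):

    import os
    import subprocess

    # flags that distutils/Cython will pick up; look for -march=/-mtune= entries
    print('CFLAGS   =', os.environ.get('CFLAGS'))
    print('CXXFLAGS =', os.environ.get('CXXFLAGS'))
    # the gcc version actually used (options like -march=tigerlake need gcc >= 10)
    print(subprocess.check_output(['gcc', '--version'], text=True).splitlines()[0])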

Thank you, @mstimberg, for this suggestion. I switched to gcc-10, and this error is gone. Now I can run both the cython and standalone versions, which is excellent, but my delay system doesn’t work. After a bit of debugging, I got stuck on Brian’s internal implementation.

So, please, take a look:
First, the dcvinit function creates a C file, dcv4brian.c, with a pretty simple set of functions and variables:

#ifndef __DCV4BRIAN__
#define __DCV4BRIAN__
const int    dcvDsize  = 1905;
const int    dcvVsize  = 4;
int          dcvIndex  = 0;
double dcvtbl[1905][4] = {
   {-60,-60,-60,-60},
   {-60,-60,-60,-60},
.......
   {-60,-60,-60,-60},
   {-60,-60,-60,-60}
};

void _dcvupdate(double x, int i){
    if ( i == 0 ){
        dcvIndex = (dcvIndex == 0)?(dcvDsize-1):(dcvIndex-1);
    }
    //DB>>
    printf("%d:%d:%0.2f>",i,dcvIndex,dcvtbl[dcvIndex][i]);
    //<<DB
    dcvtbl[dcvIndex][i] = x;
    //DB>>
    printf("%0.2f>%0.2f  ",x,dcvtbl[dcvIndex][i]);
    if ( i == 3 ) printf("\n");
    //<<DB
    return ;
}
double _dcvget(int dly, int varid){
    //DB>>
    printf("%d : (%03d+%03d)%% %03d=%03d = %0.2f \n",
        varid,
        dly,
        dcvIndex,
        dcvDsize,
        (dcvIndex+dly)%dcvDsize,dcvtbl[(dcvIndex+dly)%dcvDsize][varid]);
    //<<DB
    return dcvtbl[(dcvIndex+dly)%dcvDsize][varid] ;
}

#endif

Note that DB>> and <<DB indicate a debug section, just to check what is going on.

Next, the module defines two functions, each with three different implementations.
The first is dcvupdate, which is called at every simulation step:

@implementation('cpp', '''
#include <dcv4brian.c>;
int dcvupdate(double x, int i){
    _dcvupdate(x,i);
    return 0;
}
''',include_dirs=[os.getcwd()])
@implementation('cython',f'''
cdef extern from "{os.getcwd()}/dcv4brian.c":
    void   _dcvupdate(double, int)
cdef int dcvupdate(double x, int i):
    _dcvupdate(x,i)
    return 0
''',include_dirs=[os.getcwd()])
@check_units(arg=[1],result=1)
def dcvupdate(x,i):
    """
    dcvupdate shifts the index and records the current state of the dynamic
    variables given by `x`. To use this numpy realization you need to initialize
    the delayed continuous variable with `dcvinit`, using the argument
    c_target = False.
    Inputs:
        x - a value
        i - an index
    """
    dcvinit.dcvIndex = (dcvinit.dcvDsize-1) if dcvinit.dcvIndex == 0 else (dcvinit.dcvIndex-1)
    dcvinit.dcvtbl[dcvinit.dcvIndex,i] = copy(x)
    return 0

and dcvget to read delayed variables:

@implementation('cpp', '''
#include <dcv4brian.c>;
double dcvget(int dly, int varid){
    return _dcvget(dly, varid);
}
''',include_dirs=[os.getcwd()])
@implementation('cython',f'''
cdef extern from "{os.getcwd()}/dcv4brian.c":
    double _dcvget(int, int)
cdef double dcvget(int dly, int varid):
    return _dcvget(dly, varid)
''',include_dirs=[os.getcwd()])
@check_units(arg=[1,1],result=1)
def dcvget(dly,varid):
    """
    `dcvget` returns the value of the continuous variable with variable ID
    `varid` as it was `dly` steps ago.
    """
    return dcvinit.dcvtbl[(dcvinit.dcvIndex+dly)%dcvinit.dcvDsize,varid]

Now we can see what is happening inside, via python simple-c-standalone.py > simple-c-standalone.debug.
Here are the first two steps of a simulation:

0:1904:-60.00>-60.00>-60.00  1:1904:-60.00>-60.00>-60.00  2:1904:-60.00>-60.00>-60.00  3:1904:-60.00>-60.00>-60.00  
0 : (1000+000)% 1905=1000 = -60.00 
1 : (1300+000)% 1905=1300 = -60.00 
2 : (1600+000)% 1905=1600 = -60.00 
3 : (1900+000)% 1905=1900 = -60.00 
0:1903:-60.00>-59.92>-59.92  1:1903:-60.00>-59.98>-59.98  2:1903:-60.00>-59.98>-59.98  3:1903:-60.00>-59.98>-59.98  
0 : (1000+000)% 1905=1000 = -60.00 
1 : (1300+000)% 1905=1300 = -60.00 
2 : (1600+000)% 1905=1600 = -60.00 

The first line is from the update, and we can see that it has already decremented dcvIndex and set it to the end of the buffer, 1904. However, the following 3 lines are reports from dcvget, and here we can see why it doesn’t work:

0 : (1000+000)% 1905=1000 = -60.00  
^ variable ID
      ^^^^ delay shift
         ^^^ dcvIndex - it's still zero, while it should be 1904!
                 ^^^^ dcvDsize
                     ^^^^ actual address
                             ^^^^^^ the value stored there

Finally, the 4th line is from dcvupdate again, where we can check that dcvupdate does have access to dcvIndex and decremented it successfully.

0:1903:-60.00>-59.92>-59.92  1:1903:-60.00>-59.98>-59.98  2:1903:-60.00>-59.98>-59.98  3:1903:-60.00>-59.98>-59.98  
  ^^^^ - current dcvIndex

It seems there are two dcvIndex and potentially two dcvtbl variables, so dcvupdate and dcvget work with different sets of presumably global variables. Any workaround?

Yes, it cannot work like that. The individual “code objects” (summed variable update, run_regularly function, etc.) are compiled independently, so each will include its own copy of dcv4brian.c. In Cython, the compiled object files are always used independently, so the two “global” variables never interact. In C++, the situation is slightly different, as all the object files are linked into a single binary. To avoid clashes between functions defined in the support code for different objects, the support code is encapsulated in an anonymous namespace which limits its scope to the file.

It would actually be possible to take dcv4brian.c out of this namespace by not manually writing an #include as part of the code but instead using the headers=... feature of @implementation. But as I said earlier: the files are compiled individually, which means that this would only lead to a linking error (because there would be two definitions of the table data structure). I guess for C++ standalone you could somehow work around this by only including the declaration in the header file, and e.g. dynamically allocating memory for it. But this is not possible for Cython.

The only way that works for both Cython and C++ standalone is to use the namespace feature that I showed earlier. Each Cython module then gets a pointer to the same data in memory (giving Cython access to memory shared with Python/numpy is the main reason why we are using Cython in the first place). For C++ standalone, we write the Python array to disk, and generate code at the beginning of the standalone code that creates a C array of the correct size and fills it with the values from disk. All the code objects referring to this array get a pointer to it, so they are all accessing/updating the same values.

Here’s a basic script that demonstrates the use of shared memory from two separate run_regularly code objects. It works with numpy, Cython, and C++ standalone:

from brian2 import *
# Uses Cython by default, to switch to numpy or C++ standalone use one of:
# prefs.codegen.target = 'numpy'
# set_device('cpp_standalone')

values = np.arange(10)*2.

@implementation('cpp','''
double get_value(int index) {
  return _namespacetable[index];
}''', namespace={'table': values})
@implementation('cython','''
cdef double get_value(int index):
  return _namespacetable[index]
''', namespace={'table': values})
@check_units(index=1, result=1)
def get_value(index):
  return values[index]

@implementation('cpp','''
double update_value(int index, double value) {
  _namespacetable[index] += value;
  return 0.;
}''', namespace={'table': values})
@implementation('cython','''
cdef double update_value(int index, double value):
  _namespacetable[index] += value
  return 0.
''', namespace={'table': values})
@check_units(index=1, value=1, result=1)
def update_value(index, value):
  values[index] += value
  return 0.

G = NeuronGroup(10, 'v : 1')
G.run_regularly('v += get_value(i)')  # runs first
G.run_regularly('dummy = update_value(i, rand())', order=1)

run(2*defaultclock.dt)
print(G.v)

Hi all,

I think there is a good workaround for this issue, in this particular case. @mstimberg, it seems we don’t need a second function if we can write the variable’s current state and read the delayed variable in a single call.
This slightly different approach uses only one function during the simulation, namely dcvsetget, which records the current variable state into the table and retrieves the delayed state of the same variable.
Here is how it works:

from numpy import *
from brian2 import *
from dcv4brian import *


defaultclock.dt=0.1*ms
## if numpy is a target
prefs.codegen.target = 'numpy'
## if target is cython
# prefs.codegen.target = 'cython'
## if this is a CPP-standalone simulations
# set_device('cpp_standalone', directory='standalone')


taum = 20*ms
taue = 5*ms
Vt = -50
Vr = -60
El = -55



eqs = '''
dv/dt  = (I + int(i==0)*int(t<40*ms)*10 -(v-El))/taum : 1 (unless refractory)
dI/dt  = (w/(1+exp(-(S-Vt))) - I)/taue     : 1 (unless refractory)
S : 1
w : 1
'''

N = NeuronGroup(4, eqs, threshold='v>Vt', reset='v = Vr; I = 0', refractory=1*ms, method='euler')
N.v = 'Vr'
N.w   = 50.
N.S = 'Vr'


W = Synapses(N, N, """
    dly                                   : integer (constant) # delay
    S_post = dcvsetget(i,v_pre,dly,t/ms)  : 1  (summed)
""")
xdly  = array([ int(round((100+i*30)*ms/defaultclock.dt)) for i in range(4) ])
W.connect(i=[0,1,2,3],j=[1,2,3,0])
W.dly = xdly

M = StateMonitor(N, ['v','I','S'], record=True)
## If Numpy >>
dcvinit(amax(xdly).astype(int)+5,4,defaultclock.dt/ms,array([Vr,Vr,Vr,Vr]),c_target=False)
## If Cython and CPP-sandalone >>
# dcvinit(amax(xdly).astype(int)+5,4,defaultclock.dt/ms,array([Vr,Vr,Vr,Vr]),c_target=True)


run(2000*ms)

figure(1,figsize=(16,12))

subplot(4,1,1)
plot(M.t/ms,M[0].v,"-",lw=3,label="neuron-0")
plot(M.t/ms,M[1].S,"-",lw=3,label="delayed in 1")
legend(loc=1,fontsize=16)
subplot(4,1,2)
plot(M.t/ms,M[1].v,"-",lw=3,label="neuron-1")
plot(M.t/ms,M[2].S,"-",lw=3,label="delayed in 2")
legend(loc=1,fontsize=16)
subplot(4,1,3)
plot(M.t/ms,M[2].v,"-",lw=3,label="neuron-2")
plot(M.t/ms,M[3].S,"-",lw=3,label="delayed in 3")
legend(loc=1,fontsize=16)
subplot(4,1,4)
plot(M.t/ms,M[3].v,"-",lw=3,label="neuron-3")
plot(M.t/ms,M[0].S,"-",lw=3,label="delayed in 0")
legend(loc=1,fontsize=16)

show()

Under the hood, it uses a pretty simple module (the code is also on GitHub):

from numpy import *
from brian2 import *
import os

protofunction='''
double _dcvsetget(int varid, double varval, int dly, double t){
    if ( fabs(t-dcvTime-dcvdt) < 1e-9 ){
        dcvIndex = (dcvIndex == 0)?(dcvDsize-1):(dcvIndex-1);
        dcvTime  = t;
    }
    dcvtbl[dcvIndex][varid] = varval;
    return dcvtbl[(dcvIndex+dly)%dcvDsize][varid] ;
}
'''

def dcvinit(tblsize :int,nvar:int,simdt:float,init:ndarray,c_target:bool=True):
    """
    It generates the dcv4brian.c file with all required static variables
    for the cpp and cython targets.
    Parameters:
        tblsize  - int     - max number of steps to remember
        nvar     - int     - number of delayed continuous variables (DCV)
        simdt    - float   - simulation time step
        init     - ndarray - 1D array with the initial value for each DCV
                           - these values fill up the table; therefore,
                           - if a variable is the membrane potential of a
                           - neuron, initialize it at the resting potential;
                           - if a variable is a current or synaptic conductance,
                           - initialize it with zero.
        c_target - bool    - Set to False if you need numpy code
    """
    dcvinit.c_target = c_target
    if c_target:
        with open("dcv4brian.c","w") as fd:
            fd.write("#ifndef __DCV4BRIAN__\n")
            fd.write("#define __DCV4BRIAN__\n")
            fd.write("#include <math.h>\n")
            fd.write(f"const  int    dcvDsize  = {tblsize:d};\n")
            fd.write(f"const  int    dcvVsize  = {nvar:d};\n")
            fd.write(f"static int    dcvIndex  = 0;\n")
            fd.write(f"static double dcvdt     = {simdt};\n")
            fd.write(f"static double dcvTime   = 0;\n")
            fd.write(f"static double dcvtbl[{tblsize:d}][{nvar:d}] = {{"+"\n")
            for cid in range(tblsize):
                fd.write("   {")
                for x in range(nvar):
                    fd.write(f"{init[x]}"+("," if x != nvar-1 else "}") )
                fd.write(",\n" if cid != tblsize-1 else "\n")
            fd.write("};\n")
            fd.write(protofunction+"\n")
            fd.write("#endif\n")
        dcvinit.dcvtbl   = None
        dcvinit.dcvIndex = 0
        dcvinit.dcvTime  = 0.
        dcvinit.dcvdt    = simdt
        dcvinit.dcvDsize = tblsize
        dcvinit.dcvVsize = nvar
            
    else:
        
        dcvinit.dcvtbl   = zeros((tblsize,nvar))
        for n in range(tblsize): dcvinit.dcvtbl[n] = copy(init)
        dcvinit.dcvIndex = 0
        dcvinit.dcvTime  = 0.
        dcvinit.dcvdt    = simdt
        dcvinit.dcvDsize = tblsize
        dcvinit.dcvVsize = nvar
    
    

@implementation('cpp', '''
#include <dcv4brian.c>;
double dcvsetget(int varid, double varval, int dly, double t){
    return _dcvsetget(varid,varval,dly,t);
}
''',include_dirs=[os.getcwd()])
@implementation('cython',f'''
cdef extern from "{os.getcwd()}/dcv4brian.c":
    double _dcvsetget(int, double, int, double)
cdef double dcvsetget(int varid, double varval, int dly, double t):
    return _dcvsetget(varid,varval,dly,t);
''',include_dirs=[os.getcwd()])
@check_units(arg=[1,1,1,1],result=1)
def dcvsetget(varid:(int,ndarray), varval:(float,ndarray), dly:int, t:float):
    """
    dcvsetget writes the current variable value into the table and returns
    the value of the same variable delayed by dly steps.
    The module has an internal time tracker: if |t - dt - internal_time| is less
    than 1e-9 time units, the module advances the table index.
    Inputs:
        varid - int   - variable ID in the table
        varval- float - current variable value to be recorded
        dly   - int   - number of steps back to read the value in the table
        t     - float - current time
    """
    if abs(t-dcvinit.dcvTime-dcvinit.dcvdt) < 1e-9:
        dcvinit.dcvIndex = (dcvinit.dcvDsize-1) if dcvinit.dcvIndex == 0 else (dcvinit.dcvIndex-1)
        dcvinit.dcvTime = t
    dcvinit.dcvtbl[dcvinit.dcvIndex,varid] = copy(varval)
    return dcvinit.dcvtbl[(dcvinit.dcvIndex+dly)%dcvinit.dcvDsize,varid]

I have tested it in all three implementations.

I haven’t checked performance, written documentation, or posted this module on PyPI. You are welcome to contribute to this project.

Update

I daisy-chained 1000 neurons (as in the example above) and ran them for 20 seconds. Here is the performance assessment:

                   Numpy   Cython   CPP standalone
With delays        20 s    6 s      2 s
Without delays     14 s    3 s      1 s

Thanks :+1: If there is only one connection per neuron (and therefore only one delay per neuron instead of several different delays for synapses coming from the same neuron), this should be quite a bit more efficient. One minor comment: you can simplify the translation between t and time steps by directly using the integer variable t_in_timesteps instead of t (introduced with Brian 2.2.2.1), as sketched below.
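For illustration, an untested sketch: the summed-variable equation would pass t_in_timesteps instead of t/ms, and the C side would then keep an integer step counter (say, dcvStep) and advance the table index whenever the argument differs from it, instead of relying on the 1e-9 floating-point tolerance test:

    W = Synapses(N, N, """
        dly                                               : integer (constant) # delay
        S_post = dcvsetget(i, v_pre, dly, t_in_timesteps) : 1  (summed)
    """)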

Absolutely! I was quite lazy and just extended my “simple” example, where one connection per neuron was enough to check that the delays are correct.

Excellent! Thank you. It will make the code much better.
