Delay in continuous connections

Hi,

I’m looking to simulate a circuit with delay and compare between a spiking model and a rate model.
There’s a simple interface for adding delay to spiking synapses, but I can’t see a straightforward way to do this for continuous connections.

It looks like there have been a few discussions related to this idea:

I’d like to add +1 to the interest in seeing this feature implemented in Brian, and also ask what the current best “workaround” solution for this is

-Adam

1 Like

Hi @adam,

After discussion with @mstimberg, I ended up in a pretty bulky code. This realization of the delay function as NetworkOperation slows down simulations dramatically. I’m not sure whether it is useful to share because I’m really unhappy with the performance. If you come up with something fast and simple, please share.

-RTH

P.S. but if you insist, I can put my code here, of course :slight_smile:

1 Like

Just adding to what @rth said: there’s also the option to implement this in the target language, e.g. C++ for C++ standalone mode as a user-defined function. This is certainly the fastest solution, but also the most difficult to write.
The NetworkOperation solution will be always be relatively slow, but there might be some non-obvious ways to speed things up (using simple floating point values instead of Brian quantities, avoiding array copies, …). If @rth is willing to share a simple example here, I could have a look to see whether there are ways to speed it up.

Hi Marcel, could you point me to info on integrating C++ functions in Brian?
If getting data into and out of C++ is relatively straightforward, and it amounts to writing something like a network operation but in C++, I could probably put that together in a reasonable amount of time

Hi Adam, there’s some basic documentation here: Functions — Brian 2 2.4.2 documentation
Were it gets complicated is when you want this function to change things in model variables instead of simply returning a value. If this is in C++ standalone mode, you can make use of the fact that all variables are stored in globally available arrays and that you can get their names with something like device.get_array_name(neurongroup.variables['V']). There is a discussion thread that should give some more hints: User-defined functions
Let me know if you run into issues, this part of Brian still needs quite a bit of polishing.

2 Likes

@mstimberg just a sketch of my construction. Note that different connections in the system have different delays…

 def step_delay():
        #                                                    0         1         2         3           4           5     
        step_delay.buffer[:,step_delay.windex] = hstack((Cx.CxE[:],Cx.CxK[:],Cx.CxP[:],Cx.CxI[:],ThRe.ThE[:],ThRe.ReI[:]))
        N = JRTRG.params['N']
        Cx2Cx.CxE2CxP_delval = step_delay.buffer[0*N:1*N,(Cx2Cx.CxE2CxP_delint+step_delay.windex)%step_delay.maxdel]
        Cx2Cx.CxP2CxE_delval = step_delay.buffer[2*N:3*N,(Cx2Cx.CxP2CxE_delint+step_delay.windex)%step_delay.maxdel]
        Cx2Cx.CxP2CxI_delval = step_delay.buffer[2*N:3*N,(Cx2Cx.CxP2CxI_delint+step_delay.windex)%step_delay.maxdel]
        Cx2Cx.CxI2CxP_delval = step_delay.buffer[3*N:4*N,(Cx2Cx.CxI2CxP_delint+step_delay.windex)%step_delay.maxdel]
        Cx2Cx.CxE2CxK_delval = step_delay.buffer[0*N:1*N,(Cx2Cx.CxE2CxK_delint+step_delay.windex)%step_delay.maxdel]
        Cx2Cx.CxK2CxE_delval = step_delay.buffer[1*N:2*N,(Cx2Cx.CxK2CxE_delint+step_delay.windex)%step_delay.maxdel]
        TR2TR.ThE2ReI_delval = step_delay.buffer[4*N:5*N,(TR2TR.ThE2ReI_delint+step_delay.windex)%step_delay.maxdel]
        TR2TR.ReI2ThE_delval = step_delay.buffer[5*N:   ,(TR2TR.ReI2ThE_delint+step_delay.windex)%step_delay.maxdel]
        TR2Cx.ThE2CxP_delval = step_delay.buffer[4*N:5*N,(TR2Cx.ThE2CxP_delint+step_delay.windex)%step_delay.maxdel]
        TR2Cx.ThE2CxE_delval = step_delay.buffer[4*N:5*N,(TR2Cx.ThE2CxE_delint+step_delay.windex)%step_delay.maxdel]
        TR2Cx.ThE2CxK_delval = step_delay.buffer[4*N:5*N,(TR2Cx.ThE2CxK_delint+step_delay.windex)%step_delay.maxdel]
        Cx2TR.CxP2ThE_delval = step_delay.buffer[2*N:3*N,(Cx2TR.CxP2ThE_delint+step_delay.windex)%step_delay.maxdel]
        Cx2TR.CxP2ReI_delval = step_delay.buffer[2*N:3*N,(Cx2TR.CxP2ReI_delint+step_delay.windex)%step_delay.maxdel]
        step_delay.windex    = (step_delay.maxdel-1) if step_delay.windex == 0 else (step_delay.windex - 1)

step_delay.maxdel is computed first. That is max steps needed for the logest delay in the system. Then step_delay.buffer is initialized.

step_delay.maxdel = int( round(step_delay.maxdel*ms/defaultclock.dt)+10 )
step_delay.buffer = zeros((NVariables,step_delay.maxdel))
step_delay.windex = 0

In synaptic model we use _delval. For example

inCxP2CxE_post = CxP2CxE * CxP2CxE_delval    : 1 (summed)
CxP2CxE_delval                               : 1
CxP2CxE_delint                               : integer (constant)

And finally I declare NetworkOperation function

netstdel = NetworkOperation(step_delay)

something like this. I tried multiple buffers the step_delay function, but this is even slower.

I’m going to take a shot at a very simple demo notebook with a naive network_operations implementation for this today.

I haven’t looked into the C++ standalone option yet, but I want to make sure I’ve ruled out the simplest solution first (for my application something slow might be good enough).

I see @rth has a network operations solution also. I’m going to start from scratch and then afterwards we can look for commonalities / differences, and hopefully the notebook will be minimal enough for it to be easy to modified and give feedback on

Haven’t got to the network_operations part yet, but here’s some implementations of a 2D buffer which “rolls” along the 2nd dimension:

Google Colab notebook

scripts and helper functions under version control at GitHub - awillats/brian_delayed_gaussian

code reposted here: class definitions

from collections import deque 
import numpy as np

class DQRingBuffer(deque):
    '''
    NOTE: becuase of storage of columns as tuples, indexing is less flexible than numpy version

    Assumes last value in buffer is current i.e. 0-delay
    - this is compatible with calling
        buff.append(current_val)
        current_val_also = buff.get_delayed(0) 
    - see more discussion at https://stackoverflow.com/questions/4151320/efficient-circular-buffer
    '''
    from collections import deque
    def __init__(self, buffer_len = 100, n_channels=1):
        super().__init__([],maxlen = buffer_len)
        self.n_channels = n_channels
    #inherits append method
    def append(self, new_val):
        if self.n_channels > 1 and len(new_val) is not self.n_channels:
            raise IndexError(f'ERROR, wrong size to append\n expected size {self.n_channels} got {len(new_val)}')
            return None
        if isinstance(new_val, np.ndarray):
            new_val = tuple(new_val)
        return super().append(new_val)
    
    def fill(self, fill_val):
        for i in range(self.maxlen):
            self.append(fill_val)
    def to_np(self):
        return np.asarray(self)
    def get_delayed(self, delay=1):
        if not 0 <= delay < self.maxlen:
            # won't let you return get_delayed(0),
            raise IndexError(f'delay {delay} out of bounds for buffer length {self.maxlen}')
        # print(-delay-1)
        return self[-delay-1]

class RingBuffer:
    '''
    loose implementation of a ring-buffer with numpy arrays
    - only buffers along axis=1
    - consider something like collections.deque for a stricter implementation
    - see also Implementing a Ring Buffer - Python Cookbook: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch05s19.html
    - speed test for roll implementations: https://gist.github.com/cchwala/dea03fb55d9a50660bd52e00f5691db5

    '''
    def __init__(self,buffer_len = 100, initial_val=None):
        self.n_channels = n_channels 
        self.buffer_len = buffer_len
        self.values = np.full(self.buffer_len, initial_val)
    
    def append(self, new_val):
        # self.values = np.append(self.values, new_column)
        # self.values = self.values[:, 1:]
        self.values = np.roll(self.values, -1, axis=0)
        self.values[-1] = new_val

    def last(self):
        return self.values[-1]

    def get_delayed(self, delay=1):
         
        if not 0 <= delay < self.buffer_len:
            raise IndexError(f'delay {delay} out of bounds for buffer length {self.buffer_len}')
        # alternate implementation would
        # cap the delay, so that delay = -Inf returns the oldest value in the array.
        # delay = max(min(delay, self.size),1)
        return self.values[-delay-1]
        
    def fill(self, fill_val):
        self.values.fill(fill_val)
    def __getitem__(self, idx):
        return self.values[idx]
    def __str__(self):
        return f'buffer with len: {self.buffer_len}, \nvals: {str(self.values)}\n'
        
class RingBuffer_2D:
    '''
    (could be made simpler by extending numpy.ndarray)
      would eliminate need to duplicate:
        - fill()
        __getitem__()
        - str()
    loose implementation of a ring-buffer with numpy arrays
    - only buffers along axis=1
    - consider something like collections.deque for a stricter implementation
    - see also Implementing a Ring Buffer - Python Cookbook: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch05s19.html
    '''
    def __init__(self, n_channels, buffer_len = 100, initial_val=None):
        self.n_channels = n_channels 
        self.buffer_len = buffer_len
        self.values = np.full((self.n_channels, self.buffer_len), initial_val)
    
    def append(self, new_column):
        #expands column to 2D 
        self.values = np.append(self.values, new_column[:,None], axis=1)
        self.values = self.values[:, 1:]
        
        # np.roll is about 2x slower than method above for medium-sized arrays
        # self.values = np.roll(self.values, -1, axis=1)
        # self.values[:,-1] = new_column

    def last(self):
        return self.values[:,-1]

    def get_delayed(self, delay=1):
        if not 0 <= delay < self.buffer_len:
            raise IndexError(f'delay {delay} out of bounds for buffer length {self.buffer_len}')
        # alternate implementation would
        # cap the delay, so that delay = -Inf returns the oldest value in the array.
        # delay = max(min(delay, self.size),1)
        return self.values[:,-delay-1]
        
    def fill(self, fill_val):
        self.values.fill(fill_val)
    def __getitem__(self, idx):
        return self.values[idx]
    def __str__(self):
        return f'buffer with {self.n_channels} columns, len: {self.buffer_len}, \nvals: {str(self.values)}\n'
code reposted here: testing the classes
# if __name__ == "__main__":
import time 

n_channels = 10
buff_len = 10000

np_buff = RingBuffer_2D(n_channels = n_channels, buffer_len=buff_len)
dq_buff = DQRingBuffer(buff_len, n_channels)
dq_buff2 = DQRingBuffer(buff_len, n_channels)
'''
DeQue ring buffer is generally faster than numpy 2D version
'''

ntest = 1000 #10000
t0 = time.time()

vec = np.arange(n_channels)
vec.shape

dq_buff.fill(vec*0)
dq_buff2.fill(vec*0)

# Test numpy 2D array as buffer
for i in range(ntest):
    np_buff.append( vec+i )
    _ = np_buff.get_delayed(1)

t1 = time.time()
num_t = t1-t0
print('numpy 2D test')
print(f'{num_t:.3f} seconds for {ntest} iterations\n')

# Test deque of tuples as buffer
t0 = time.time()
for i in range(ntest):
    dq_buff.append( vec+i )
    _ = dq_buff.get_delayed(1)
t1 = time.time()
dq_t = t1-t0
print('deque test')
print(f'{dq_t:.3f} seconds for {ntest} iterations\n')


# Test deque of tuples as buffer + time to convert to numpy
t0 = time.time()
for i in range(ntest):
    dq_buff2.append( vec+i )
    _ = np.array(dq_buff2.get_delayed(1))
t1 = time.time()
dq_conv_t = t1-t0
print('deque test w/ conversion to numpy')
print(f'{dq_conv_t:.3f} seconds for {ntest} iterations\n')



print( f'deque {num_t/dq_t:.2f} times faster')
print( f'deque {num_t/dq_conv_t:.2f} times faster with conversion back to numpy array') 

the upshot is collections.deque [docs] is much faster, especially for long arrays than using a numpy 2D array. (up to 150x faster in some tests). It’s also worth noting that deque has an implementation in the standard library in C++

If you want to use a numpy 2D array (because the indexing is much more flexible this way) you should “manually” implement the roll piece:

    def append(self, new_column):
        #expands column to 2D 
        self.values = np.append(self.values, new_column[:,None], axis=1)
        self.values = self.values[:, 1:]

rather than use the built in np.roll() method

update:
while deque is much faster for getting single values from the buffer, the magnitude of this speed difference is quite small relative to the total runtime of a Brian2 simulation (for the very small networks and ~100 sample buffer lengths I tested).

Moreover, if you have to convert back to numpy arrays to use advanced indexing, the deque approach may end up being slower.


In other news, I got these plots by combining marcel’s network_operation solution with this custom numpy ring buffer

showing buffered results match output of voltage monitors:
Screen Shot 2021-11-16 at 5.05.26 PM

showing signal B is delayed from signal A:
Screen Shot 2021-11-16 at 12.54.33 PM

the script gaussian_delay_example_netop_multi_population.py
now supports heterogeneous delays

I also added a naive “development roadmap” / sub-feature request list to the README.md for the repo

@adam, a few notes from the back of my mind (I’m traveling now, therefore very briefly).

  1. all possible FIFO solutions (deque, for example) require memory allocation for each computational step, even on the level of C/C++, mostly because FIFO doesn’t know required buffer depth (it may be even variable for FIFO). Memory allocation is always slow. Therefore, if we know the max delay, it is better to allocate all required memory and use a cycling pointer/index. It should be much faster than any FIFO solution.
  2. np.concatenate([ G.v[:] for G in all_groups ]) looks pretty elegant, but you need to be sure that you know where each group is located in the vector, otherwise, it’s easy to mess up.
  3. let’s think in terms of a memory controller and quite a big model/long delays. The whole 2D “delay” table may not fit into the cache memory. In this case, it may be faster to have independent buffers for each delay line, so it will be a sequential reading - the most effective scenario for a memory controller.

sorry for the squeezed reply, a more detailed analysis of your very interesting and thoughtful posts, as well as amazing @mstimberg comment on parameters handling a bit later.

1 Like

I agree on all of these points! I’m hoping the repository i’m working on will be a good way to explore and communicate those tradeoffs.

In general, I’m marching (slowly) from the simpler solutions to construct and index values from (i.e. a dense numpy array) towards more tailored, memory and speed-efficient solutions. My hope is that this will eventually result in an efficient C++ implementation, but we can stop at a threshold of usability / efficiency at any point along the way.

I’m also trying to gather up as many of these ideas as possible in the README.md for the repo (including ideas like a “rolling index” and various sparse storage options like “lists of lists” and “COO” coordinate lists).

As to point 2 about tracking groups, currently I have the luxury of working with small, relatively abstract networks where knowing the number of populations and neurons per population is enough to index which signals come from where. However I might try something like a dictionary (with population names as keys) to balance elegant access to recorded variables and more robust-to-human-error recording of which population is where.

EDIT: @rth, the first time I read your comment I misinterpreted it as being about sparse storage, but now I think point 3. refers to orienting the matrix used for storage such that the network state can be sliced at a single point in time (with such a vector stored contiguously in memory?).

If so, that’s not something I’d considered, and I will add that as a consideration under the “indexing conventions / storage” section


The scripts for the repo are quite messy at the moment as I’m working on visualizing the cross-correlation between signals. Part of my selfish interest in understanding how to simulate delayed interactions in Brian is to study how network properties affect when we can recover connectivity. The plotting is also overly complicated because I’m experimenting with bundling brian data into dataframes then using plotly for interactivity - i’ll hopefully post an interactive demo here by the end of the day.


here are some quick whiteboard sketches to work through thinking about sparse storage

heres an interactive plot I generated from a gaussian network with delays:
https://htmlpreview.github.io/?https://github.com/awillats/brian_delayed_gaussian/blob/main/figs/gaussian_combo.html
you can subselect traces by clicking on the legend, and zooming into the x and y-axes are linked across the relevant subplots

here’s a non-interactive preview: