MemoryError while using probability arrays for synaptic connections

Hello,

This isn’t strictly a Brian2 error, but I was wondering whether you might have some advice or insight into how I could get around this memory issue.

I’m using large matrices to calculate the probability of connecting all the neurons from one group to another based on distance. And I’m doing this multiple times to get the connectivity between different groups of neurons. I don’t believe I need to store the probability matrix after I create the connections, so in theory I should be able to overwrite or delete the matrix variable ‘p’.

However, after going through 5 iterations (out of 54 connections), I run into a MemoryError when creating the probability matrix.

I’ve tried the gc.collect() function and using del, but it doesn’t seem to change anything.

I’m using a computer with 32 GB of RAM. Attached is a minimal script to reproduce the error; a simplified version of the relevant loop is sketched below.
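Roughly (names shortened here; connection_pairs, n_src and n_tgt stand in for what the script actually computes):

import gc
import numpy as np

for src, tgt in connection_pairs:                       # 54 source/target combinations
    p = np.zeros((n_src, n_tgt))                        # large probability matrix
    prob_array = find_p(src, tgt, n_dict, n_groups, p)  # fill p from pairwise distances
    # ... create the synaptic connections from prob_array ...
    del p
    gc.collect()                                        # memory still does not seem to be freed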

Thanks for your help!
test_dist2.py (9.5 KB)


Hi @lmun373. One general remark: I am not sure whether this exact connection scheme (a fixed total number of connections) is really worth all the effort – if you used Brian’s built-in connection method with a distance-dependent probability (with an appropriately tweaked scaling factor), your code would be much simpler, along the lines of the sketch below. The downside would be that the total number of connections would no longer be constant, but given the massive number of connections that you have, the variation should be small in relative terms. But that is of course up to you :blush:
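As a rough, untested sketch (source_group, target_group and the scaling factor scale are placeholders, and it assumes X and Y are defined with units of metre in your neuron model):

radius = 0.000175 * b2.meter  # same space constant as in your find_p
scale = 0.1                   # tweak so the expected number of connections matches your target

syn = b2.Synapses(source_group, target_group)  # use your actual synapse model here
# Brian evaluates the expression for every potential pair, no explicit probability matrix needed
syn.connect(p='scale * exp(-((X_pre - X_post)**2 + (Y_pre - Y_post)**2) / (2*radius**2))')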

Regarding the memory issue itself, in general deleting/overwriting p should work, but I think the problem here is that you also store the returned result in the prob_array variable:

prob_array = find_p(src, tgt, n_dict, n_groups, p)

This in itself does not change the memory usage (prob_array is only a second reference to the memory of p), but it means that deleting p is not enough to free the memory: you also have to delete prob_array.
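In other words, something along these lines (simplified, following the structure of your script):

prob_array = find_p(src, tgt, n_dict, n_groups, p)
# ... create the connections from prob_array ...
del p
del prob_array  # without this, the second reference keeps the matrix alive
gc.collect()    # usually not even necessary once no references remain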

But the biggest problem might be the find_p function itself. To calculate the pairwise distances, it creates big matrices that repeat the X and Y coordinates so that they have the right shapes. With numpy you do not actually have to create these repeated matrices; instead, you can use broadcasting (see e.g. 1.4.2. Numerical operations on arrays — Scipy lecture notes, where the examples actually calculate pairwise distances). This not only makes the code much more compact, it should also save quite a bit of memory (it will still create intermediate matrices, but not all of them at the same time). Your function could look something like this:

def find_p(src, tgt, n_dict, n_groups, p):
    radius = 0.000175

    l_name = list(n_dict.keys())

    # Column vector of source coordinates vs. row vector of target coordinates:
    # broadcasting turns the subtractions below into full pairwise-difference matrices
    array_X_src = n_groups[l_name.index(src)].X_[:][:, np.newaxis]
    array_X_tgt = n_groups[l_name.index(tgt)].X_[:]

    array_Y_src = n_groups[l_name.index(src)].Y_[:][:, np.newaxis]
    array_Y_tgt = n_groups[l_name.index(tgt)].Y_[:]

    # Gaussian connection probability as a function of pairwise distance
    p[:] = np.exp(-((array_X_src - array_X_tgt)**2 +
                    (array_Y_src - array_Y_tgt)**2) / (2 * radius**2))

    # No self-connections
    np.fill_diagonal(p, 0)

The slightly weird n_groups[...].X_[:] gets you the values of X without units and as a standard numpy array (instead of Brian’s own VariableView, which adds additional functionality on top). Importantly, this gives you a view on the values, i.e. they are not copied in memory. This alone does not seem to be enough, though, so in addition you can change the dtype of p and of the X and Y variables to float32:

p = np.empty((n_groups[l_name.index(src)].N, n_groups[l_name.index(tgt)].N),
             dtype=np.float32)  # single precision halves the memory needed for p
# ....
neurons = b2.NeuronGroup(N, eqs, threshold='v>theta', reset=eqs_reset,
                         method='linear', refractory=tau_ref,
                         dtype={'X': np.float32, 'Y': np.float32})

I did not measure/verify things in detail, but with this change I am at least able to get through most of the connection creation process on my machine (also with 32GB RAM) – it stops at 2/E → 5E, which are two very big populations and therefore a huge number of potential connections.

I did not try this out, but if the intermediate arrays in find_p are the problem, you could also split things up, e.g. divide the whole matrix into four quadrants: first calculate the pairwise distances between the first half of the source neurons and the first half of the target neurons, then between the second half of the source neurons and the first half of the target neurons, and so on. The final matrix will of course still have the same size, but the intermediate matrices will be smaller.
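A rough, untested sketch of that idea, reusing the function signature from above (splitting by rows rather than quadrants; n_blocks is an arbitrary choice):

def find_p_blocked(src, tgt, n_dict, n_groups, p, n_blocks=2):
    radius = 0.000175
    l_name = list(n_dict.keys())
    X_src = n_groups[l_name.index(src)].X_[:]
    X_tgt = n_groups[l_name.index(tgt)].X_[:]
    Y_src = n_groups[l_name.index(src)].Y_[:]
    Y_tgt = n_groups[l_name.index(tgt)].Y_[:]

    # Fill p a block of rows at a time so the broadcasting intermediates stay small
    for rows in np.array_split(np.arange(len(X_src)), n_blocks):
        dx = X_src[rows][:, np.newaxis] - X_tgt
        dy = Y_src[rows][:, np.newaxis] - Y_tgt
        p[rows, :] = np.exp(-(dx**2 + dy**2) / (2 * radius**2))

    np.fill_diagonal(p, 0)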
There are more sophisticated ways of optimizing pairwise distance calculations, but I am not sure you want to get into this… Either way, may I suggest storing the generated indices to disk, so that you don’t have to recreate them from scratch every time you want to run the simulation :slight_smile: ?
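For example, something like this (a minimal sketch; the file name and the hypothetical generate_indices helper are placeholders for whatever your script does):

import os

fname = f'indices_{src}_{tgt}.npz'
if os.path.exists(fname):
    data = np.load(fname)
    sources, targets = data['i'], data['j']
else:
    sources, targets = generate_indices(prob_array)  # your existing index generation
    np.savez_compressed(fname, i=sources, j=targets)

syn.connect(i=sources, j=targets)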

Hope that helps a bit, best
Marcel


Thank you so much, that is incredibly helpful.
