Accelerating brian2cuda simulation

Does brian2cuda automatically exploit 100% of the GPU's power, or are there better practices, such as setting particular preferences, to speed up the simulation?

At the present time, I am using only this line of code:

set_device('cuda_standalone', directory='my_folder', compile=True)

I should also mention that I have a single .run(duration) call.

Thank you very much in advance

Hi @lparr,

generally, there are a few preferences that can be tweaked in Brian2CUDA, but whether they would speed up your model depends very much on its details. These performance tweaks are not yet documented in a very user-centric way, but here are the resources we have so far:

  1. You can find an automatically generated list of preferences on the Brian2CUDA documentation page, but they are not well documented for new users, I'm afraid (see the sketch after this list for how such a preference would be set): Brian2CUDA specific preferences — Brian2CUDA 1.0.0 documentation
  2. Our publication can give you an idea of the speedups you can expect for different model types (for that, reading the results section and looking at Figure 5, and maybe Figures 6 and 7, is probably sufficient). There is also a lot of documentation of Brian2CUDA internals, but this might be very technical for a user who simply wants to speed up their simulation a bit :slight_smile: You can find it here. But note that these benchmarks always use small numbers of NeuronGroup and Synapses objects; if you have many of them, that can currently affect the speedups you see.
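For illustration, here is a minimal sketch of how one of these preferences would be set (prefs and set_device are standard Brian2 API; the specific preference name devices.cuda_standalone.parallel_blocks is one entry from the generated list, so double-check the documentation page for the exact names in your version):

    from brian2 import prefs, set_device
    import brian2cuda  # registers the 'cuda_standalone' device

    set_device('cuda_standalone', directory='my_folder', compile=True)

    # Brian2CUDA preferences live under the 'devices.cuda_standalone'
    # namespace; 'parallel_blocks' is one example from the generated list.
    # Whether changing it helps depends entirely on your model.
    prefs['devices.cuda_standalone.parallel_blocks'] = 1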

Having said that, I would first have a look at your model in general to get an idea of whether preferences are the solution at all. I would start with Brian's internal profiling option, as documented here. In short, add a profile=True option to your run call and then print(profiling_summary()) at the end of your simulation. This should tell you which parts of your simulation take up the most time. Feel free to post that here and I can give you hints for optimization.
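As a minimal sketch (profile=True and profiling_summary are standard Brian2 API; duration stands in for your own simulation time):

    from brian2 import run, profiling_summary, ms

    duration = 1000 * ms  # placeholder, use your own duration

    # Run with profiling enabled: Brian records how much time each code
    # object (state updates, synaptic propagation, ...) spends in the loop.
    run(duration, profile=True)

    # Print a table of the most time-consuming parts of the simulation.
    print(profiling_summary())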

Note that the profiling summary only covers what happens inside the time loop, which means it does not include compilation or network initialization times. There are cases where these can be significant, so I would additionally record the execution time of your run call (time.time() before and after).
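For example (plain Python timing around the same run call as above):

    import time

    start = time.time()
    run(duration, profile=True)
    elapsed = time.time() - start

    # Unlike the profiling summary, this wall-clock time also includes
    # code generation, compilation and network initialization.
    print(f"run() took {elapsed:.1f} seconds")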

I would also do the same (profiling and timing the run call) with set_device("cpp_standalone") for comparison. Not all models are suitable for acceleration with Brian2CUDA, and this will give you an idea of whether that is the case for yours.
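Switching is a one-line change at the top of your script, and everything else stays the same (the directory name here is just an illustration, to keep the generated code separate from the CUDA version):

    from brian2 import set_device

    # Same simulation script, but targeting Brian2's C++ standalone device:
    set_device('cpp_standalone', directory='my_folder_cpp', compile=True)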

Generally, here are a few model features that are important for Brian2CUDA performance. If you give me some numbers (or alternatively a link to your code), I might be able to help you speed things up:

  • How many NeuronGroup objects, and of what size, are you simulating?
  • How many Synapses objects, with how many synapses, are you simulating, and - most importantly - do they have synaptic delays (and if so: do all synapses of the same Synapses object have the same delay, or different delays)?
  • How long is your simulation (the duration argument in your run call) and what simulation time step are you using (if you didn't change the default, it should be 0.1 ms)?

And last but not least: if you run into any kind of issue with Brian2CUDA, I would always first try your simulation in Brian2 C++ standalone mode (set_device("cpp_standalone")). If the issue persists in C++ standalone, it is likely an issue general to Brian's standalone mode (like the issue you had with variable initialization); in that case, @mstimberg will probably have the best solutions, or you might find an answer in the Brian2 documentation :). Only if your issue is not present in C++ standalone is it likely an issue with Brian2CUDA.
