Multiple runs in standalone mode

My case is a bit different from this one, but it also involves multiple runs in standalone mode. The code is a slightly modified version of the Brian2/Python 3 port of Diehl & Cook (2015), “Unsupervised learning of digit recognition using spike-timing-dependent plasticity”, which has been mentioned a few times in this forum (see this and this). I think a general outline of the code will suffice:

set_device('cpp_standalone', build_on_run=False)

# Define auxiliary functions

# Load dataset

# Set parameters and equations

# Create network and connections

# Run the simulation
net = Network()
for obj_list in [neuron_groups, input_groups, connections, rate_monitors,
                 spike_monitors, spike_counters]:
    for key in obj_list:
        net.add(obj_list[key])
...
# loop until the whole dataset has been covered
while j < num_examples:
    # several run() calls for resting and presentation times
    ...

device.build(compile=True, run=True, debug=True, clean=False)

# Save results

# Plot results

device.delete(code=False)

So the key idea here is that we have a (large) network and run it in a loop, not because we are trying different parameters or network configurations, but because we are presenting different input patterns to the network.

When trying to run the simulation in standalone mode, I noticed the execution speed gradually slowed down as the simulation progressed. I understand that is due to what @mstimberg explained in a different thread:

I have yet to try the approach from that gist (at first I thought it was not exactly the same situation, but it might work nonetheless). On a more general note, I am wondering what the best workflow is in this scenario. That is, what is the best approach for speeding up a simulation in which an entire dataset is fed to a large and fairly complex network (without any other change of parameters)? Any ideas?
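To make the question more concrete, here is a minimal sketch of one possible alternative to the loop of run() calls: precompute the whole input schedule up front, so that a single run() could cover the entire dataset. This assumes fixed presentation and resting times for every example. The helper build_rate_schedule, the toy data, the step counts and the max_rate_hz value are all hypothetical and not taken from the actual code. Any logic that depends on the network's response between runs would not fit this pattern.

```python
import numpy as np


def build_rate_schedule(images, present_steps, rest_steps, max_rate_hz=63.75):
    """Return an array of shape (n_steps, n_pixels) with input rates in Hz.

    Each image occupies `present_steps` consecutive rows, followed by
    `rest_steps` rows of zeros (no input). All parameter values here are
    illustrative assumptions.
    """
    images = np.asarray(images, dtype=float)
    n_images, n_pixels = images.shape
    steps_per_example = present_steps + rest_steps
    schedule = np.zeros((n_images * steps_per_example, n_pixels))
    for k, image in enumerate(images):
        start = k * steps_per_example
        # Scale pixel intensities (0..255) to rates (0..max_rate_hz);
        # the single image row is broadcast over all presentation steps.
        schedule[start:start + present_steps] = image / 255.0 * max_rate_hz
    return schedule


# Hypothetical toy data: 3 "images" with 4 pixels each
toy_images = np.array([[0, 255, 0, 0],
                       [255, 0, 0, 0],
                       [0, 0, 128, 255]])
schedule = build_rate_schedule(toy_images, present_steps=7, rest_steps=3)

# Assumed usage with Brian2 (not executed here), with one row per 50 ms:
#   stimulus = TimedArray(schedule * Hz, dt=50*ms)
#   input_group = PoissonGroup(schedule.shape[1], rates='stimulus(t, i)')
#   run(schedule.shape[0] * 50*ms)
```

With 7 presentation steps and 3 resting steps of 50 ms each, every example would take 500 ms of simulated time, and the standalone code would only need to be generated for one run() call instead of several per example.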

Note: This is my first attempt at implementing a complex simulation in Brian, so I still have quite a few things to try and learn. For instance, one of my next steps will be to use a proper event-based dataset (e.g. N-MNIST) instead of standard MNIST; I think that might simplify some things about the original structure of the code. But of course any other ideas and advice are really appreciated!