Can we go beyond store/restore to more easily reuse the network?

Description of problem

I want to save the entire network (setup and states) to reuse across interpreter sessions. I wish I could pickle/dill the whole thing, but this runs into problems with weakrefs and generated code. The closest thing to a solution provided by Brian seems to be the store()/restore() functions, but for restore to work I need to reconstruct the network exactly as before. I’ll do it if I have to, but first: how hard would it be to automate the reconstruction part (everything minus the generated code, I assume)? If it wouldn’t be overly difficult, I’d rather contribute a general solution for future Brian users than just solve the problem once for my own use case.

For context, I need to save and load the network repeatedly in different parts of an experiment pipeline (which I mentioned here). For example, run a training simulation once, save the whole network, then run experiments with different parameters in parallel. I need to be able to load the trained network from the saved file in multiple interpreter sessions.
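For reference, the store/restore workflow I’m referring to looks roughly like this (a sketch; the model, explicit names, and file name are made up):

```python
from brian2 import *

def build_network():
    # for restore() to work, the network has to be reconstructed identically,
    # including the explicit object names
    group = NeuronGroup(100, 'dv/dt = -v / (10*ms) : 1', method='exact',
                        name='neurons')
    syn = Synapses(group, group, 'w : 1', on_pre='v += w', name='synapses')
    syn.connect(p=0.1)
    return Network(group, syn)

# --- training session ---
net = build_network()
net.run(10*second)                                   # "training" run
net.store('trained', filename='trained_state.b2')    # save all states to disk

# --- later, in a separate interpreter session ---
net = build_network()
net.restore('trained', filename='trained_state.b2')  # load the trained states
net.run(1*second)                                    # experiment run
```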

Ideas

  • create additional save()/load() functions that pickle everything, including states, but strip out the problematic parts (weakrefs, the device?)
  • optionally save the device (device['arrays'] seems to be where the weakref problem occurs) using normal rather than weak references

Full traceback of error (if relevant)

weakref causes problems after unpickling, but only once I try to run the network.
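Roughly what I’m doing (a sketch; the model and file name are made up):

```python
import dill   # dill rather than pickle; both run into the weakref/generated-code issues
from brian2 import *

# --- session 1: build, "train", and dump the whole network ---
group = NeuronGroup(100, 'dv/dt = -v / (10*ms) : 1', method='exact', name='neurons')
syn = Synapses(group, group, 'w : 1', on_pre='v += w', name='synapses')
syn.connect(p=0.1)
net = Network(group, syn)
net.run(1*second)

with open('net.dill', 'wb') as f:
    dill.dump(net, f)

# --- session 2 (new interpreter): load and try to continue the simulation ---
with open('net.dill', 'rb') as f:
    net = dill.load(f)
net.run(1*second)   # -> KeyError from the WeakKeyDictionary in device.arrays
```

The run() on the unpickled network then fails with: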

Traceback (most recent call last):
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/core/network.py", line 1003, in before_run
    obj.before_run(run_namespace)
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/groups/group.py", line 1266, in before_run
    self.create_code_objects(run_namespace)
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/groups/group.py", line 1259, in create_code_objects
    code_object = self.create_default_code_object(run_namespace)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/groups/group.py", line 1240, in create_default_code_object
    self.codeobj = create_runner_codeobj(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/codegen/codeobject.py", line 484, in create_runner_codeobj
    return device.code_object(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/devices/device.py", line 358, in code_object
    codeobj = codeobj_class(
              ^^^^^^^^^^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/codegen/runtime/cython_rt/cython_rt.py", line 108, in __init__
    super().__init__(
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/codegen/runtime/numpy_rt/numpy_rt.py", line 201, in __init__
    self.variables_to_namespace()
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/codegen/runtime/cython_rt/cython_rt.py", line 249, in variables_to_namespace
    self.namespace[dyn_array_name] = self.device.get_value(
                                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/site-packages/brian2/devices/device.py", line 529, in get_value
    return self.arrays[var]
           ~~~~~~~~~~~^^^^^
  File "/home/kyle/miniforge3/envs/snakemake/lib/python3.12/weakref.py", line 415, in __getitem__
    return self.data[ref(key)]
           ~~~~~~~~~^^^^^^^^^^
KeyError: <weakref at 0x7fd42b8e2e30; to 'DynamicArrayVariable' at 0x7fd42b821310>

Hi @kjohnsen, it would indeed be nice to support these things in a better way. As you said, we already have a way to save the internal state (at least in runtime mode – it would be great to have this for standalone mode as well), but we don’t have a (de)serialization of the network itself.

I think it is important to have good pickling support, since it is e.g. invoked when you run things in parallel with multiprocessing, but I’d rather avoid it as a general solution. I find it too fragile – it might e.g. break in non-obvious ways when you try to load a pickle file generated with an older version of Brian in a newer version. IMHO, the best solution would be explicit serialization and deserialization of objects (without the internal states, which could be handled with the existing mechanism).

We actually do have a serialization for most of Brian’s objects in brian2tools’ “base exporter”. It is not exposed in the way we need here, but it would be easy to adapt, I think. The base exporter translates a network into a dictionary representation, which is straightforward to write to disk as JSON, YAML, or similar. We don’t currently have a mechanism to deserialize this representation, but adding one should not be too hard.
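Roughly (from memory, so the exact names like the 'exporter' device and the device.runs attribute might be slightly off), using the base exporter currently looks like this:

```python
import brian2tools.baseexport   # registers the 'exporter' device
from brian2 import *
from pprint import pprint

set_device('exporter')

group = NeuronGroup(100, 'dv/dt = -v / (10*ms) : 1', method='exact', name='neurons')
run(100*ms)

# the network description is collected as a nested dictionary, which could be
# dumped to JSON/YAML; what is missing is the reverse direction (dict -> network)
pprint(device.runs)
```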
What do you think about all this?

Actually, now that I am writing about it, I might propose a project along these lines for this year’s Google Summer of Code.

That makes sense to me. I did look at the base exporter and noticed there wasn’t an import function, but then I forgot about that when writing the above post :sweat_smile:. It sounds like that would be more work than a quick pull request? In the meantime, I’ll just reconstruct the network as a workaround.

One other question @mstimberg: how do I make sure I’m reusing compiled code as much as possible? It seems like the docs are tailored to the case where you trigger multiple runs from the same interpreter session. Will the caching mechanism work across interpreter sessions if I reconstruct a network, restore the states, and run()? Part of my pipeline will run on Brian2CUDA, and after that I will be using Cython (I need network operations), so I would like to know how both are affected.

I also see that it’s suggested to use a separate cache dir per process and to disable file locking when working in parallel on a cluster. Won’t this essentially disable cached code generation? In my case that would mean a lot of wasted compute, generating and compiling code thousands of times. Could I instead disable file locking, use the same cache dir for every process, and somehow ensure that compilation has finished before running in parallel – essentially making compilation its own step of the pipeline?

Yeah, it shouldn’t be too difficult, but as always I fear that the devil is in the details, and it will take a bit longer to get it working 100%.

So, things are a bit different between Cython and standalone modes (C++ standalone and Brian2CUDA).
For Cython:
The proposed solution of basically switching off caching is indeed not meant for your use case, but rather for a single set of parallel runs – in that case, it wouldn’t make much difference whether you compile everything once and then run the simulations in parallel, or run everything in parallel and multiply the compilations. In general, we took care that identical Brian objects are only compiled once (if they aren’t, this is a bug). The biggest obstacle in practice is that objects get their names assigned automatically by default, and these names might therefore change between runs. But you’ll have to make sure the names stay the same in any case if you use store/restore. All this means that if you recreate a network identically, no new code should be compiled. You could therefore have a separate compilation stage for Cython by executing a run(0*ms). To be 100% sure that everything is working as expected, you might even set CXX=/not/a/compiler during the runs that execute in parallel and are no longer supposed to compile anything. You should be able to safely switch off the file locking in this case.
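Concretely, the Cython part could look something like this (the preference names are from the docs; the cache path, model, and state file are just placeholders):

```python
from brian2 import *
import os

# one shared cache directory for all processes; file locking switched off,
# since the parallel jobs should not compile anything anymore
prefs['codegen.runtime.cython.cache_dir'] = '/shared/brian_cython_cache'
prefs['codegen.runtime.cython.multiprocess_safe'] = False

def build_network():
    # fixed names, so that generated code and store/restore data stay identical
    group = NeuronGroup(100, 'dv/dt = -v / (10*ms) : 1', method='exact',
                        name='neurons')
    return Network(group)

# --- compilation stage (run once, before launching the parallel jobs) ---
net = build_network()
net.run(0*ms)        # triggers code generation + compilation, no simulated time

# --- in each parallel job (separate interpreter session) ---
os.environ['CXX'] = '/not/a/compiler'   # optional: fail loudly if anything recompiles
net = build_network()
net.restore('trained', filename='trained_state.b2')
net.run(1*second)
```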

For standalone (e.g. Brian2CUDA):
Here the situation is a bit different, since we already have a dedicated compilation stage, even if it is executed automatically. As you probably know, when you use set_device(..., build_on_run=False), device.build is not automatically triggered by the run statement, and you have to call device.build() manually when you want to actually compile and run the network. You can split the two phases by using device.build(run=False) and device.run() instead of a single device.build() call.
But note that the situation here is different from runtime mode (Cython etc.) with respect to what is part of the generated code: in standalone mode, assignments like neuron.v = E_L are included in the generated code. If you change one of these assignments, you therefore have to be careful. In older versions of Brian you would have had to rebuild the code, but recently we added the run_args feature, which allows you to do such assignments at run time (e.g. to set neuron.v to something, you’d use device.run(run_args={neuron.v: my_value})). See “Computational methods and efficiency” in the Brian 2 documentation.
Another thing to keep in mind: parallel processes can all use the same binary, but they need to take care not to write their results to the same directory, so make sure to set the results_directory for each process (if you use device.build, this has to be set in the device.build call – the setting in set_device will be ignored!).
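Putting that together, a sketch (using cpp_standalone for brevity; cuda_standalone from Brian2CUDA works the same way, and run_args needs a recent Brian version):

```python
from brian2 import *

set_device('cpp_standalone', build_on_run=False)   # or 'cuda_standalone' with Brian2CUDA

tau = 10*ms
E_L = -70*mV
group = NeuronGroup(100, 'dv/dt = (E_L - v) / tau : volt', method='exact',
                    name='neurons')
group.v = E_L        # in standalone mode, this assignment is part of the generated code

run(100*ms)          # with build_on_run=False, nothing is compiled or executed yet

# compilation stage: generate and compile the standalone project once
device.build(run=False)

# execution stage: run the compiled binary; run_args overrides the assignment
# above without rebuilding
device.run(run_args={group.v: -65*mV})

# when launching several of these runs in parallel, give each process its own
# results_directory (see the note above) so they don't overwrite each other
```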

Hope that clears things up a bit?