Unsupervised learning of digit recognition using spike-timing-dependent

Good afternoon, all Brian users

I am reading this paper “Unsupervised learning of digit recognition using spike-timing-dependent plasticity” ( https://www.frontiersin.org/articles/10.3389/fncom.2015.00099/full). I have several questions regarding this article.

  1. Is there any Brian2 implementation of STDP rules together with lateral inhibition and homeostasis?
    or just an example of STDP with lateral inhibition?

  2. As far as I understood, in this work “adaptive spiking threshold” has been used by giving a condition on to “reset”. If it is so, what is the functionality of “homeostasis” and where we can use that on equations?

Thank you,

As a quick remark, there is a Brian2 implementation of this specific paper by Xu Zhang : https://github.com/zxzhijia/Brian2STDPMNIST The code is a couple of years old, though, and it does not get updated anymore (e.g. it is not compatible with Python 3 as it is). A number of people forked and adapted this example, though (e.g. https://github.com/sdpenguin/Brian2STDPMNIST), these later versions might have fixed some of the issues.

Quite a few people seem to be interested in this example, it would be great if some of you could get together and turn this into a working example that can be shared with the community. This is somewhat similar to Pattern recognition in Spiking Neural Nets using Brian2 in scope, so maybe @touches and @Ziaeemehr would be interested?

1 Like

Thank you, @mstimberg

@touches and @Ziaeemehr, could you look at this paper, please?

I would be glad to work on this new project.
Thank you very much.

1 Like

I will give a try. These days I am a bit busy for preparing results for neuromatch 3 talk.
cheers,
Abolfazl

Hello Assel,

This paper is quite frequently referenced and the code is a tad too difficult to decipher as I belive the style its written in can be made a little clearer. A couple of dark spots that I could not figure out from the code was the Synaptic connections. In the paper there is lateral inhibition (i != j) but in the code the synapses seem to be connected in an one-one fashion. Like Marcel said, this would be an excellent exercise to implement this paper in a more understandable manner and get it to produce results that can derived easily. Let me know if you want me to start a project for this or you can go ahead creating one.

1 Like

Good afternoon,

@touches, yes it would be excellent. I really want to understand this article and work on it.

I could not understand the implementation of “lateral inhibition” and “homeostasis” on the code. Moreover, on paper was written that network structure consists of 2 layers, but isn’t it 3 layers?
Thank you!

Good day,
Dear Brian2 Users
With the help of @touches was working on implementing the above-mentioned paper from the scratch. Till now as was described in paper 3 layers were created and connected as follows:

Poisson → Exc [784-100] (All to all)

Exc → Inh [100-100] (One-to-one)

Inh → Exc [100-100] (each inhibitory neuron is connected to all excitatory neurons, except for the one it receives a connection from.)

Poisson → Exc connected with the STDP rule.

Poisson input rates were taken from MNIST, but only reduced form one sample of one number 5.

The next step will be to implement homeostasis and lateral inhibition.

Problems are:

  1. statemon_inh_exc shows zero
    Also, could you look at the initializations of neuron parameters and “Syn_poi_exc”, “Syn_exc_inh”? I’m not able to solve these issues to go to the next step. The code and output plots are here https://colab.research.google.com/drive/1YRVVRs74GlFd_pSQ7TEm6USS6fN-nGvb?usp=sharing

Thank you. I will look forward to any guidelines and corrections.

Hi. I had a quick look at the code and ran the simulation. I think the main problem is that the excitatory weights from the Poisson neurons to the excitatory neurons are too strong. Each neuron receives input from 784 Poisson neurons and each of these neurons fires on average with something like 50Hz. In total, each neuron therefore receives several 10000 spikes per second. You initialize the weights to rand()*gmax, so on average weights will be 0.5nS, which seems too high for that many inputs. You see that your neurons are firing as fast as the refractory period allows them, this is certainly not what you want. So try initializing the weights with much less, e.g. 0.1*rand(). Maybe even use smaller values, I’d aim for something that is as low as possible while still evoking a couple of spikes per second on average for the initial weights (note that you can test this for very short simulations, i.e. no need to run things for 50s). I’d also add (unless refractory) to the equation for v in your neuron models – this will clamp the membrane potential to the reset value during the refractory period, which is what models usually do. It also makes it clearer to see whether the neuron is in the refractory period or not. As a general rule, I’d say that a neuron that is limited by its refractory period should really be an exception. Most neurons should receive input that does not drives them that strongly.

The inhibitory neurons are not getting any input since the weights from the excitatory population to the inhibitory population are all 0. Setting them to something > 0 should make the inhibitory population active. And then you can close the feedback loop for the lateral inhibition by also setting the inhibitory → excitatory weights to > 0.

Regarding the initialization of v for exc_layer and inh_layer. Instead of e.g.

exc_layer.v = 'v_rest_e + rand()*(v-v_rest_e)'

you probably meant to write

exc_layer.v = 'v_rest_e + rand()*(v_thresh_e-v_rest_e)'

to initialize the membrane potential between the rest and the threshold. You do not necessarily have to initialize g_exc and g_inh at all, for them 0 is a good initial value.

Hope that helps you advancing!

1 Like

Hi Marcel,

This looks like a interesting topic for learning how to implement digit recognition using STDP on brian2. However, do you have any recommendations on where beginners could learn how to code this out in python in a step by step approach (training, assigning class labels and testing)? Thank you!

1 Like

Hi Henry,
I am afraid I cannot point you to any tutorial that covers this kind of machine learning approach in Brian, but I agree that having something like this could be useful. Maybe at the end of this project (if the code works and reproduces the results from the original paper) someone could be motivated to turn this into a step-by-step tutorial?

Hi all,

I am trying to train a STDP to perform classification on iris dataset. I am using 4 poisson input neurons connected to 1 output neuron for implementation purpose, just train these whole network to recognise class 0 of sklearn iris dataset. However, using the STDP parameters, it seems that there is no spiking activity by training the normalized dataset. Moreover, by testing the trained network on class 1 (by repeating the code, but turning off weight updates), it seems like there is no spiking as well. What could be wrong here? I will appreciate if someone could feedback on this, I suspect it has something to do with the STDP parameters. Kindly feedback on how I could edit the code to make it minimum viable? Thanks!

Link is https://colab.research.google.com/drive/1sYEvxjZ2W1zfZOaT0zA8Y3KbJ0zEhS2S?usp=sharing

Hi @Henry . I only had a cursory look at the code, but this should not be related to the STDP parameters – if there are no spikes, the STDP rule cannot work. If I understand your code correctly, you feed in 4 Poisson groups with spike rates between only 0 and 1Hz, and a maximum weight of 0.3mV (equivalent to your threshold). This will practically never evoke a spike because you will almost never get any input spike in your 20ms interval, and even if you’d get single input spike, its effect would not be strong enough to evoke an output spike. Before looking further into the learning, you’ll have to decide how many input spikes you want during each 20ms interval and set the firing rate accordingly. I’d also suggest to plot the membrane potential recorded by the StateMonitor to have a better idea of what is going on.

Oh, and just one remark for a more efficient implementation: instead of recreating the network from scratch every iteration of your loop, you could create it once and use the store()/restore() mechanism to reset it: Running a simulation — Brian 2 2.4.2 documentation

1 Like

Good day,

Thank you for clarifying the steps @Jul94, but my struggle wasn’t that. I wasn’t able to understand some parts of this article.

( https://www.frontiersin.org/articles/10.3389/fncom.2015.00099/full ). So, I have several questions regarding this article. I’m referring to this implemented source code GitHub - sdpenguin/Brian2STDPMNIST: Brian 2 version of Paper “Unsupervised Learning of digit recognition using STDP” as you mentioned.

  1. As far as I understood, in this work “adaptive spiking threshold” has been used by giving a condition on to “reset”. If it is so, what is the functionality of “homeostasis” and where we can use that on equations?
  2. «The performance of this approach scales well with the number of neurons in the network, and achieves an accuracy of 95% using 6400 learning neurons.» How to get this 95% accuracy? Where is its implementation in the code(could you tell the file name and no. of line, please)? This question comes out due to I do not know what is the Loss(error) function (in SNNs) in this work? How to check accuracy?
  3. How to do the evaluation process (in SNNS) in this article?
  4. How does lateral inhibition generate competition? How can we see(where can we see) that lateral inhibition leading to the competition? (could you tell the file name and no. of line, please)
  5. How does homoeostasis function?
  6. In real Diehl&Cook’s source code they hadn’t used one-to-one and connected to all excitatory ones, except for the one from which it receives a connection. There was an only all-to-all connection between each layer. However, on paper, they say used the one-to-one connection between exc. and inh. How & Why is it different in the source code?
  7. Where is assigning labels(in the implemented code)? How is it implemented? Where are these excitatory neurons are assigned to classes after training? (could you tell the file name and no. of line, please)
  8. The learning rule is similar to the one used in Querlioz et al. (2013) but here we use an exponential time dependence which is more biologically plausible (Abbott and Song, 1999) than a time-independent weight change. How to understand this «using an exponential time dependence»?
  9. What is the loss(error) function in SNN in this paper?

Thank you very much. I’m new in this field, I’m sorry if I asked easy questions. I would be glad to read papers and analyze them and work on their implementations in Brian2. Thank you!

Hi @assergazina, I’ll do my best to answer your questions, but I’m also fairly new to the field, so take everything with a grain of salt. Let’s see:

  1. I think the "adaptive spiking threshold has to do with the inclusion of \theta to the membrane threshold. As they say in the paper:

    Specifically, each excitatory neuron’s membrane threshold is not only determined by vthresh but by the sum vthresh + θ , where θ is increased every time the neuron fires
    and is exponentially decaying (Querlioz et al., 2013). Therefore, the more a neuron fires, the higher will be its membrane threshold and in turn the neuron requires more input to spike in the near future. Using this mechanism, the firing rate of the neurons is limited because the conductance-based synapse model limits the maximum membrane potential to the excitatory reversal potential Eexc , i.e., once the neuron membrane threshold is close to Eexc (or higher) it will fire less often (or even stop firing completely) until θ decreases sufficiently.

    In the code, this is reflected by the expression in line 262 (notice the theta there):
    v_thresh_e_str = '(v>(theta - offset + v_thresh_e)) and (timer>refrac_e)'

  2. For that, you have to change the number of neurons from 400 (line 205, n_e = 400) to 6400, and train the network (change code to False as they say in line 179: test_mode = True # Change this to False to retrain the network). Once you have trained the 6400 neurons network, to see how it performs, change the test mode back to True to record its performance against the test set of the data (this is done in lines 465-467 of the code, btw). Training a 6400 neurons network will take a very long time, though. I’m not sure if you can skip that step and use their pre-trained weights to assess performance (that is, keeping the test mode True but only modifying the number of neurons), but you can try to see if it works without raising an error.
    Then, you have to run the file Diehl&Cook_MNIST_evaluation.py (changing the number of neurons accordingly, this time in line 56). In this file, the performance of the network is assessed according to the recorded performance with the test set of the data (that is, here is where they check the accuracy). I discuss the loss function in the reply to question 9.

  3. Not exactly sure what you mean by that. To evaluate the performance and get the accuracy, follow the steps I described in the previous lines. If you mean “how that accuracy value is obtained” for example, I think these excerpt from the paper might answer your question:

    After training is done, we set the learning rate to zero, fix each neuron’s spiking threshold, and assign a class to each neuron, based on its highest response to the ten classes of digits over one presentation of the training set. This is the only step where labels are used, i.e., for the training of the synaptic weights we do not use labels.
    The response of the class-assigned neurons is then used to measure the classification accuracy of the network on the MNIST test set (10,000 examples). The predicted digit is determined by averaging the responses of each neuron per class and then choosing the class with the highest average firing rate.

  4. Well, I think this is a very deep question and therefore not easily answered. There is no single line in the code where we can “see” this either. As they describe in the paper, they create an inhibitory layer of the same size than the excitatory one (400, 6400…). The excitatory layer is connected to the inhibitory in a 1-to-1 fashion, each excitatory neuron is connected to one inhibitory neuron, the one in the same position in the layer. This inhibitory layer is connected to the excitatory one in an almost all-to-all fashion, only each inhibitory neuron is not connected to its corresponding excitatory neuron. See Figure 1 in the paper for this. The idea behind that is simple… when an excitatory neuron fires, it will trigger the corresponding inhibitory neuron, which will inhibit all the other excitatory neurons. As they say in the paper:

    This connectivity provides lateral inhibition and leads to competition among excitatory neurons. The maximum conductance of an inhibitory to excitatory synapse is fixed at 10
    nS. However, the exact value did not have a big influence on the results of the simulation, instead the ratio between inhibitory and excitatory synaptic conductance has to be balanced to ensure that lateral inhibition is neither too weak, which would mean that it does not have any influence, nor too strong, which would mean that once a winner was chosen that winner prevents other neurons from firing.
    a few sections later
    In our network this means that every time a neuron spikes, because an example is similar enough to its receptive field, it will make its receptive field more similar to the example. The lateral inhibition prevents the prototypes from becoming too similar to each other (which means that they spread out in the input space), since only a few different neurons will be able to respond to each example and in turn only a few neurons can adapt their receptive fields toward it. Homoeostasis can be thought of as a tool to keep an approximately constant number of examples within range of the prototype.

  5. In this paper, homeostasis is implemented by means of the adaptive spiking threshold (as explained in the quote in the reply to 1., which was taken from the Homeostasis section of the paper). In the last words from the quote from 4., they further reason about its utility.

  6. I have no idea about this, actually, I wonder the same. I hope someone can give an answer to this.

  7. If I’m not mistaken, this is implemented by the function get_new_assignments(result_monitor, input_numbers), in line 138. As I understand, it is a majority voting: for each neuron, the digit that causes the most spikes is the one that is assigned as a label for that neuron.

  8. I think the exponential time dependence is given by x_{pre} in the equation (3):

    \Delta w = η(x_{pre} − x_{tar} )(w_{max} − w)^μ

    As they explain:

    This means that, besides the synaptic weight, each synapse keeps track of another value, namely the presynaptic trace x_{pre} , which models the recent presynaptic spike history. Every time a presynaptic spike arrives at the synapse, the trace is increased by 1, otherweise x_{pre} decays exponentially.

    I’m not entirely sure where exactly they implement this in the code, though. I see the variable exp_ee_pre in line 251, but it is actually never used (neither in the Brian2 code nor the original one). Alternatively, it might be defined in the equations, but I can’t see how. If anyone finds out, I’m all ears!

  9. This question would make for a long answer as well, but I’ll keep it short: the loss function has nothing to do with unsupervised learning. In simple terms, the loss function quantifies the error between the output from the network and the true label for any given sample. That error is what typically drives the weight change in standard ANNs. But by that definition you can see that you need to know the true label while training, which is indicative of supervised learning, and as you can read in the title of the paper, they attempt unsupervised learning. So there is no loss function in this paper, they use unsupervised learning rules, namely STDP.

I hope this helps! I’m not sure about some things that I wrote there, by the way, so as I said don’t hesitate to double-check anything. if I find any mistakes, I will update the post and add an edit note. And of course, if anyone knows the answer to the questions still unsolved, I’m eager to know!

1 Like

Hello,

First of all, thank you for your considerations and for your valuable time. I really appreciate it.
The questions no. 2,3 and 9, I meant about «how this model predicts the input data », I mean when you(we) do testing how it is going to recognize 5 is 5 (& 4 is not 5)? According to that, we evaluate accuracy right? If it does all predictions correctly, then its accuracy will be higher.
I did not get no.8,
Thank you for all the other answers, I understood(except 6, 8)

Regards,
Assel

No problem! That’s what this forum is for :grinning_face_with_smiling_eyes:.

About your clarification for questions no. 2,3, and 9:

During learning, the receptive fields of neurons converge towards the specific inputs of different digits. That basically means that some neurons will become sensitive to the characteristic input of a 7 (and trigger more often when presented with a 7), others for a 3, and so on… But all of this happens in an unsupervised manner, so the neurons “don’t know” which digit they are converging to. After training, there is a labeling phase, where neurons are assigned to a digit based on which digit they converged to. This is done like this: digits are shown to the neurons to see which digit causes the strongest response, and then that digit is assigned to that neuron (because that’s the digit the neuron has converged to, or learnt, the best).
Then, during testing, you just check which neurons are firing the most for any given digit presentation, and check if the label you assigned to those neurons matches with the digit you’re presenting (in that case, it is a correct classification) or not. That is what I understand from this, at least:

After training is done, we set the learning rate to zero, fix each neuron’s spiking threshold, and assign a class to each neuron, based on its highest response to the ten classes of digits over one presentation of the training set. This is the only step where labels are used, i.e., for the training of the synaptic weights we do not use labels.
The response of the class-assigned neurons is then used to measure the classification accuracy of the network on the MNIST test set (10,000 examples). The predicted digit is determined by averaging the responses of each neuron per class and then choosing the class with the highest average firing rate.

About question no. 8:

They mention indeed an “exponential time dependence”, so I checked the paper and the code to see how that was done. In the paper, the only reference I could find was related to the presynaptic trace, x_{pre}. That trace keeps track of the recent presynaptic spike history of a neuron, and is used for the computations of the STDP learning rule (as show in the equation I posted). Now, this is how the exponential time dependence is implemented using x_{pre}: when a presynaptic spike arrives at the synapse, the trace is increased by 1, otherwise x_{pre} decays exponentially. It is that exponential decay of x_{pre} that they refer to, or at least that is what I got by reading their paper.
Interestingly enough, as I said in the previous comment, I couldn’t locate where and how this is implemented in their code…

1 Like

Re 8., x_pre: Indeed, x_ee_pre and its fellow x_ee_post seem to be unused, both in the original code and in the Brian2 port.

Judging from the STDP equations from line 278 onwards, the pre- and postsynaptic traces (pre, post1, post2) are set to 1 upon the respective spikes, and decay with their time constants tc_*_ee. It’s perhaps a little unusual that the trace is simply set to one, rather than incremented, when a spike occurs, although this will obviously only start to make a difference at high firing rates.

2 Likes

Just a quick comment on that: while it is less common, setting the trace to a fixed value is sometimes used and corresponds to a “symmetric nearest-neighbour scheme” (see section 4.1.2 in this review by Morrison et al. 2008). This means that it only takes into account the spike that was just preceding instead of all the spike history.

3 Likes

Good day! @mstimberg , @kernfel , @Jul94 , Thank you for your considerations and time.
It helped me a lot. I hope I will do great stuff with Brian2.
Thank you,

Assel

2 Likes