A Unifying View of Deep Networks and Hierarchical Temporal Memory

  • Sep 14 / 2014
Cortical Learning Algorithm, NuPIC


There’s been a somewhat less than convivial history between two of the theories of neurally-inspired computation systems over the last few years. When a leading protagonist of one school is asked a question about the other, the answer tends to range from a kind of empty semi-praise to downright dismissal, with the occasional snide remark. The objections of each side to the other’s approach are usually valid, and mostly admitted, but the whole thing leaves one with the feeling that this is not a very scientific way to proceed or behave. This post describes an idea which might go some way towards resolving this slightly unpleasant impasse, and suggests that the discrepancies may simply be the result of two groups using the same name for two quite different things.

In HTM, Jeff Hawkins’ plan is to identify the mechanisms which actually perform computation in real neocortex, abstracting them only far enough that the details of the brain’s bioengineering are simplified out, and hopefully leaving only the pure computational systems in a form which allows us to implement them in software and reason about them. On the other hand, Hinton and LeCun’s neural networks are each built “computation-first,” drawing some inspiration from and resembling the analogous (but in detail very different) computations in neocortex.

The results (i.e. the models produced), inevitably, are as different at all levels as their inventors’ approaches and goals. For example, one criterion for the Deep Network developer is that her model is amenable to a set of mathematical tools and techniques, which allow other researchers to frame questions, examine and compare models, and so on, all within a shared mathematical framework. HTM, on the other hand, uses neuroscience as its standard of evidence, and will not admit to a model any element which is known to be contradicted by observation of natural neocortex. The Deep Network people complain that HTM’s models cannot be analysed the way theirs can (indeed it seems they cannot), while the HTM people complain that the neurons and network topologies in Deep Networks bear no relationship to any known brain structures, and are several simplifications too far.

Yann LeCun said this recently in a Reddit AMA (there’s a great summary of the session):

Jeff Hawkins has the right intuition and the right philosophy. Some of us have had similar ideas for several decades. Certainly, we all agree that AI systems of the future will be hierarchical (it’s the very idea of deep learning) and will use temporal prediction.

But the difficulty is to instantiate these concepts and reduce them to practice. Another difficulty is grounding them on sound mathematical principles (is this algorithm minimizing an objective function?).

I think Jeff Hawkins, Dileep George and others greatly underestimated the difficulty of reducing these conceptual ideas to practice.

As far as I can tell, HTM has not been demonstrated to get anywhere close to state of the art on any serious task.

The topic of HTM and Jeff Hawkins ranked second among all the major themes in the Q&A session, reflecting the fact that people in the field view this as an important issue, and (it seems to me) wish that the impressive progress made by Deep Learning researchers could be reconciled with the deeper explanatory power of HTM in describing how the neocortex works.

Of course, HTM people are seldom reluctant to play their own part in this spat, saying that a Deep Network sacrifices authenticity in favour of mathematical tractability and high scores on artificial “benchmarks”. We explain or excuse the fact that our models are several steps smaller in hierarchy and power by making the valid claim that there are shortcuts and simplifications we are not prepared to make, and by speculating that we will, like the tortoise, emerge alone at the finish with the prize of AGI in our hands.

The problem is, however, a little deeper and more important than an aesthetic argument (as it sometimes appears). This gap in acknowledging the valid accomplishments of the two models, coupled with a certain defensiveness, causes a “chilling effect” when an idea threatens to cross over into the other realm. This means that findings in one regime are very slow to be noticed or incorporated in the other. I’ve heard quite senior HTM people actually say things like “I don’t know anything about Deep Learning, just that it’s wrong” – and vice versa. This is really bad science.

From reading their comments, I’m pretty sure that no really senior Deep Learning proponent has any knowledge of the current HTM beyond what he’s read in the popular science press, and the reverse is nearly as true.

I consider a very good working knowledge of Deep Learning to be critical for anyone working in computational neuroscience or machine learning. Obviously I feel at least the same way about HTM, but I recognise that the communication of our progress (or even the reporting of results) in HTM has not made it easy for “outsiders” to achieve the level of understanding they feel they need in order to take part. There are historical reasons for much of this, but it’s never too late to start fixing a problem like this, and I see this post (and one of my roles) as a step in the right direction.

The Neuron as the Unit of Computation

In both models, we have identified the neuron as the atomic unit of computation, and the connections between neurons as the location of the memory or functional adjustment which gives the network its computational power. This sounds fine, and clearly the brain uses neurons and connections in some way like this, but this is exactly where the two schools mistakenly diverge.

Jeff Hawkins rejects the NN integrate-and-fire model and builds a neuron with vastly higher complexity. Geoff Hinton admits that, while impossible to reason about mathematically, HTM’s neuron is far more realistic if your goal is to mimic neocortex. Deep Learning, using neurons like Lego bricks, can build vast hierarchies and huge networks, find cats in Youtube videos, and win prizes in competitions. HTM, on the other hand, struggles for years to fit together its “super-neurons” and builds a tiny, single-layer model which can find features and anomalies in low-dimensional streaming data.

Looking at this, you’d swear these people were talking about entirely different things. They’ve just been using the same names for them. And, it’s just dawned on me, therein lies both the problem and its solution. The answer’s been there all the time:

Each and every neuron in HTM is actually a Deep Network.

In an HTM neuron, there are two types of dendrite. One is the proximal dendrite, which contains synapses receiving inputs from the feedforward (mainly sensory) pathway. The other is a set of coincidence-detecting, largely independent, distal dendrite segments, which receive lateral and top-down predictive inputs from the same layer or from higher layers and regions in the neocortex.

My thesis here is that a single neuron can be seen as composed of many elements which have direct analogues in various types of Deep Learning networks, and that there are enough of these, with a sufficient structural complexity, that it’s best to view the neuron as a network of simple, Deep Learning-sized nodes, connected in a particular way. I’ll describe this network in some detail now, and hopefully it’ll become clear how this approach removes much of the dichotomy between the models.

Firstly, a synapse in HTM is very much like a single-input NN node, where HTM’s permanence value is akin to the bias in a NN node, and the weight on the input connection is fixed at 1.0. If the input is active, and the permanence exceeds the threshold, the synapse produces a 1. In HTM we call such a synapse connected, in that the gate is open and the signal is passed through.
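
As a concrete illustration, here is a minimal Python sketch of a synapse viewed this way; the function name and the threshold value are mine, purely for illustration, and this is not NuPIC code:

    # A synapse as a single-input node: the input weight is fixed at 1.0,
    # and the permanence acts like a bias gating a step non-linearity.
    CONNECTED_THRESHOLD = 0.5  # assumed permanence threshold, for illustration

    def synapse_output(input_active, permanence):
        """Return 1 only if the presynaptic input is active and the
        permanence exceeds the threshold (the synapse is 'connected')."""
        connected = permanence >= CONNECTED_THRESHOLD
        return 1 if (input_active and connected) else 0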

The dendrite or dendrite segment is like the next layer of nodes in a NN, in that it combines its inputs and passes the result up. The proximal dendrite effectively acts as a semi-rectifier, summing its inputs and passing a scalar depolarisation value to the cell body. The distal segments, on the other hand, act like thresholded coincidence detectors and produce a depolarising spike only if the sum of their inputs exceeds a threshold.
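
In the same hedged spirit, the two dendrite roles might be sketched like this (the segment threshold of 13 is just an illustrative value, and the names are mine):

    # The proximal dendrite: a semi-rectifier that sums its synapse
    # outputs and passes the scalar up to the cell body.
    def proximal_depolarisation(synapse_outputs):
        return sum(synapse_outputs)

    # A distal segment: a thresholded coincidence detector that produces
    # a depolarising spike only if enough synapses are active at once.
    def distal_segment_spike(synapse_outputs, threshold=13):
        return 1 if sum(synapse_outputs) >= threshold else 0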

These depolarising inputs (feedforward and recurrent) are combined in the cell body to produce an activation potential. This potential only determines whether the neuron may fire, because a higher-level inhibition system identifies the neurons with the highest potential, allows those to fire (producing a binary 1), and suppresses the others to zero (a winner-takes-all step with multiple local winners across the layer).
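
That inhibition step can be sketched as a local k-winners-take-all over the layer’s activation potentials; again, the function and parameter names are mine and this is only an illustration:

    import numpy as np

    # Keep only the k neurons with the highest activation potential;
    # winners output a binary 1, everything else is suppressed to 0.
    def inhibit(potentials, num_winners):
        winners = np.argsort(potentials)[::-1][:num_winners]
        output = np.zeros(len(potentials), dtype=int)
        output[winners] = 1
        return output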

So, an HTM layer is a network of networks, a hierarchy in which neuron-networks communicate through connections between their sub-parts. At the layer level, each neuron has two types of input and one output, and we wire the neurons together as such, but each one is really hiding an internal, network-like structure of its own.
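
Putting the pieces together, a single HTM neuron can be sketched as a small network of those simple nodes. This is an illustration of the viewpoint rather than NuPIC code, and every name and constant in it is my own:

    class NeuronAsNetwork:
        """One HTM neuron viewed as a tiny network of simple nodes."""

        def __init__(self, proximal_permanences, distal_segments,
                     connected=0.5, segment_threshold=13):
            self.proximal = proximal_permanences   # permanence per feedforward synapse
            self.segments = distal_segments        # one permanence list per distal segment
            self.connected = connected
            self.segment_threshold = segment_threshold

        def _overlap(self, inputs, permanences):
            # Gated synapses (the "input layer" of the neuron-network), summed.
            return sum(1 for x, p in zip(inputs, permanences)
                       if x and p >= self.connected)

        def feedforward_overlap(self, ff_input):
            # Proximal dendrite: scalar depolarisation from the feedforward input.
            return self._overlap(ff_input, self.proximal)

        def is_predicted(self, lateral_input):
            # Any distal segment crossing its threshold puts the neuron
            # into a predictive (depolarised) state.
            return any(self._overlap(lateral_input, seg) >= self.segment_threshold
                       for seg in self.segments)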

4 Comments

  1. Alex

    “Deep Network sacrifices authenticity in favour of mathematical tractability and getting high scores on artificial “benchmarks”.”

    In what sense are the benchmarks artificial? Neural networks are used in production object recognition and speech recognition systems. Moreover, can the HTM people come up with a benchmark where they outperform neural networks (or any reasonable alternative)? If not, do they have some idea for what benchmarks they would perform better on? Do they know why they lose to neural networks on simple benchmarks? My intuition is that being able to do object recognition is necessary but not sufficient for making progress towards AI. If HTM cannot get reasonably high accuracy on a simple image recognition benchmark, there is probably something wrong with the algorithm or the way it is being implemented/trained. This is something that should be understood in detail.

    Reply
    • admin

      Hi Alex,

      As I said on Reddit, the current implementation is too small a model for tasks like vision, which requires about 45% of the entire neocortex (compared to about 5% for language!). We are currently looking at introducing a model which involves all the layers of neurons in a region, and combining top-down feedback and simple motor behaviour, but these are at very early stages and all on paper so far.

The capabilities of a neocortical hierarchy are highly dependent on the size of the hierarchy and the total surface area provided, so we anticipate that HTM-based systems will be applicable to real-world cognitive tasks once we can build larger systems. There is nothing wrong with the algorithm per se, it’s just that a single layer performs only part of the processing that neocortical regions and the hierarchy support.

      Reply
  2. Mike

    Very interesting read. I think you are right in ascribing more complex behaviour to a neuron to find common ground. Cannot help thinking if there was a better understanding of the intent of behaviour in the biological neuron, it might be easier to model on a digital system with some procedural processing shortcuts.

    Reply
    • admin

      Hi Wilson,

Thanks for commenting. Yes, we thought that there might be shortcuts, but the particular structure of all the processes surrounding a neuron (and there are hundreds in reality, only a few being modelled by HTM) seems to be necessary for a fully faithful neocortex. If this were not so, these very energy-expensive processes would have been deleted by evolution. A lot of them might be just bio-engineering (my term for processes supporting the biology of cells and the chemistry of signalling, but not really critical for information processing), but we need to decide this in each and every case!

      Reply
