Multilayer Model for Hierarchical Temporal Memory

  • Dec 17 / 2014
  • 0
Cortical Learning Algorithm

Multilayer Model for Hierarchical Temporal Memory

This post sketches a simple model for multilayer processing in Hierarchical Temporal Memory (HTM). It is based on a combination of Jeff Hawkins’ and Numenta’s current work on sensorimotor extensions to HTM, my previous ideas on efficiency of predicted sparseness as well as evidence from neuroscience.

HTM has entered a new phase of development in the past year. Hawkins and his colleagues are currently extending HTM from a single-layer sensory model (assumed to represent high-order memory in Layer 2/3 of cortex) to a sensorimotor model which involves Transition Memory of combined sensory and motor inputs in L4, which is Temporally Pooled in L2/3. Once this is successfully modelled, the plan is to examine the role of L5 and L6 in motor behaviour and feedback.

Recent research in neuroscience has significantly improved our understanding of the various pathways in cortical circuits. [Douglas & Martin, 2004] proposed a so-called canonical pathway in which thalamic inputs arrive in L4, which projects to L2/3 (which sends its output to higher regions), then to L5 (which outputs motor signals) and from there to L6 (which outputs feedback to lower layers and thalamus). Teams led by Randy Bruno [deKock et al, 2007], [Constantinople & Bruno, 2013] have found that there is also a parallel circuit thalamus-L5-[L6 and L4] as well as an L3-L4 feedback pathway.

Figure 1, which is from [deKock et al, 2007], shows the calculated temporal pattern of activity in a piece of rat barrel cortex (called D2) consisting of about 9000 neurons. Barrel cortex is so named because the neurons responsive to a single Primary Whisker (PW) form a barrel-like columnar structure in this part of rat cortex. The paper estimates the layer populations in this “column” to be 3200 L2/3, 2050 L4, 1100 L5A, 1050 L5B and 1200 L6 excitatory cells.

Figure 1. Evolution of Action Potential (AP) rates in rat barrel cortex when experimenters stimulate the associated whisker. VPM is the thalamic region which projects to this part of cortex. From [deKock et al, 2007].

We’ll examine this data from the point of view of HTM. Firstly, we see that the spontaneous activity in all layers is very sparse (0.3% in L2/3, 0.6% in L4, 1.1% in L5A, 3% in L5B and 0.5% in L6), and that activity rises and falls dramatically and differently in each layer over the 150ms following stimulation.

Looking at the first 10ms and only in L4-L2/3, we see the expected sparse activations in L4 and L3, which is followed by a dramatic increase (x17 in L4, x10 in L2/3) representing bursting in both layers, likely because the input was unpredicted. Over the next 20ms, activity in L2/3 drops sharply back to 2x the baseline, but that in L4, after 10ms of dropping, rises again to practically match the original activation. This is matched in the next 10ms by a rise in L2/3 activation, after which both levels drop gradually towards the baseline over more than 100ms. We see another, somewhat different “wavelike” response pattern in the L5/6 complex.

So, can we build a model using HTM principles which explains this data (and, even better, predicts other unseen data)? I believe there must be such a model, because we see this kind of processing everywhere we look in cortex.

Before we get to that, let’s identify some important principles which arise from our current understanding of cortical function.

I: A Race to Represent

The first principle is that a population of neurons which share a common set of inputs is driven to “best represent” its inputs using a competitive inhibition process. Each neuron is accumulating depolarising input current from a unique set of contextual and immediate sources, and the first to fire will inhibit its neighbours and form part of the representation.

Each neuron can thus be seen as analogous to a “microtheory” of its world, and it will accumulate evidence from past context, current sensory inputs, and behaviour to compete in a race for its theory to be “most true”.

II: Different Sources of Evidence

The purpose of the layered structure of neocortex is to allow each population to combine its own individual evidence sources and learn to represent the “theory” of that evidence. The various populations (or sublayers) form a cyclic graph structure of evidence flow, and they cooperate to form a stable, predictable, sensorimotor model of the current world.

III: Efficiency of Predictive Sparseness

Each neuron combines contextual or predictive inputs (on distal synapses) with evidence from immediate sources (on proximal synapses). In addition, the columnar inhibitory sheath is also racing to recognise its inputs, which come largely from the same feedforward sources as its contained pyramidal cells. The sheath has an advantage as it is a better responder [cite] to the feedforward evidence alone than any of its contained cells, so there is also a race between predictive assisted recognition and simple spatial recognition of reality.

The result of the race depends on which wins – if a single pyramidal cell wins due to high predictive depolarisation (lots of contextual evidence), then it alone will fire. Otherwise, there is a short window of time which allows some number of the most predictive cells in the column to fire in turn, before they are inhibited by a vertical process. This “bursting” encodes the difference between the reality (as signalled by this column’s inhibitory sheath firing) and the population’s prediction (as would have been signalled by a highly predictive cell in some losing nearby column).

IV: Self-stabilisation through Sparse Patterns

If we consider a cortical region in its “steady state”, we see highly sparse (non-bursting) representations everywhere, and the behavioural output (from Layer 5) will be a sequence of highly sparse patterns which result in very fine motor adjustments (or none at all). This corresponds to the region perfectly modelling the sensorimotor world it experiences and making optimal predictions with minimal corrective behaviour.

A deviation from this state (failure of prediction) leads to a partial change in representation (because reality differs from prediction) and some amount of redundant predictive representation (when several cells burst in new columns). This departure from maximal sparseness is transmitted to the downstream sublayers, causing their “view of the world” and thus their own state to change. Depending on how well each sublayer can predict these changes, the cascade may halt, or instead continue to roll around the cyclical graph of sublayers, causing behavioural side-effects as it goes.

V: A Team of Rivals – “Explaining Change” by Witnessing or Acting

Within each sublayer, some cells will have inputs which correspond to “observing” the world as it evolves on its own (by predicting from context), while others will respond better when the organism is taking certain actions, and will have learned to associate certain changes with those behaviours. The representation in each sublayer will be some mixture of these, and, in the case of motor output cells in L5, the “decisions” of the region will be those which restore the predictability of things.

The reason is simple. While the activity in the region is sparse, all the active cells are predicting their activity, and the outputs of the region reflect the happy condition. These include motor output, which by definition is acting to prolong the current status of the region (if it was acting to depart from the status, these motor cells would not be still firing).

When something changes, and a set of new neurons becomes active, new neurons become temporarily active throughout the various sublayers, but they will all be cells which have learned to respond better to the new state of the world than the previously active cells. These cells will have learned to associate their own activity with the new situation, by being more right about predicting their own activity in this new context. And this, in turn, will be true only if they are the long-term winners in the establishment of a new, stable cycle of sparse activity, or alternatively if they have regularly participated in the transition to a new stable state. Either way, the system is self-stabilising, acting to right itself and improve the prediction.

A Multilayer Cortical Model

I claim that the above principles are enough to construct a simple model of how the sublayers in a region of cortex interact and co-operate.

I use the word “sublayers” because each layer (L1-6) may contain more than one population or class of neurons. We’ll pretend these are each in their own sublayer, but recognising that there are local connections between cells in sublayers which are important to how things work.

So as not to confuse, I’ll not use the common notation for sublayers found in the literature (eg L5A), instead I’ll use labels such as L5.1, L5.2 and so on. The “minor number” will usually indicate sublayers successively “far away” from the sensorimotor inputs, both in terms of time and the number of neurons in the path to reach them. I’ll also use the deKock diagram above to anchor the place and time of each part of the response to a large sensory stimulus.

I’ll also assume the idea that when a neuron projects an axon, it does so in order to connect proximally with its target. Thus, L4 projections to L2/3 are proximal on L2/3 cells, likewise with L6 to L4, while the L2/3->L4 feedback pathway uses distal dendrites.

Layer 4.1 – Sensorimotor Transition Prediction (0-20ms)

Layer 4 is said [cite] to receive inputs from L6 (65%), elsewhere in L4 (25%), and directly from thalamus (5%). In addition, some cells in L4 have distal dendrites in L2/3. We’ll split L4 into two sublayers, depending on whether they receive inputs from L2/3 (L4.1 no, L4.2 yes). Some researchers [cite] divide L4 into two populations – stellar cells and pyramidal cells, and it may be that the split is along these lines.

My hypothesis is that L4.1 cells are making predictions of sensorimotor transitions, using thalamic sensorimotor input as (primarily) feedforward, and a combination of local predictive context (L4) and information about the region’s current sensorimotor output (from L6). I say “primarily” because a single feedforward axon could synapse with a cell both on its proximal and distal dendrites, and this would be even more important for the stellar dendritic branches of L4.1 cells.

Note that the L4 inputs to L4.1 includes evidence of the output of L2/3 (a more stable “sensory” representation) via L4.2. The L6-sourced inputs also include evidence of the stable feedback pattern being sent to lower regions, which are themselves indirectly influenced by L5’s use of L2/3 (see later).

So, L4.1 is receiving fast-changing sensorimotor inputs, along with slower-changing context from within L4, and both sensory and motor outputs of the region. It uses whatever best evidence it has to predict any transitions in the thalamic input.

Successful prediction in L4.1 results in it outputting a highly sparse pattern on each transition. Failures in prediction are encoded as a union of “nearly predicted” cell activations in the columns best recognising the unpredicted thalamic input.

This might not seem sensible when thalamic inputs are only 5% of what L4.1 is receiving, but remember that the other inputs are usually highly sparse (1-2%) and change much more slowly, so thalamic feedforward input to L4.1 acts as a tiebreaker among predictions. This pattern is repeated throughout cortex because bursting cells cause a similar disruptive, temporary tiebreaking signal in downstream sublayers.

Layers 3.1 and 2.1 – Temporal Pooling (10-20ms)

Layers 2 and 3 are usually treated as one. Both receive most of their feedforward input from L4 and have distal inputs both from within L2/3 and from L1 (which gets feedback input from L6 in higher regions).

I’ll split the two by saying that L2 gets more input from L1 than L3 does. In other words, L2 is more primed or biased by higher-level context, while L3 is less likely to be dominated by feedback. There is evidence [cite] of this differentiation, so let’s assume it’s useful.

Now, L2.1/L3.1 are receiving feedforward inputs from L4.1. If those inputs are sparse, then only those cells in L2/3 which have many active inputs will be part of the SDR in this layer (it’s one layer in a column sense, just the L2 “end” has a higher L1 input mix). In addition, they’ll need good intralayer and/or top-down predictive input to maintain stable activity.

The stability in L2.1/3.1 comes from the combination of stable predictive inputs from within the layer and from above. This prebiases predictive cells to recognise the successive sparse inputs from L4.1 and continue to remain active. The active cells in L2/3 have learned to use a combination of sequence memory (intralayer) and top-down feedback to associate with each fast-changing SDR in L4.1. This mechanism is reinforced by the fast L4.1-L2/3.1-L4.2-L4.1 feedback loop, along with the much longer feedback loops.

This is where the L2/3 difference is important. The more superficial cells in L2/3 are more strongly biased by top-down feedback from L1. We have evidence [cite] that L2 projects more strongly to the deep part of L5, while L3 projects more to superficial L5. Thus, the choices of active cells in L2/3 encode how much sequence memory and how much top-down are involved in the representation.

L6.1 – Comparing Reality with Expectations from Behaviour (0-10ms)

[Constantinople and Bruno], among others, show that direct thalamic inputs arrive simultaneously at L4 and L5/L6, suggesting that L5/6 and L4/L2/3 are performing parallel operations on sensorimotor inputs. While the L4-L2/3 system is relatively simple (at least at first order approximation), the L5/6 system is much more complex, involving a larger number of functional populations with diverse purposes. I’ll describe a minimum of these for now.

Layer 6.1 cells are the first in L5/6 to respond to thalamic inputs, suggesting a role analogous to L4.1. Unlike L4 cells, however, these cells have immediate access to both the recent L6 output to lower regions (representing the current steady state of the region) and the current motor output of the region (from L5). This much richer set of evidence sources allows L6.1 to make finer-grained predictions of the expected thalamic inputs, and its response when prediction fails is the primary driver for changes in L5 motor output and signals to higher regions.

L5.1 – Responding to Change by Acting (0-20ms)

I speculate that the thick-tufted L5B cells correspond to L5.1 in my model. These cells also receive direct thalamic inputs, as well as inputs from L6, L2/3 (primarily the L2 “end”) and top-down feedback via L1. L5.1’s purpose is to act quickly if necessary, in response to a significant change in its world. Any dramatic change in either sensorimotor patterns or context will cause L5 to output a large, non-sparse signal which it has learned is appropriate to that change.

In the steady state, with all inputs sparse, L5.1 generates a minimal, sparse signal which corresponds to energetically efficient, smooth behaviour in the organism. Sudden (unpredictable) changes in either sensorimotor inputs (thalamic), correspondence between behaviour and outcomes (L6), sequence memory predictions (L2/3) or top-down “instructions” (L1) will cause a dramatic rise in output (from 3% to over 10% active cells) which results in new corrective motor behaviour as well as an alarm signal to higher layers.

L6.2 – Co-ordination of Responses (10-30ms)

In Layer 6, a second population of cells is responsible for integrating any rising activity in L5.1 with context, signalling L4 of the new situation, and affecting the L6 feedback output. The better L6.2 can predict/recognise the output of L5, the sparser its signal to L4 and the smaller the effect on L6 feedback output. Thus, L6.2 acts either to help L4 make good predictions of transitions (by sending sparse signals), or to disrupt steady-state prediction in L4 (and later L2/3) into a new sensorimotor regime.

L4.2 and L2/3: Stabilising Prediction (30-50ms)

After 30ms or so, pyramidal cells in L4 are sampling the “sensory” response of L2/3 along with signals from L6 about the motor response. L4.2 can now generate a signal for L2/3 which is more sparse than the initial L4.1 response, but still well above baseline. Over the next 20-50ms, L4.2 and L2/3 use this feedback loop (along with the L5/6 motor loop) to reduce their activity and settle into a steady predictive state.

I propose that it is these L4.2 cells which participate in the steady-state activity of L4, along with the L5.2 cells (next section). L4.1 and L5.1 are representative of large transitions between steady, predictive sparse states.

L5.2 and L6 – Stabilising Behaviour (40-50ms)

L5.2, which corresponds to thick-tufted cells in L5A (in deKock’s diagram). This sublayer combines the context inputs (from L6, L1 and L5) with the lagging, stabilising output from L2/3 (which is being stabilised by the L4.2 feedback loop) and produces a second motor response (and a second signal to higher layers). With more information about how L2/3 responded to the initial signal, L5.2 can learn to produce a more nuanced behaviour than the “knee-jerk” response of L5.1, or perhaps counteract it to resume stability.

L6 is again used to provide feedback of behaviour to L4 and aid its prediction.

Multilayer CLA

Figure 2: Schematic showing main connections in the multilayer model. Each “neuron” represents a large number of neurons in each sublayer.

ppMultilayer Flow Diagram

Figure 3: Schematic showing main axonal (arrows) and dendritic (tufts) links in the multilayer model.


We can see how this model allows a region of cortex to go from a highly sparse, quiescent steady state, absorb a large sensory stimulus, and respond, initially with dramatic changes in activity, then with decreasing waves of disturbance and motor response, in order to restore a new steady state which is self-sustaining.

The fast-responding L4.1 and L5.1 cells react first to a drastic change, causing representations in L2/3 and L6 to update, and then the second population, using L4.2 to stabilise perception and L5.2 to stabilise behaviour, takes over and settles into a new steady state.


Apart from the rat barrel cortex example used here, we can see how this model can be applied in other well-studied cortical systems.

Microsaccades Stabilise Vision in V1

In V1, the primary thalamic input is from retinal ganglion cells which detect on-centre or off-centre patterns in the retinal image. L4 cells are understood [cite] mostly to contain so-called “simple cells” which respond to short oriented “bars” formed by a small number of neighbouring ganglion cells. L2/3, by the same token, contains many more “complex” cells which respond to overlapping or moving bars corresponding to longer edges or a sequence of edge movements. L4 also contains a smaller number of cells with these response properties.

I propose that the simple cells are L4.1, while the L2/3 complex cells are temporally poling over these cells, and the second population of L4 complex cells are actually L4.2, responding to the activity in L2/3. L5 in steady state is causing the eye to microsaccade in order to stabilise the “image” formed in L2/3 of the edges in the scene as tiny movements of organism and objects cause the exact patterns in L4.1 to change predictably.

Deviations beyond the microsaccade scale will cause bursting in L4.1, and the SDR shown by L2/3 will change to a new one representing the new sensory input. If L2/3 can use L1 and its own predictive input to correctly expect this new state, it will remain sparse and cause minimal reaction in L5 (in the second phase). If not, L2/3 will burst, L5 will generate a large signal, and thus V1 will pass the buck up to a region which can deal with changes of scene.

This process will be repeated at higher levels, at higher temporal and spatial scales.

Speech Generation

In speech generation, the sensory input is from the ears, and the motor output is to the vocal system. The region responsible for generating speech is controlled (via L1) by higher regions expressing a high-level representation of sounds to be produced. Layer 2/3 uses this input to bias itself to represent all sequences of sounds which match the L1 signal. Layer 5 receives both these signals and is thus highly predictive of representing the motor actions for these sequences. Since all the sublayers are at non-zero sparseness, activity will propagate and be amplified at each stage by the predictive states until a “most probable” starting sound is generated. The region will continue to generate the correct motor activity, using prediction to correct for differences between the expected and perceived sounds.

Citations (to be completed)

Constantinople, Christine M. and Bruno, Randy M.: Deep Cortical Layers Are Activated Directly by Thalamus. Science 28 June 2013: Vol. 340 no. 6140 pp. 1591-1594 DOI: 10.1126/science.1236425 [Abstract Free]

Douglas, Rodney J. and Martin, Kevan A.C.: Neuronal Circuits of the Neocortex, Annu. Rev. Neurosci. 2004. 27:419–51 doi:10.1146/annurev.neuro.27.070203.144152 [Google Scholar]

[Abstract/Full Text]



Leave a comment