Monthly Archives / November 2014

  • Nov 29 / 2014
  • 0
Clortex (HTM in Clojure), Cortical Learning Algorithm, NuPIC

Mathematics of HTM Part II – Transition Memory

This article is part of a series describing the mathematics of Hierarchical Temporal Memory (HTM), a theory of cortical information processing developed by Jeff Hawkins. In Part One, we saw how a layer of neurons learns to form a Sparse Distributed Representation (SDR) of an input pattern. In this section, we’ll describe the process of learning temporal sequences.

We showed in part one that the HTM model neuron learns to recognise subpatterns of feedforward input on its proximal dendrites. This is somewhat similar to the manner by which a Restricted Boltzmann Machine can learn to represent its input in an unsupervised learning process. One distinguishing feature of HTM is that the evolution of the world over time is a critical aspect of what, and how, the system learns. The premise for this is that objects and processes in the world persist over time, and may only display a portion of their structure at any given moment. By learning to model this evolving revelation of structure, the neocortex can more efficiently recognise and remember objects and concepts in the world.

Distal Dendrites and Prediction

In addition to its one proximal dendrite, a HTM model neuron has a collection of distal (far) neurons, which gather information from sources other than the feedforward inputs to the layer. In some layers of neocortex, these dendrites combine signals from neurons in the same layer as well as from other layers in the same region, and even receive indirect inputs from neurons in higher regions of cortex. We will describe the structure and function of each of these.

The simplest case involves distal dendrites which gather signals from neurons within the same layer.

In Part One, we showed that a layer of \(N\) neurons converted an input vector \(\mathbf x \in \mathbb{B}^{n_{\textrm{ff}}}\) into a SDR \(\mathbf{y}_{\textrm{SDR}} \in \mathbb{B}^{N}\), with length\(\lVert{\mathbf y}_{\textrm{SDR}}\rVert_{\ell_1}=sN \ll N\), where the sparsity \(s\) is usually of the order of 2% (\(N\) is typically 2048, so the SDR \(\mathbf{y}_{\textrm{SDR}}\) will have 40 active neurons).

The layer of HTM neurons can now be extended to treat its own activation pattern as a separate and complementary input for the next timestep. This is done using a collection of distal dendrite segments, which each receive as input the signals from other neurons in the layer itself. Unlike the proximal dendrite, which transmits signals directly to the neuron, each distal dendrite acts as an active coincidence detector, firing only when it receives enough signals to exceed its individual threshold.

We proceed with the analysis in a manner analogous to the earlier discussion. The input to the distal dendrite segment \(k\) at time \(t\) is a sample of the bit vector \(\mathbf{y}_{\textrm{SDR}}^{(t-1)}\). We have \(n_{ds}\) distal synapses per segment, a permanence vector \(\mathbf{p}_k \in [0,1]^{n_{ds}}\) and a synapse threshold vector \(\vec{\theta}_k \in [0,1]^{n_{ds}}\), where typically \(\theta_i = \theta = 0.2\) for all synapses.

Following the process for proximal dendrites, we get the distal segment’s connection vector \(\mathbf{c}_k\):

$$c_{k,i}=(1 + sgn(p_{k,i}-\theta_{k,i}))/2$$

The input for segment \(k\) is the vector \(\mathbf{y}_k^{(t-1)} = \phi_k(\mathbf{y}_{\textrm{SDR}}^{(t-1)})\) formed by the projection \(\phi_k:\lbrace{0,1}\rbrace^{N-1}\rightarrow\lbrace{0,1}\rbrace^{n_{ds}}\) from the SDR to the subspace of the segment. There are \({N-1}\choose{n_{ds}}\) such projections (there are no connections from a neuron to itself, so there are \(N-1\) to choose from).

The overlap of the segment for a given \(\mathbf{y}_{\textrm{SDR}}^{(t-1)}\) is the dot product \(o_k^t = \mathbf{c}_k\cdot\mathbf{y}_k^{(t-1)}\). If this overlap exceeds the threshold \(\lambda_k\) of the segment, the segment is active and sends a dendritic spike of size \(s_k\) to the neuron’s cell body.

This process takes place before the processing of the feedforward input, which allows the layer to combine contextual knowledge of recent activity with recognition of the incoming feedforward signals. In order to facilitate this, we will change the algorithm for Pattern Memory as follows.

Each neuron begins a timestep \(t\) by performing the above processing on its \({n_{\textrm{dd}}}\) distal dendrites. This results in some number \(0\ldots{n_{\textrm{dd}}}\) of segments becoming active and sending spikes to the neuron. The total predictive activation potential is given by:

$$o_{\textrm{pred}}=\sum\limits_{o_k^{t} \ge \lambda_k}{s_k}$$

The predictive potential is combined with the overlap score from the feedforward overlap coming from the proximal dendrite to give the total activation potential:

$$a_j^t=\alpha_j o_{\textrm{ff},j} + \beta_j o_{\textrm{pred},j}$$

and these \(a_j\) potentials are used to choose the top neurons, forming the SDR \(Y_{\textrm{SDR}}\) at time \(t\). The mixing factors \(\alpha_k\) and \(\beta_k\) are design parameters of the simulation.

Learning Predictions

We use a very similar learning rule for distal dendrite segments as we did for the feedforward inputs:

$$ p_i^{(t+1)} =
\begin{cases}
(1+\sigma_{inc})p_i^{(t)} & \text {if cell $j$ active, segment $k$ active, synapse $i$ active} \\
(1-\sigma_{dec})p_i^{(t)} & \text {if cell $j$ active, segment $k$ active, synapse $i$ not active} \\
p_i^{(t)} & \text{otherwise} \\
\end{cases} $$

Again, this reinforces synapses which contribute to activity of the cell, and decreases the contribution of synapses which don’t. A boosting rule, similar to that for proximal synapses, allows poorly performing distal connections to improve until they are good enough to use the main rule.

Interpretation

We can now view the layer of neurons as forming a number of representations at each timestep. The field of predictive potentials \(o_{\textrm{pred},j}\) can be viewed as a map of the layer’s confidence in its prediction of the next input. The field of feedforward potentials can be viewed as a map of the layer’s recognition of current reality. Combined, these maps allow for prediction-assisted recognition, which, in the presence of temporal correlations between sensory inputs, will improve the recognition and representation significantly.

We can quantify the properties of the predictions formed by such a layer in terms of the mutual information between the SDRs at time \(t\) and \(t+1\). I intend to provide this analysis as soon as possible, and I’d appreciate the kind reader’s assistance if she could point me to papers which might be of help.

A layer of neurons connected as described here is a Transition Memory, and is a kind of first-order memory of temporally correlated transitions between sensory patterns. This kind of memory may only learn one-step transitions, because the SDR is formed only by combining potentials one timestep in the past with current inputs.

Since the neocortex clearly learns to identify and model much longer sequences, we need to modify our layer significantly in order to construct a system which can learn high-order sequences. This is the subject of the next part of this series.

Note: For brevity, I’ve omitted the matrix treatment of the above. See Part One for how this is done for Pattern Memory; the extension to Transition Memory is simple but somewhat arduous.

  • Nov 28 / 2014
  • 0
Clortex (HTM in Clojure), Cortical Learning Algorithm, NuPIC

Mathematics of Hierarchical Temporal Memory

This article describes some of the mathematics underlying the theory and implementations of Jeff Hawkins’ Hierarchical Temporal Memory (HTM), which seeks to explain how the neocortex processes information and forms models of the world.

Note: Part II: Transition Memory is now available.

The HTM Model Neuron – Pattern Memory (aka Spatial Pooling)

We’ll illustrate the mathematics of HTM by describing the simplest operation in HTM’s Cortical Learning Algorithm: Pattern Memory, also known as Spatial Pooling, forms a Sparse Distributed Representation from a binary input vector. We begin with a layer (a 1- or 2-dimensional array) of single neurons, which will form a pattern of activity aimed at efficiently representing the input vectors.

Feedforward Processing on Proximal Dendrites

The HTM model neuron has a single proximal dendrite, which is used to process and recognise feedforward or afferent inputs to the neuron. We model the entire feedforward input to a cortical layer as a bit vector \({\mathbf x}_{\textrm{ff}}\in\lbrace{0,1}\rbrace^{n_{\textrm{ff}}}\), where \(n_{\textrm{ff}}\) is the width of the input.

The dendrite is composed of \(n_s\) synapses which each act as a binary gate for a single bit in the input vector.  Each synapse has a permanence \(p_i\in{[0,1]}\) which represents the size and efficiency of the dendritic spine and synaptic junction. The synapse will transmit a 1-bit (or on-bit) if the permanence exceeds a threshold \(\theta_i\) (often a global constant \(\theta_i = \theta = 0.2\)). When this is true, we say the synapse is connected.

Each neuron samples \(n_s\) bits from the \(n_{\textrm{ff}}\) feedforward inputs, and so there are \({n_{\textrm{ff}}}\choose{n_{s}}\) possible choices of input for a single neuron. A single proximal dendrite represents a projection \(\pi_j:\lbrace{0,1}\rbrace^{n_{\textrm{ff}}}\rightarrow\lbrace{0,1}\rbrace^{n_s}\), so a population of neurons corresponds to a set of subspaces of the sensory space. Each dendrite has an input vector \({\mathbf x}_j=\pi_j({\mathbf x}_{\textrm{ff}})\) which is the projection of the entire input into this neuron’s subspace.

A synapse is connected if its permanence \(p_i\) exceeds its threshold \(\theta_i\). If we subtract \({\mathbf p}-{\vec\theta}\), take the elementwise sign of the result, and map to \(\lbrace{0,1}\rbrace\), we derive the binary connection vector \({\mathbf c}_j\) for the dendrite. Thus:

$$c_i=(1 + sgn(p_i-\theta_i))/2$$

The dot product \(o_j({\mathbf x})={\mathbf c}_j\cdot{\mathbf x}_j\) now represents the feedforward overlap of the neuron with the input, ie the number of connected synapses which have an incoming activation potential. Later, we’ll see how this number is used in the neuron’s processing.

The elementwise product \({\mathbf o}_j={\mathbf c}_j\odot{\mathbf x}_j\) is the vector in the neuron’s subspace which represents the input vector \({\mathbf x}_{\textrm{ff}}\) as “seen” by this neuron. This is known as the overlap vector. The length \(o_j = \lVert{\mathbf o}_j\rVert_{\ell_1}\) of this vector corresponds to the extent to which the neuron recognises the input, and the direction (in the neuron’s subspace) is that vector which has on-bits shared by both the connection vector and the input.

If we project this vector back into the input space, the result \(\mathbf{\hat{x}}_j =\pi^{-1}({\mathbf o}_j)\) is this neuron’s approximation of the part of the input vector which this neuron matches. If we add a set of such vectors, we will form an increasingly close approximation to the original input vector as we choose more and more neurons to collectively represent it.

Sparse Distributed Representations (SDRs)

We now show how a layer of neurons transforms an input vector into a sparse representation. From the above description, every neuron is producing an estimate \(\mathbf{\hat{x}}_j \) of the input \({\mathbf x}_{\textrm{ff}}\), with length \(o_j\ll n_{\textrm{ff}}\) reflecting how well the neuron represents or recognises the input. We form a sparse representation of the input by choosing a set \(Y_{\textrm{SDR}}\) of the top \(n_{\textrm{SDR}}=sN\) neurons, where \(N\) is the number of neurons in the layer, and \(s\) is the chosen sparsity we wish to impose (typically \(s=0.02=2\%\)).

The algorithm for choosing the top \(n_{\textrm{SDR}}\) neurons may vary. In neocortex, this is achieved using a mechanism involving cascading inhibition: a cell firing quickly (because it depolarises quickly due to its input) activates nearby inhibitory cells, which shut down neighbouring excitatory cells, and also nearby inhibitory cells, which spread the inhibition outwards. This type of local inhibition can also be used in software simulations, but it is expensive and is only used where the design involves spatial topology (ie where the semantics of the data is to be reflected in the position of the neurons). A more efficient global inhibition algorithm – simply choosing the top \(n_{\textrm{SDR}}\) neurons by their depolarisation values – is often used in practise.

If we form a bit vector \({\mathbf y}_{\textrm{SDR}}\in\lbrace{0,1}\rbrace^N\textrm{ where } y_j = 1 \Leftrightarrow j \in Y_{\textrm{SDR}}\), we have a function which maps an input \({\mathbf x}_{\textrm{ff}}\in\lbrace{0,1}\rbrace^{n_{\textrm{ff}}}\) to a sparse output \({\mathbf y}_{\textrm{SDR}}\in\lbrace{0,1}\rbrace^N\), where the length of each output vector is \(\lVert{\mathbf y}_{\textrm{SDR}}\rVert_{\ell_1}=sN \ll N\).

The reverse mapping or estimate of the input vector by the set \(Y_{\textrm{SDR}}\) of neurons in the SDR is given by the sum:

$$\mathbf{\hat{x}} = \sum\limits_{j \in Y_{\textrm{SDR}}}{{\mathbf{\hat{x}}}_j} = \sum\limits_{j \in Y_{\textrm{SDR}}}{\pi_j^{-1}({\mathbf o}_j)} = \sum\limits_{j \in Y_{\textrm{SDR}}}{\pi_j^{-1}({\mathbf c}_j\odot{\mathbf x}_j)}= \sum\limits_{j \in Y_{\textrm{SDR}}}{\pi_j^{-1}({\mathbf c}_j \odot \pi_j({\mathbf x}_{\textrm{ff}}))}= \sum\limits_{j \in Y_{\textrm{SDR}}}{\pi_j^{-1}({\mathbf c}_j) \odot {\mathbf x}_{\textrm{ff}}} $$

Matrix Form

The above can be represented straightforwardly in matrix form. The projection \(\pi_j:\lbrace{0,1}\rbrace^{n_{\textrm{ff}}} \rightarrow\lbrace{0,1}\rbrace^{n_s} \) can be represented as a matrix \(\Pi_j \in \lbrace{0,1}\rbrace^{{n_s} \times\ n_{\textrm{ff}}} \).

Alternatively, we can stay in the input space \(\mathbb{B}^{n_{\textrm{ff}}}\), and model \(\pi_j\) as a vector \(\vec\pi_j =\pi_j^{-1}(\mathbf 1_{n_s})\), ie where \(\pi_{j,i} = 1 \Leftrightarrow (\pi_j^{-1}(\mathbf 1_{n_s}))_i = 1\).

The elementwise product \(\vec{x_j} =\pi_j^{-1}(\mathbf x_{j}) = \vec{\pi_j} \odot {\mathbf x_{\textrm{ff}}}\) represents the neuron’s view of the input vector \(x_{\textrm{ff}}\).

We can similarly project the connection vector for the dendrite by elementwise multiplication: \(\vec{c_j} =\pi_j^{-1}(\mathbf c_{j}) \), and thus \(\vec{o_j}(\mathbf x_{\textrm{ff}}) = \vec{c_j} \odot \mathbf{x}_{\textrm{ff}}\) is the overlap vector projected back into \(\mathbb{B}^{n_{\textrm{ff}}}\), and the dot product \(o_j(\mathbf x_{\textrm{ff}}) = \vec{c_j} \cdot \mathbf{x}_{\textrm{ff}}\) gives the same overlap score for the neuron given \(\mathbf x_{\textrm{ff}}\) as input. Note that \(\vec{o_j}(\mathbf x_{\textrm{ff}}) =\mathbf{\hat{x}}_j \), the partial estimate of the input produced by neuron \(j\).

We can reconstruct the estimate of the input by an SDR of neurons \(Y_{\textrm{SDR}}\):

$$\mathbf{\hat{x}}_{\textrm{SDR}} = \sum\limits_{j \in Y_{\textrm{SDR}}}{{\mathbf{\hat{x}}}_j} = \sum\limits_{j \in Y_{\textrm{SDR}}}{\vec o}_j = \sum\limits_{j \in Y_{\textrm{SDR}}}{{\vec c}_j\odot{\mathbf x_{\textrm{ff}}}} = {\mathbf C}_{\textrm{SDR}}{\mathbf x_{\textrm{ff}}}$$

where \({\mathbf C}_{\textrm{SDR}}\) is a matrix formed from the \({\vec c}_j\) for \(j \in Y_{\textrm{SDR}}\).

Optimisation Problem

We can now measure the distance between the input vector \(\mathbf x_{\textrm{ff}}\) and the reconstructed estimate \(\mathbf{\hat{x}}_{\textrm{SDR}}\) by taking a norm of the difference. Using this, we can frame learning in HTM as an optimisation problem. We wish to minimise the estimation error over all inputs to the layer. Given a set of (usually random) projection vectors \(\vec\pi_j\) for the N neurons, the parameters of the model are the permanence vectors \(\vec{p}_j\), which we adjust using a simple Hebbian update model.

The update model for the permanence of a synapse \(p_i\) on neuron \(j\) is:

$$ p_i^{(t+1)} =
\begin{cases}
(1+\delta_{inc})p_i^{(t)} & \text {if $j \in Y_{\textrm{SDR}}$, $(\mathbf x_j)_i=1$, and $p_i^{(t)} \ge \theta_i$} \\
(1-\delta_{dec})p_i^{(t)} & \text {if $j \in Y_{\textrm{SDR}}$, and ($(\mathbf x_j)_i=0$ or $p_i^{(t)} \lt \theta_i$)} \\
p_i^{(t)} & \text{otherwise} \\
\end{cases} $$

This update rule increases the permanence of active synapses, those that were connected to an active input when the cell became active, and decreases those which were either disconnected or received a zero when the cell fired. In addition to this rule, an external process gently boosts synapses on cells which either have a lower than target rate of activation, or a lower than target average overlap score.

I do not yet have the proof that this optimisation problem converges, or whether it can be represented as a convex optimisation problem. I am confident such a proof can be easily found. Perhaps a kind reader who is more familiar with a problem framed like this would be able to confirm this. I’ll update this post with more functions from HTM in coming weeks.

Note: Part II: Transition Memory is now available.

  • Nov 13 / 2014
  • 0
Cortical Learning Algorithm, NuPIC

Efficiency of Predicted Sparseness as a Motivating Model for Hierarchical Temporal Memory

Part 1 – Introduction and Description.

In any attempt to create a theoretical scientific framework, breakthroughs are often made when a single key “law” is found to underly what previously appeared to be a number of observed lesser laws. An example from Physics is the key principle of Relativity: that the speed of light is a constant in all inertial frames of reference, which quickly leads to all sorts of unintuitive phenomena like time dilation, length contraction, and so on. This discussion aims to do this for HTM by proposing that its key underlying principle is the efficiency of predicted sparseness at all levels. I’ll attempt to show how this single principle not only explains several key features of HTM identified so far, but also explains in detail how to model any required structural component of the neocortex.

The neocortex is a tremendously expensive organ in mammals, and particularly in humans, so it seems certain that the benefits it provides are proportionately valuable to the genes of an animal. We can use this relationship between cost and benefit, with sparseness and prediction as mediating metrics, to derive detailed design rules for the neocortex at every level, down to individual synapses and their protein machinery.

If you take one thing away from this talk, it should be that Sparse Distributed Representations are the key to Intelligence. Jeff Hawkins

Note: The next post in this series describes the Mathematics of Hierarchical Temporal Memory.

Sparse Distributed Representations are a key concept in HTM theory. In any functional piece of cortex, only a small fraction of a large population of neurons will be active at a given time; each active neuron encodes some component of the semantics of the representation; and small changes in the exact SDR correspond with small differences in the detailed object or concept being represented. Ahmad 2014 describes many important properties of SDRs.

SDRs are one efficient solution to the problem of representing something with sufficient accuracy at optimal cost in resources, and in the face of ambiguity and noise. My thesis is that in forming SDRs, neocortex is striving to optimise a lossy compression process by representing only those elements of the input which are structural and ignoring everything else.

Shannon proposed that any message has a concrete amount of information, measured in bits, which reflects the amount of surprise (i.e. something you couldn’t compute from the message so far, or by other means) contained in the message.

The most efficient message has zero length – it’s the message you don’t need to send. The next most efficient message contains only the information the receiver lacks to reconstruct everything the sender wishes her to know. Thus, by using memory and the right encoding to connect with it, a clever receiver (or memory system) can become very efficient indeed.

We will see that neocortex implements this idea literally, at all levels, as it attempts to represent, remember and predict events in the world as usefully as possible and at minimal cost.

The organising principle in cortical design is that components (from the whole organism down to a synapse) can do little about the amount of signal they receive, but they can – and do – adapt and learn to make best use of that signal to control what they do, only acting – sending a signal – when it’s the predicted optimal choice. This gives rise to sparseness in space and time everywhere, which directly reflects the degree of successful prediction present in any part of the system.

The success metric for a component in neocortex is the ratio of input data rate to output information rate, where the component has either a fixed minimum, or (for neurons and synapses) a fixed maximum, output level.

Deviations from the target indicate some failure to predict activity. This failure is either an opportunity to learn (and predict better next time), or, failing that, something which needs to be acted upon in some other way, by taking a different action or by passing new information up the hierarchy.

Note inputs in this context are any kind of signal coming in to the component under study. In the case of regions, layers and neurons, these include top-down feedback and lateral inputs as well as feedforward.

Hierarchy

Neocortex is a hierarchy because it has finite space to store its model of the world, and a hierarchy is an optimal strategy when the world itself has hierarchical structure. Each region in the hierarchy is subjected (by design) to a necessarily overwhelming rate of input, it will run at capacity to absorb its data stream, reallocating its finite resources to contain an optimal model of the world it perceives.

Regions

The memory inside a region of cortex is driven towards an “ideal” state in which it always predicts its inputs and thus produces a “perfect”, minimal message – containing its learned SDR of its world’s current state – as output. Any failure to predict is indicated by a larger output, the deviation from “ideal” representing the exact surprise of the region to its current perception of the world.

A region has several output layers, each of which has a different (and usually more than one) purpose.

For each region, two layers send (different) signals up the hierarchy, therefore signalling both the current state of its world and the encoding of its unpredictability. The higher region now gets details of something it should hopefully have the capacity to handle – predict – or else it passes the problem up the chain.

Two layers send (again different) signals down to lower layers and (in the case of motor) to subcortical systems. The content of these outputs will relate to the content as well as the stability and confidence of the region’s model, and also actions which are appropriate in terms of that content and confidence level.

Layers

A cortical layer which has fully predicted its inputs has a maximally sparse output pattern. A fully failing prediction pattern in a layer causes it to output a maximally bursting and minimally sparse pattern, at least for a short time. At any failure level in between, the exact evolution of firing in the bursting neurons encodes the precise pattern of prediction failure of the layer, and this is the information passed to other layers in the region, to other regions in cortex, or to targets outside the cortex.

The output of a cortical layer is thus a minimal message – it “starts” with the best match of its prediction and reality, followed (in a short period of time) by encodings of reality in the context of increasingly weak prediction.

Columns

A layer’s output, in turn, is formed from the combination of its neurons, which are themselves arranged in columns. The columnar arrangement of cells in cortical columns is the key design leading to all the behaviour described previously.

Pyramidal cells, which represent both the SDR activity pattern and the “memory” in a layer, are all contained in columns. The sparse pattern of activity across a layer is dictated by how all the cells compete within this columnar array.

Columns are composed of pyramidal cells, which act independently, and a complex of inhibitory cells which act together to define how the column operates. All cells share a very similar feedforward receptive field, due to the fact that feedforward axons physically run up through the narrow column and abut the pyramidal bodies as they squeeze past.

Columnar Inhibition

The inhibitory cells have a broader and faster feedforward response compared with the pyramidal cells Reference so, in the absence of strong predictive inputs to any pyramidal cells, the entire assemblage of inhibitory neurons will be first to fire in a column. When this happens, these inhibitory cells excite those in adjacent columns, and a wave of inhibition spreads out from a successfully firing column.

The wave continues until it arrives at a column which has already been inhibited by a wave coming from elsewhere in the layer (from some recently active column). This gives rise to a pattern of inactivity around columns which are currently active.

Predictive Activation

Each cell in a column has its own set of feedforward and predictive inputs, so every cell has a different rate of depolarising as it is driven towards firing threshold.

Some cells may have received sufficient depolarising input from predictive lateral or top-down dendrites to reach firing threshold before the column’s sheath of inhibitory cells. In this case the pyramidal cell will fire first, trigger the column’s inhibitory sheath, and cause the wave of inhibition to spread out laterally in the layer.

Vertical Inhibition in Columns

When the inhibitory sheath fires, it also sends a wave of inhibitory signals vertically in the column. This wave will shut down any pyramidal cells which have not yet reached threshold, giving rise to a sparse activity pattern in the column.

The exact number of cells which get to fire before the sheath shuts them down depends mainly on how predictive each cell was and whether the sheath was triggered by a “winning cell” (previous section), by the sheath being first to fire, or as a result of neighbouring columns sending out signals.

If there is a wave of inhibition reaching a column, all cells are shut down and none (or no more) fire.

If there was a cell so predictive that it fired before the sheath, all other cells are very likely shut down and only one cell fires.

Finally, if the sheath was first to fire due to its feedforward input, the pyramidal cells are shut down quite quickly, but the most predictive may get the chance to fire just before being shut down.

This last process is called bursting, and gives rise to a short-lived pattern which encodes exactly how well the column as an ensemble has matched its predictions. Basically, the more cells which fire, the more “confused” the match between prediction and reality. This is because the inhibition happens quickly, so the gap between the first and last cell to burst must be small, reflecting similar levels of predictivity.

The bursting process may also be ended by an incoming wave of inhibition. The further away a competing column is, the longer that will take, allowing more cells to fire and extending the burst. Thus the amount of bursting also reflects the local area’s ability to respond to the inputs.

Neurons

Neurons are machines which use patterns of input signals to produce a temporal pattern of output signal. The neuron wastes most resources if its potential rises but just fails to fire, so the processes of adaption of the neuron are driven to a) maximise the response to inputs within a particular set, and b) minimise the response to inputs outside that set.

The set of excitatory inputs to one neuron are of two main types – feedforward and predictive; the number of each type of input varies from 10’s to 10’s of thousand; and the inputs arrive stochastically in combinations which contain mixtures of true structure and noise, so the “partitioning problem” a neuron faces is intractable. It simply learns to do the best it can.

Note that neurons are the biggest components in HTM which actually do anything! In fact, the regions, layers and columns are just organisational constructs, ways of looking at the sets of interacting neurons.

The neuron is the level in the system at which genetic control is exercised. The neuron’s shape, size, position in the neocortex, receptor selections, and many more things are decided per-neuron.

Importantly, many neurons have a genetically expressed “firing program” which broadly sets a target for the firing pattern, frequency and dependency setup.

Again, this gives the neuron an optimal pattern of output, and its job is to arrange its adaptations and learn to match that output.

Dendrites

Distal dendrites have a similar but simpler and smaller scale problem of combining inputs and deciding whether to spike.

I don’t believe dendrites do much more than passively respond to global factors such as modulators and act as conduits for signals, both electrical and chemical, originating in synapses.

Synapses

Synapses are now understood to be highly active processing components, capable of growing both in size and efficiency in a few seconds, actively managing their response to multiple inputs – presynaptic, modulatory and intracellular, and self-optimising to best correlate a stream of incoming signals with the activity of the entire neuron.

Part Two takes this idea further and details how a multilayer region uses the efficiency of predicted sparseness to learn a sensorimotor model and generate behaviour.

The next post in this series describes the Mathematics of Hierarchical Temporal Memory. This diversion is useful before proceeding with the main thread.

Blättler F, Hahnloser RHR. An Efficient Coding Hypothesis Links Sparsity and Selectivity of Neural Responses. Kiebel SJ, ed. PLoS ONE 2011;6(10):e25506. doi:10.1371/journal.pone.0025506. [Full Text]