“There are only two hard things in Computer Science: cache invalidation and naming things.” – Phil Karlton
In the case of HTM, we also have the much bigger problem of explaining how neocortex may work, and how a non-obvious CLA operates to use cortical principles. Extra confusion caused by poor naming multiplies the difficulties.
A key component of the art of naming consists in identifying the scope of each name. We need to have names which are just specific enough to capture the underlying concept, but not so specific that they entangle non-essential details. Names also need to be memorable and comfortable, while not being too easy to misconstrue, because they resemble or contain words which have other meanings.
I’d like to begin a reasoned discussion about key names in HTM and CLA. The goal of the discussion is to arrive at a set of names which everyone strongly believes captures the concepts for both theory and implementation.
As a famous Supreme Court judge once said of pornography, “we cannot define it but we know it when we see it.” We are looking for this kind of name, with the added advantage that HTM can actually precisely define the concept behind each name.
Until we arrive at a good name for something (ie one which magically gets everyone’s support), we should identify the key flaws in each candidate and agree that they invalidate that candidate. This is a healthy process which should not be regarded as a criticism of any proposer.
Please treat that as an open invitation to tell me how poor my proposed names are, but only for reasons you’d accept as rational if they were directed at yours!
I’m currently re-reading the 2011 White Paper with a view to updating and improving it. This document is a very rich source of information pertinent to this discussion, and in fact appears to answer a couple of the thorniest ones! I’d very strongly recommend re-reading it as preparation for taking part in this discussion.
I’d like to go through the main named concepts one by one, discuss the strengths and weaknesses of the current names, and propose a new name for each concept with some supporting motivations and argument. I don’t expect that my proposals will stick, but they should get us a noticeable step in the right direction, or at least throw light on the relevant issues.
Sparse Distributed Representation.
I start with this one because, in my experience of learning, reasoning about, writing about, talking about, and explaining HTM, the term SDR is as close to perfect as I can imagine. It has the property of monotonically improving understanding the more you find out about each of the three concepts named.
It is also an easily testable name. We all remember when Francisco showed us the CEPT Retina SDRs, in fact they were so SDRish, some of us thought they were too good to be true!
There are several problems with this term. We understand that “spatial” was chosen to indicate that each presentation of the data has some properties and structure in the sensory domain (such as a shape, size or colour), and it’s called “spatial” as opposed to “temporal”.
A difficulty arises for newcomers who read too much into this use of the word. There is a strong temptation to rely on our commonsense ideas of space when Jeff is really talking about mathematical, vector spaces and the abstract “spaces” of SDRs.
HTM does not require the kind of retinotopic mapping found in V1. The only reason we have literal spatial layouts in just a few primary areas of sensory cortex is because it is a simpler evolutionary and developmental design, not because it is needed for the algorithm. The RDSE, the Geospatial Encoder and the CEPT retina are all superb examples of how “pseudorandom” representations are better than more pictorially understandable spatial representation regimes.
Lastly, we’ve already tripped over this when we started talking about the new sensorimotor theory. L4 cells are now dealing with motor inputs as well as “spatial”, and L3 cells are now expected to “see” a set of L4 outputs whose members are substituted over time. So the word “spatial” really needs to go.
The word “Pooling” has, for many, either no meaning at all (most cases), or worse, the wrong meanings in this context. If you are trying to capture the notion of a noise-tolerant, largely stable representation of closely related sensory input, “pooling” isn’t going to do that for most people.
I’m not sure there is a good word for this, so my suggestion drops this aspect. As mentioned several times in the 2011 White Paper, the concept of pooling (noise-tolerance, high-overlap) is already embedded as a property of the product of SP – the SDR.
I propose the term Pattern Memory for what we currently call Spatial Pooling. This captures the fact that patterns in the data are recognised-learned and that the CLA is developing a memory of patterns it has seen. By not being too specific about which patterns we mean, it also allows us to say that the CLA learns to recognise and remember patterns of input data, stores patterns of synaptic connections, and forms patterns of activation (SDRs) to represent its inputs.
This name is also robust to adopting the new theory. L4 cells can learn sensorimotor patterns, and L3 cells can learn to recognise patterns of membership in a sequence-set.
We can run this in the top-down direction too, talking about patterns appearing in L1, motor patterns, patterns of depolarisation, and so on.
(old) Temporal Pooling.
The problems with using this term in its old context have been well-rehearsed, and it’s now used for the much more appropriate concept of representing a stable(r) sequence-identifying SDR in Layer 3 when sensorimotor transitions from that sequence are occurring in Layer 4. Temporal Pooling, in that sense, is another great name.
I had previously offered the term “Transition Prediction” for the component of CLA involving lateral connections and predictive states. Jeff and Numenta are currently using “Temporal Memory”. I believe both are flawed.
My suggestion accurately captured the limited, 1-timestep scope of this component, and also the fact that prediction is the key to temporal learning. However, it sounds like we need to add words to the name, to reflect “something missing” from the two word name.
Temporal Memory, on the other hand, is too high-ranking and valuable a name for this relatively basic component. It carries the risk that people will think HTM is just a hierarchy of TMs. Also, “temporal” is too general – the same word is currently used for single-timestep (old TP/TM) all the way up to entire sequences (new TP).
I propose Transition Memory for this second core component of CLA. This captures most literally what the algorithm is doing – learning single transitions. It is also the temporal equivalent of Pattern Memory, using distal dendrites to link to past SDRs just as PM uses proximal dendrites to link to feedforward patterns.
Importantly, the term Transition Memory is not trying to work too hard. We can explain that learned transitions are used to put cells into predictive states, and that these predictive patterns are used both in sensory (variable order) and sensorimotor (first order) temporal learning. They are used to match predicted and actual inputs, detect anomalies and create patterns which indicate continuing successful prediction or trigger a pattern of bursting columns. It seems impossible to me to have one name capture all these aspects, so I propose we stop trying and give the name a break!
In a variation on Pattern Memory (SP), depolarisation due to Transition Memory is combined with feedforward inputs to assist recognition and increase noise-tolerance. In Jeff’s new sensorimotor theory, combining distal with proximal inputs is likely to be key to the function.
Old and New Versions of HTM/CLA Theory.
In previous posts, I used “old and new” or “2013 and 2014″ to distinguish these two generations of the theory. In reworking the White Paper, I’ve recognised that these two theories are akin to the Newtonian versus Relativistic or Quantum views of mechanics. You need to quite deeply understand the simpler theory before you can begin to deal with the far more complex and realistic one. And for many purposes, the simpler theory is perfectly sufficient both for understanding how the neocortex works, and for useful application in software.
I thus propose that the older, simpler theory and model be called the “Sensory Cortical Learning Algorithm” or “Sensory CLA”, the newer being called the “Sensorimotor CLA”.
SCLA (or just CLA) and SMCLA are simple, distinguishable acronyms.
This also allows us to talk about HTM systems with SCLA single-layer regions (as NuPIC can/does), which just do feedforward, sensory hierarchy, or else fuller HTMs which incorporate behaviour, stable sequences, temporal pooling, and true bidirectional hierarchy using SMCLA in each region.