[Note: This post is a sketch of a paper in progress, due to be completed in May-June 2015.]
I believe we have now discovered the key function of neocortex: it is a machine which uses sensorimotor information from complex systems in the world to build and utilise running simulacra of those systems. The Cortical Learning Algorithm in HTM provides a self-organising structure which can automatically emulate a very large class of real-world phenomena. The design of the neocortex is specifically suited to the task of maintaining a model of the world in the face of nonstationarity in the complex system.
Nonlinear Dynamics – an Introduction
OK, that’s a lot of jargon, so I’ll illustrate this with an everyday example. Riding a real bicycle on a real road is an extraordinarily difficult task for a classical computer program. If you try to do this the 1950s way, you’d begin by identifying a big system of partial differential equations, and then find a way to solve them numerically in order to control the robot. This turns out to be near impossible in practice, and results in a system which is very brittle and inflexible. There is another approach, however. One very popular method used in robotics and control systems today is PID (proportional-integral-derivative) control, which combines feedback loops between sensation and action: the controller reacts to the current error, to its accumulated history, and to its rate of change.
Here’s a cute video of such a system:
What’s happening here is simple. The robot is using its sensors to detect how things are going, and just reacting to the changing sensory data in order to maintain stability.
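To make the feedback idea concrete, here is a minimal sketch of a textbook PID loop. It is not the controller used by the robot in the video. The "bicycle" is a hypothetical stand-in (an inverted pendulum linearised about upright), and the class name `PID`, the function `balance`, the gains and the time step are all my own assumptions, picked by hand for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete-time PID controller."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the very first sample (nothing to compare with).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def balance(use_controller: bool = True, steps: int = 2000, dt: float = 0.01) -> float:
    """Simulate a toy 'lean angle' and return its final magnitude in radians.

    Assumed plant (not a real bicycle model): theta'' = g_over_l * theta + u,
    i.e. an inverted pendulum linearised about the upright position.
    """
    g_over_l = 9.81          # gravity / pendulum length, assuming a 1 m pendulum
    theta, omega = 0.2, 0.0  # start leaning by 0.2 rad, at rest
    pid = PID(kp=40.0, ki=5.0, kd=10.0)  # illustrative, hand-picked gains
    for _ in range(steps):
        error = 0.0 - theta  # the target is upright (0 rad)
        u = pid.step(error, dt) if use_controller else 0.0
        omega += (g_over_l * theta + u) * dt
        theta += omega * dt
    return abs(theta)
```

Calling `balance()` should leave the lean angle close to zero, while `balance(use_controller=False)` lets it run away. Note that the controller never solves the equations of motion; it only reacts to the measured error.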
The robot-controller-bicycle-floor system is an example of a nonlinear dynamical system. The real world we live in is full of such systems, but the past several centuries of physics have tended to avoid them in favour of pretending the world is linear. Much of the physics and applied math we learned in school and college approximates reality with much simpler linear systems. Only in the last century or so (and increasingly since the advent of computer simulations) have we begun to examine nonlinear dynamical systems in any detail.
The most famous recent result from Dynamical Systems Science was the discovery of Chaos, which involves the evolution of apparently unpredictable behaviour in simple, nonlinear, deterministic systems. Apart from vaguely being aware of the idea of chaos, most well-educated people have no real knowledge of how nonlinear systems work, what can be known about them, and how different systems are related. In fact, this has become perhaps the primary field of study in applied mathematics over the past 40 years, and some very clever people have made big progress in understanding these complex, non-intuitive phenomena. We’ll get back to this shortly.
Dynamical Systems and the Brain
Of course, one of the most interesting systems of this type is to be found in our brains. Often described as “the most complex thing in the known universe,” the brain is indeed a daunting thing to study. Many people have examined neural structures as dynamical systems, and proposed that nonlinear dynamics are key to working out how the brain works. Indeed, a number of researchers have demonstrated that simplified model neural networks can exhibit some of the same kinds of computational properties found in the brain (for example, see Hoerzer et al.).
In fact, it appears that the brain looks like a whole bunch of interacting dynamical systems, everywhere you look, and at all scales. Surely this is only going to make things harder to understand? Well, yes and no. Yes, we’re going to have to leave the comfort of our training in seeing everything as linear, and venture into a world of oddness and unpredictability. And no, we actually can – once we take the leap – understand how nonlinear dynamics reveals the true nature of animal intelligence.
Dynamical Systems and Information
Nonlinear dynamical systems are weird. They can be entirely deterministic (rather than random), but practically unpredictable. They are often critically sensitive to initial (or measured) conditions, so in practice they might never repeat exactly the same sequences again. They may contain huge numbers of “internal variables” (billions, trillions or more), leaving us with no hope of using analytic methods in order to model them.
Yet incredibly, many dynamical systems have a miracle property. They “export” information which we can collect, and this information is often sufficient for us to build a model with the same kinds of dynamics as the original. This discovery was made in the 1970s, the “golden decade” of dynamical systems, and it has been applied again and again in a hugely diverse range of areas.
Here’s a (very old, so murky and scratchy) video by Steve Strogatz and Kevin Cuomo:
So, what’s going on here? Well, the sending circuit is an analog dynamical system which is executing one of the most famous sets of equations in Dynamical Systems – the Lorenz Equations. The details are not important (for this discussion), but essentially the system has three “internal variables” which are coupled together with quite simple differential equations. Here’s an animation of a Lorenz system:
It’s quite beautiful. You can see how there is an elegant kind of structure to the trajectories traced out by the point, and a strange kind of symmetry in the spiralling and twisting of the butterfly-like space it lives in. In fact, the trajectory never exactly repeats itself, however long you watch, and the system has become the “Hello World” of dynamical systems science.
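For readers who want to poke at this themselves, here is a small sketch of the Lorenz equations, \(\dot{x} = \sigma(y - x)\), \(\dot{y} = x(\rho - z) - y\), \(\dot{z} = xy - \beta z\), integrated with a standard fourth-order Runge-Kutta step. The parameter values \(\sigma = 10\), \(\rho = 28\), \(\beta = 8/3\) are the classic choices and are an assumption on my part (nothing above fixes them), as are the function names, the step size and the starting point. The sketch runs two copies of the system whose starting points differ by one part in a hundred million and records how far apart they drift.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance `state` by one fourth-order Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def separation_history(eps=1e-8, steps=4000, dt=0.01):
    """Distance between two trajectories that start `eps` apart in x."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + eps, 1.0, 1.0)
    history = []
    for _ in range(steps):
        history.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
    return history


if __name__ == "__main__":
    h = separation_history()
    print("initial separation:", h[0])
    print("largest separation:", max(h))
```

The two runs are fully deterministic and start almost identically, yet the gap between them grows until it is as large as the attractor itself. That is the "deterministic but practically unpredictable" behaviour in miniature.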
OK, so the sending system is behaving like a Lorenz System, with certain voltages in the circuit acting like the \(x\), \(y\) and \(z\) coordinates in the animation. The receiving circuit is also a Lorenz emulator, with almost exactly the same setup as the sender (they’re real electronic devices, so they can’t be identical). Now, the trick is to take just one output from the sending circuit (say \(x\)), and use it as the \(x^\prime\) voltage in the receiving circuit. As Strogatz says in his book, Sync, it’s as if the \(x^\prime\) has been “taken over” by the signal from the sender. Normally, \(x^\prime\), \(y^\prime\) and \(z^\prime\) work together to produce the elegant trajectory we see in the animation, but now \(x^\prime\) is simply ignoring its dance partners, who appear to have no choice but to synchronise themselves with the interloper from afar.
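The same takeover can be sketched in simulation rather than in hardware. In the sketch below (my own toy, not the actual circuit equations from the video) the sender is a full Lorenz system, and the receiver keeps only \(y^\prime\) and \(z^\prime\), using the sender's \(x\) wherever its own \(x^\prime\) would appear. The parameter values are the classic ones and, like the function names and initial conditions, are assumptions for illustration.

```python
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz parameters (assumed)


def coupled(s):
    """Sender (x, y, z) plus receiver (yr, zr) whose x' is the sender's x."""
    x, y, z, yr, zr = s
    return (
        SIGMA * (y - x),       # sender
        x * (RHO - z) - y,
        x * y - BETA * z,
        x * (RHO - zr) - yr,   # receiver: same equations, x supplied by the sender
        x * yr - BETA * zr,
    )


def rk4_step(f, state, dt):
    """Advance `state` by one fourth-order Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def sync_error_history(steps=3000, dt=0.01):
    """Distance between (y, z) in the sender and (yr, zr) in the receiver."""
    # The receiver deliberately starts far away from the sender's state.
    state = (1.0, 1.0, 1.0, -5.0, 20.0)
    errors = []
    for _ in range(steps):
        _, y, z, yr, zr = state
        errors.append(((y - yr) ** 2 + (z - zr) ** 2) ** 0.5)
        state = rk4_step(coupled, state, dt)
    return errors


if __name__ == "__main__":
    errs = sync_error_history()
    print("error at start:", errs[0])
    print("error at end:  ", errs[-1])
```

The receiver's mismatch shrinks steadily even though the sender itself is chaotic. For this particular coupling that is no accident: the differences \(e_y = y - y^\prime\) and \(e_z = z - z^\prime\) satisfy \(\frac{d}{dt}\tfrac{1}{2}(e_y^2 + e_z^2) = -e_y^2 - \beta e_z^2\), which is never positive.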
This eerie effect is much more general than you might think. It turns out that, just using a single stream of measurements, you can reconstruct the dynamics of a huge range of systems, without needing any knowledge of the “internal variables” or their equations. This result is based on Takens’ Theorem, which proves this for certain well-behaved systems (such as Lorenz’).
Here’s a video (with three parts) which explains how this works:
Part One introduces Lorenz’ system. Part Two illustrates Takens’ Theorem, and the final part shows how it can be applied to test for causal connections between time series.
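The delay-coordinate trick behind Takens’ Theorem is simple enough to sketch in a few lines. Given a single stream of measurements, you build "state" vectors out of the current value and a few lagged copies of it. The forecasting step below (copy whatever followed the most similar past state) is just one simple way to use such a reconstruction; it is my own choice of illustration, not something taken from the video. The names `delay_embed` and `forecast_next`, the embedding dimension, the lag and the sine-wave test signal are all assumptions.

```python
import math


def delay_embed(series, dim, tau):
    """Return delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]


def forecast_next(series, dim=2, tau=5):
    """Predict the next sample by copying what followed the most similar past state."""
    vectors = delay_embed(series, dim, tau)
    query = vectors[-1]
    start = (dim - 1) * tau  # vectors[i] describes the state at time start + i
    best_t, best_d = None, math.inf
    # Only consider past states whose successor is already known.
    for i, v in enumerate(vectors[:-1]):
        d = math.dist(v, query)
        if d < best_d:
            best_t, best_d = start + i, d
    return series[best_t + 1]


if __name__ == "__main__":
    print(delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2))
    signal = [math.sin(0.3 * t) for t in range(200)]
    guess = forecast_next(signal[:-1])  # hold back the last sample
    print("forecast:", guess, "actual:", signal[-1])
```

Notice that `forecast_next` never sees the equation that generated the signal. It works purely from the history of one measured variable, which is the point of the reconstruction.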
The Brain as a Universal Dynamical Computer
This phenomenon is the key to what the neocortex is doing. It’s exploiting the information in time series sensory data to build replicas of the dynamics of the world, and using them for identification, forecasting, modelling, communication, and behaviour. Well, that’s nice to know, but it doesn’t explain how it does that. So, let’s do that.
I referred earlier to the work of Gregor Hoerzer, which uses recurrent neural networks (RNNs) to model a few kinds of chaotic computation. RNNs are similar to other kinds of Deep Learning artificial neural networks, which use extremely simple “point neurons”. They differ in that their outputs may end up (after a few hops) as part of their own inputs. This gives RNNs a lot more power than other ANNs, which explains why they’re currently such a hot topic in Machine Learning.
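The feedback loop that sets RNNs apart can be sketched with a handful of point neurons. This is only the generic recurrence, with random untrained weights; it is not the reward-modulated Hebbian network from Hoerzer et al., and the function names, network size and weight scales are assumptions made for illustration.

```python
import math
import random


def make_rnn(n_units, seed=0, scale=0.9):
    """Random recurrent and input weights for a tiny network of point neurons."""
    rng = random.Random(seed)
    w_rec = [[rng.gauss(0, scale / math.sqrt(n_units)) for _ in range(n_units)]
             for _ in range(n_units)]
    w_in = [rng.gauss(0, 1.0) for _ in range(n_units)]
    return w_rec, w_in


def step(state, x, w_rec, w_in):
    """One update: each unit sums the previous state and the new input, then squashes."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + wi * x)
            for row, wi in zip(w_rec, w_in)]


def run(inputs, w_rec, w_in):
    """Feed a scalar input sequence through the network and return the final state."""
    state = [0.0] * len(w_in)
    for x in inputs:
        state = step(state, x, w_rec, w_in)
    return state


if __name__ == "__main__":
    w_rec, w_in = make_rnn(8)
    # Two sequences that end with the same input but have different histories.
    a = run([1.0, 0.0, 0.5], w_rec, w_in)
    b = run([-1.0, 0.0, 0.5], w_rec, w_in)
    print(a != b)
```

Because each unit's output feeds back into the next update, the final state depends on the whole input history and not just on the latest sample. A purely feedforward network given only the last input would produce the same output for both sequences.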
I believe they are so successful right now because they use the tricks we’ve seen and self-organise to represent a simulated dynamics and thus allow for some amount of modelling, prediction and generation. RNNs are powerful, but they lack structure, and they’re very hard for us to understand. Perhaps a more structured type of network would have even more power and (fingers crossed) might be easier to understand and reason about.
Hierarchical Temporal Memory and the Cortical Learning Algorithm
In Jeff Hawkins’ HTM theory, the point neurons are replaced by far more realistic model neurons, which are much more complex and have significant computational power just on their own. Neurons are packed into columns, and the columns are arranged in layers. This structure is based on detailed study of real neocortex, and is a reasonable, first-order approximation of what you’d see in a real brain.
The key to HTM is that the layers are combined and connected just like in the brain. Each layer in a region (a small area of cortex) has different inputs and performs its own particular role in the computation. I’ve written in some depth about this before, so I’ll just briefly summarise this in the context of dynamical systems.
This rather intimidating diagram is a minimal sketch of the primary computational connections in my multilayer model. It shows the key information flows in a region of neocortex. The “primary” inputs to the region are the red and blue arrows coming in from the bottom and going to Layer 4 (and L6 as well). Here, subpopulations of cells in L4 learn to reconstruct the dynamics of the sensorimotor inputs, and forecast transitions in short timesteps. While L4 is able to predict the upcoming evolution, its representation is being pooled over time by cells in L2 and L3. These cells represent the current dynamical “regime” of the evolving dynamics in L4, which characterises the sensed system at a longer timescale than the fast-changing input to the region.
The output from L2/3 goes up the hierarchy to higher regions, which treat that as a dynamically evolving sensory input, and repeat the same process. In addition, this output goes to L5, which combines it with other inputs (from L1 and L6) and produces behaviour which has been learned to interact with the world in order to preserve or recover prediction in the entire region (see here for the mechanisms of self-stabilisation in this system).
The key thing here is that subpopulations of neurons are capable of learning to model the dynamics of the world at many timescales. When the characteristics of the real-world system change, a different subpopulation is selected. This change is then picked up in downstream layers, leading to a new representation of the world by the region and also to a motor or behavioural reaction to the new dynamics.
The other pathways in the diagram are crucial to both the learning of dynamical modelling and perception itself. The higher-level regions provide even slower-changing inputs to both L2/3 and L5, representing the more stable “state” they are working with, and assisting these cells to maintain a consistent picture of the world in the face of uncertainty and noise.
References (to be completed)
Gregor M. Hoerzer, Robert Legenstein, and Wolfgang Maass. Emergence of Complex Computational Structures From Chaotic Neural Networks Through Reward-Modulated Hebbian Learning. Cereb. Cortex (2014) 24 (3): 677-690. First published online November 11, 2012. doi:10.1093/cercor/bhs348.