Yann LeCun recently posed some questions on Facebook about the brain. I’d like to address these really great questions in the context of Hierarchical Temporal Memory (HTM). I’ll intersperse the questions and answers in order.
A list of challenges related to how neuroscience can help computer science:
– The brains appears to be a kind of prediction engine. How do we translate the principle of prediction into a practical learning paradigm?
HTM is based on seeing the brain as a prediction system. The Cortical Learning Algorithm uses intra-layer connections to distal dendrites to learn transitions between feedforward sensory inputs. Individual neurons use inputs from neighbouring, recently active neurons to learn to predict their own activity in context. The layer as a whole chooses as sparse a set of best predictor-recognisers to represent the current situation.
– Good ML paradigms are built around the minimization of an objective function. Does the brain minimize an objective function? What is this function?
The answer is different at each level of the system, but the common theme is efficiency of activity. Synapses/dendritic spines form, grow and shrink in response to incoming signals, in order to maximise the correlation between an incoming signal and the neuron’s activity. Neurons adjust their internal thresholds and other parameters in order to maximise their probability of firing given a combined feedforward/context input pattern. Columns (represented using a simplified sheath of inhibitory neurons) again adjust their synapses in order to maximise their contained cells’ probability of becoming active given the inputs. The objective metric of a layer of neurons is the sparsity of representation, with errors in prediction-recognition being measured as lower sparsity (bursting in columns). A region of cortex produces motor output which minimises deviations from stable predicted representations of the combined sensory, motor, contextual and top-down inputs.
– Good ML systems estimate the gradient of their objective function in order to minimize it. Assuming the brain minimizes an objective function, does it estimate its gradient? How does it do it?
Each component in HTM uses only local information to adapt and learn. The optimisation emerges from each components’ responses as it learns, and from competition between columns and neurons to represent the inputs.
– Assuming that the brain computes some sort of gradient, how does it use it to optimize the objective?
There is no evidence of a mechanism in the brain which operates in this way. HTM does without such a mechanism.
– What are the principles behind unsupervised learning? Much of learning in the brain is unsupervised (or predictive). We have lots of unsupervised/predictive learning paradigms, but none of them seems as efficient as what the brain uses. How do we find one that is as efficient and general as biological learning?
CLA is a highly efficient and completely general unsupervised learning mechanism, which automatically learns the combined spatial and temporal structure of the inputs.
– Short term memory: the cortex seems to have a very short term memory with a span of about 20 seconds. Remembering things for more than 20 seconds seems to require the hippocampus. And learning new skills seems to take place in the cortex with help from the hippocampus. How do we build learning machines with short-term memory? There have been proposals to augment recurrent neural nets with a separate associative short-term memory module (e.g LSTM, Facebook’s “Memory Networks”, Deep Mind’s “Neural Turing Machine”). This is a model by which the “processor” (e.g. a recurrent net) is separate from the “RAM” (e.g. a hippocampus-like assoicative memory). Could we get inspiration from neuroscience about how to do this?
Hierarchy in HTM provides short-term memory, with higher-level regions seeking to form a stable representation of the current situation in terms of sequence-sets of lower-level representations of the state of the world. Each region uses prediction-assisted recognition to represent its input, predict future inputs, and execute behaviours which maintain the predicted future.
– Resource allocation in short-term memory: if we have a separate module for short-term memory, how are resources allocated within it? When we enter a room, our position in the room, the geometry of the room, and the landmarks and obstacles in it are stored in our hippocampus. Presumably, the neural circuits used for this are recycled and reused for future tasks. How?
There’s no evidence of a separate short-term memory module in the brain. The entire neocortex is the memory, with the ephemeral activity in each region representing the current content. Active hierarchical communication between regions lead to the evolution of perception, decisions and behaviour. At the “top” of the hierarchy, the hippocampus is used to store and recycle longer-term memories.
– How does the brain perform planning, language production, motor control sequences, and long chains of reasoning? Planning complex tasks (which includes communicating with people, writing programs, and solving math problems) seems like an important part of AI system.
Because of the multiple feedforward and feedback pathways in neocortex, the entire system is constantly acting as a cyclic graph of information flow. In each region, memories of sequences are used in recognition, prediction, visualisation, execution of behaviour, imagination and so on. Depending on the task, the representations can be sensory, sensorimotor, pseudosensory (diagrammatic) or linguistic.
– resource allocation in the cortex: how does the brain “recruit” pieces of cortex when it learns a new task. In monkeys that have lost a finger, the corresponding sensory area gets recruited by other fingers when the monkey is trained to perform a task that involves touch.
There is always a horizontal “leakage” level of connections in any area of neocortex. When an area is deprived of input, neurons at the boundary respond to activity in nearby regions by increasing their response to that activity. This is enhanced by the “housekeeping” glial cells embedded in cortex, which actively bring axons and dendrites together to knit new connections.
– The brain uses spikes. Do spikes play a fundamental role in AI and learning, or are they just made necessary by biological hardware?
Spikes are very important in the real brain, but they are not directly needed for the core processing of information, so HTM doesn’t model them per se. We do use an analogue to Spike Timing Dependent Plasticity in the core Hebbian learning of predictive connections, but this is simplified to a timestep-based model rather than individual spikes.
We have elements of answers and avenues for research for many of these points, but no definite/perfect solutions.
HTM’s solutions are also neither perfect nor definitive, but they are our best attempt to address your questions in a simple, coherent and effective system, which directly depends on data from neuroscience.
Thanks to Yann for asking such pertinent questions about how the brain might work. It’s a recognition that the brain has a lot to teach us about intelligence and learning.