Better Living through Thoughtful Technology


• Apr 29 / 2014

Clortex: The Big Ideas

As you might know, I’ve been working away on “yet another implementation of Hierarchical Temporal Memory/Cortical Learning Algorithm” for the last few months. While that would have been nice as a hobby project, I see what I’m doing as something more than that. Working with NuPIC over the last year, I’ve gradually come to a realisation which seems kind of obvious, but remains uncomfortable:

A new kind of computing requires a new kind of software design.

HTM and CLA represent a new kind of computing, in which many, many millions of tiny, simple, unreliable components interact in a massively parallel, emergent choreography to produce what we would recognise as intelligence. This sounds like just another Neural Net theory, which is truly unfortunate, because that comparison has several consequences.

CLA – Not Another Neural Network

The first consequence relates to human comprehension. Most people who are interested in (or work in) Machine Learning and AI are familiar with the various flavours of Neural Nets which have been developed and studied over the decades. We now know that a big enough NN is the equivalent of a Turing Universal Computer, and we have a lot of what I would call “cultural knowledge” about NNs and how they work. Importantly, we have developed a set of mathematical and computational tools to work with NNs and build them inside traditional computing architectures.

This foreknowledge is perhaps the biggest obstacle facing HTM-CLA right now. It is very difficult to shake off the idea that what you’re looking at in CLA is some new twist on a familiar theme. There are neurons, with synapses connecting them; the synapses have settings which seem to resemble “weights” in NNs; the neurons combine their inputs and produce an output, and so on. The problem is not with CLA: it’s that the NN people got to use these names first and have hijacked (and overwritten) their original neuroscientific meanings.

CLA differs from NNs at every single level of granularity. And the differences are not subtle; they are fundamentally different operating concepts. Furthermore, the differences compound to create a whole idea which paradoxically continues to resemble a Neural Net but which becomes increasingly different in what it does as you scale out to layers, regions and hierarchies.

It’s best to start with a simplified idea of what a traditional NN neuron is. Essentially, it’s a pure function of its inputs. Each neuron has a local set of weights which are dot-producted with the inputs, run through some kind of “threshold filter” and produce a single scalar output. A NN is a graph of these neurons, usually organised into a layered structure with a notion of “feedforward” and “feedback” directions. Some NNs have horizontal and cyclical connections, giving rise to some “memory” or “temporal” features; such NNs are known as Recurrent Neural Networks.
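To make this concrete, here is a minimal sketch of such a neuron in Clojure (the language Clortex itself is written in); `nn-neuron` and its step activation are illustrative, not taken from any particular library:

```clojure
;; A traditional NN neuron: a pure function of its inputs.
;; `weights` are the neuron's local parameters; the dot product of
;; weights and inputs is run through a simple step-threshold filter
;; to produce a single scalar output.
(defn nn-neuron
  [weights threshold inputs]
  (let [activation (reduce + (map * weights inputs))]
    (if (>= activation threshold) 1 0)))

;; Two inputs, both weighted 0.6, threshold 1.0:
(nn-neuron [0.6 0.6] 1.0 [1 1]) ;=> 1  (0.6 + 0.6 = 1.2, over threshold)
(nn-neuron [0.6 0.6] 1.0 [1 0]) ;=> 0  (0.6, under threshold)
```

Given the same weights and the same inputs, this neuron always produces the same output; it has no memory of anything that came before.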

In order to produce a useful NN, you must train it. This involves using a training set of input data and a learning algorithm which adjusts the weights so as to approach a situation where the collective effect of all the pure functions is the desired transformation from inputs to outputs.

In sharp contrast, CLA models each neuron as a state machine. The “output” of a neuron is not a pure function of its feedforward inputs, but incorporates the past history of its own and other neurons’ activity. A neuron in CLA effectively combines a learned probability distribution over its feedforward inputs (as in a NN) with another learned probability distribution in the temporal domain (i.e. transitions between input states). Further, instead of being independent (as in NNs), the “outputs” of a set of neurons are decided by an online competitive process. Finally, the output of a layer of CLA neurons is a Sparse Distributed Representation, which can only be interpreted as a whole: a collective, or holographic, encoding of what the layer is representing.
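The contrast with the pure-function neuron can be sketched as follows (a deliberately toy model, with invented names, not Clortex’s actual data structures):

```clojure
;; Toy sketch of a stateful, CLA-style neuron. Its next activity depends
;; not only on the current feedforward input but on its own prior state:
;; here, a :predictive? flag set by lateral input on the previous step.
;; All names are illustrative.
(defn step-neuron
  [{:keys [predictive?] :as neuron} feedforward-active?]
  (assoc neuron
         :active?     (and feedforward-active? predictive?)
         :bursting?   (and feedforward-active? (not predictive?))
         :predictive? false))   ; recomputed from lateral input each step

;; The *same* feedforward input produces different outputs,
;; depending on the neuron's history:
(step-neuron {:predictive? true}  true) ; becomes active (input was predicted)
(step-neuron {:predictive? false} true) ; bursts (input was unexpected)
```

No pure function of the current input alone can reproduce this behaviour, which is exactly why NN training tools do not transfer.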

These fundamental differences, along with the unfortunate duplicate use of names for the components, mean that your grandfather’s tools and techniques for reasoning about, working with, and mathematically modelling NNs do not apply at all to CLA.

In fact, Jeff’s work has been criticised because he’s allegedly “thrown away” the key ability to mathematically demonstrate some properties of NNs, which the consensus considers necessary if you want your theory to be admitted as valid (or your paper published). This view would have it that CLA sacrifices validity for a questionable (and in some opinions, vacuous) gain.

Jeff gave a talk a few weeks back to the London Deep Learning Meetup – it’s perhaps the best single overview of the current state of the art for CLA:

The second consequence relates to implementation in software. Numenta began work on what is now NuPIC in 2005, and most of the current theory – itself still in its early stages in 2014 – has only appeared in stages over the intervening years. Each innovation in the theory has had to be translated and incorporated into a pre-existing system, with periodic redesigns of this or that component as understanding developed. It is unfair to expect the resulting software artefacts to magically transmute, Doctor Who-like, into a sequence of fully-formed designs, each perfectly appropriate for a new generation of the theory.

The fact that NuPIC is a highly functional, production-ready, and reasonably faithful implementation of much of CLA is a testament to the people at Numenta and their dedication to bring Jeff’s theories into an engineered reality. The details of how NuPIC works, and how it can be used, are a consequence of its history, co-evolving with the theory, and the software design and development techniques which were available over that history.

Everything involves tradeoffs, and NuPIC is no exception. I have huge respect for the decisions which have led to the NuPIC of 2014, and I would like to view Clortex as nothing other than NuPIC, metamorphosed for a new phase of Machine Intelligence based on HTM and CLA, with a different set of tradeoffs and the chance to stretch the boundaries yet again.

So, rather than harp on what might be limiting or difficult with NuPIC, I’ll now describe some of the key improvements which are possible when a “new kind of software” is created for HTM and CLA.

Architectural Simplicity and Antifragile Software – Russ Miles

It’s odd, but Clortex’ journey began when I followed a link to a talk Jeff gave last year [free registration required] at GOTO Aarhus 2013, and decided to watch one, then two, and finally all three talks given by Russ Miles at the same event. If you’re only able to watch one, the one to watch is Architectural Simplicity through Events. In that talk, Russ outlines his main Axioms for building adaptable software:

1. Your Software’s First Role is to be Useful

Clearly, NuPIC is already useful, but there is a huge opportunity for Clortex to be useful in several new ways:

a) As a Teaching Tool to help understand the CLA and its power. HTM and CLA are difficult to understand at a deep level, and they’re very different from traditional Neural Networks in every way. A new design is needed to transparently communicate an intuitive view of CLA to layman, machine learning expert, and neuroscientist alike. The resulting understanding should be as clear to an intelligent and interested viewer as it is to Jeff himself.

b) As a Research and Development platform for Machine Intelligence. Jeff has recently added – literally – a whole set of layers to his theory, involving a new kind of temporal pooling, sensorimotor modelling, multilayer regions, behaviour, subcortical connections, and hierarchy. This is all being done with thought experiments, whiteboards, pen and paper, and slides. We’ll see this in software sometime, no doubt, but that process has only begun. A new system which allows many of these ideas to be directly expressed in software and tested in real time will accelerate the development of the theory and allow many more people to work on it.

c) As a Production Platform for new Use Cases. NuPIC is somewhat optimised for a certain class of use cases – producing predictions and detecting anomalies in streaming machine-generated numerical data. It has also been able to demonstrate capabilities in other areas, but there is a huge opportunity for a new design to allow entirely new types of information to be handled by HTM and CLA techniques. These include vision, natural language, robotics and many other areas to which traditional AI and ML techniques have been applied with mixed results. A new design, which emphasises adaptability, flexibility, scalability and composability, will allow CLA to be deployed at whatever scale (in terms of hierarchy, region size, input space etc., as well as machine resources) is appropriate to the task.

2. The best software is that which is not needed at all

Well, we have our brains, and the whole point of this is to build software which uses the principles of the brain. On the other hand, we can minimise over-production by only building the components we need, once we understand how they work and how they contribute to the overall design. Clortex embraces this using a design centred around immutable data structures, surrounded by a growing set of transforming functions which work on that data.
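A minimal illustration of this style (hypothetical names, not Clortex’s actual API): the model is a plain immutable map, and each processing step is a pure function that returns a new value rather than mutating the old one:

```clojure
;; The style: the model is an immutable value; every operation is a
;; pure function from one value of the model to the next.
;; (Illustrative names only.)
(def layer {:columns 2048 :active-columns #{}})

(defn activate-columns
  "Returns a new layer value with the given columns active;
  the original layer is untouched."
  [layer active]
  (assoc layer :active-columns (set active)))

(def layer' (activate-columns layer [3 17 942]))

(:active-columns layer)   ;=> #{}           (original unchanged)
(:active-columns layer')  ;=> #{3 17 942}
```

Because old values are never destroyed, new transforming functions can be added, tested and discarded without any risk to the data they operate on.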

3. Human Comprehension is King

This axiom is really important for every software project, but so much more so when the thing you’re modelling is so difficult to understand for many. The key with applying this axiom is to recognise that the machine is only the second most important audience for your code – the most important being other humans who will interact with your code as developers, researchers and users. Clortex has as its #1 requirement the need to directly map the domain – Jeff’s theory of the neocortex – and to maintain that mapping at all costs. This alone would justify building Clortex for me.

4. Machine Sympathy is Queen

This would seem to contradict Axiom 3, but the use of the word “Queen” is key. Any usable system must also address the machine environment in which it must run, and machine sympathy is how you do that. Clortex’ design is all about turning constraints into synergies, using the expressive power and hygiene of Clojure and its immutable data structures, the unique characteristics of the Datomic database system, and the scalability and portability of the Java Virtual Machine. Clortex will run on a Raspberry Pi, a version of it will run in browsers and phones, and yet it will scale layers and hierarchies across huge clusters to deliver real power and test the limits of HTM and CLA in production use.

5. Software is a Process of R&D

This is obviously the case when you’re building software based on an evolving theory of how the brain does it. Russ’ key point here is that our work always involves unknowns, and our software and processes must be designed in such a way as not to slow us down in our R&D work. Clortex is designed as a set of loosely coupled, interchangeable components around a group of core data structures, and communicating using simple, immutable data.

6. Software Development is an Extremely Challenging Intellectual Pursuit

Again, this is so true in this case, but the huge payoff you can derive if you can come up with a design which matches the potential of the CLA is hard to beat. I hope that Clortex can meet this most extreme of challenges.

Stay tuned for Part II – The Clortex System Design…

• Apr 11 / 2014
Clojure

Doc-driven Development Using lein-midje-doc

This is one of a series of posts on my experiences developing Clortex in Clojure, a new dialect of LISP which runs on the Java Virtual Machine. Clortex is a re-implementation of Numenta’s NuPIC, based on Jeff Hawkins’ theories of computational neuroscience. You can read my in-progress book by clicking on the links to the right. Clortex will become public this month.

One of the great things about Clojure is the vigour and quality of work being done to create the very best tools and libraries for developing in Clojure and harnessing its expressive power. Chris Zheng’s lein-midje-doc is an excellent example. As its name suggests, it uses the comprehensive Midje testing library, but in a literate programming style which produces documentation or tutorials.

Doc-driven Development

Before we get to DDD, let’s review its antecedent, TDD.

Test-driven Development

Test-driven Development (TDD) has become practically a tradition, arising from the Agile development movement. In TDD, you develop your code based on creating small tests first (these specify what your code will have to do); the tests fail at first because you haven’t yet written code to make them pass. You then write some code which makes a test pass, and proceed until all tests pass. Keep writing new tests, failing and coding to pass, until the tests encompass the full spec for your new feature or functionality.

For example, to write some code which finds the sum of two numbers, you might first do the following:
 

(fact "adding two numbers returns their sum" ; "fact" is a Midje term for a property under test
  (my-sum 7 5) => 12)                        ; "=>" says the form on the left should return or match the form on the right

 

This will fail, firstly because there is no function my-sum. To fix this, write the following:
 

(defn my-sum [a b]
  12)

 

Note that this is the correct way to pass the test (it’s minimal). Midje will go green (all tests pass). Now we need to force the code to generalise:
 

(fact "adding two numbers returns their sum"
  (my-sum 7 5)   => 12
  (my-sum 14 10) => 24)

 

Which makes us actually write the code in place of 12:
 

(defn my-sum [a b]
  (+ a b))

 

The great advantage of TDD is that you don’t ever write code unless it is to pass a test you’ve created. As Rich Hickey says: “More code means… more bugs!” so you should strive to write the minimum code which solves your problem, as specified in your tests. The disadvantage of TDD is that it shifts the work into designing a series of tests which (you hope) defines your problem well. This is better than designing by coding, but another level seems to be required. Enter literate programming.

Literate Programming

This style of development was invented by the famous Donald Knuth back in the 70’s. Knuth’s argument is that software is a form of communication with two audiences. The first is obvious: a human (the developer) is communicating instructions to the machine in the form of source code. The second is less obvious but perhaps more important: you are communicating your requirements, intentions and design decisions to other humans (including your future self). Knuth designed and built a system for literate programming, and this forms the basis for all similar systems today.

This post is an example of literate programming (although how ‘literate’ it is is left to the reader to decide), in that I am forming a narrative to explain a concrete software idea, using text, and interspersing it with code examples which could be executed by a system processing the document.

Doc-driven Development

DDD is essentially a combination of TDD and Literate Programming. You write a document about some part of your software: a narrative describing what it should do, with examples of how it should work. The examples are framed as “given this, the following should be expected to happen”, which you write as facts (a type of test which is easier for humans to read). The DDD system runs the examples and checks the return values to see if they match the expectations in your document.

The big advantage of this is that your documentation is a much higher-level product than a list of unit tests, in that your text provides the reader (including your future self returning to the code) with much more than a close inspection of test code would yield. In addition, your sample code and docs are guaranteed to stay in sync with your code, because they actually run your code every time it changes.

lein-midje-doc

lein-midje-doc was developed by Chris Zheng as a plugin for the Clojure Leiningen project and build tool. It leverages Midje to convert documents written in the Literate Programming style into suites of tests which can verify the code described.

It’s simple to set up. You add dependencies to your project.clj file, and then an entry for each document you wish to include (instructions are in the README.md, full docs on Chris’ site). Then you use two shells to run things. In one, you run lein midje-doc, which repeatedly regenerates the readable documents as HTML files from your source files, and in the other you run midje.repl’s (autotest) function to actually spin through your tests.
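Roughly, the project.clj additions look like the sketch below; the version numbers and configuration keys here are paraphrased from memory of the plugin’s README, so check Chris’ docs for the authoritative form:

```clojure
;; Sketch of the project.clj additions for lein-midje-doc.
;; Versions, paths and titles are illustrative placeholders.
(defproject my-project "0.1.0"
  :profiles {:dev {:plugins [[lein-midje-doc "0.0.18"]
                             [lein-midje "3.1.1"]]
                   :dependencies [[midje "1.6.3"]]}}
  :documentation {:files {"doc/my-first-document"
                          {:input "test/docs/my_first_document.clj"
                           :title "My First Document"
                           :author "..."}}})
```

Each entry under :files maps an output document to the source file whose facts and narrative it is generated from.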

Here’s Chris demonstrating lein-midje-doc: