Blog

  • Jun 07 / 2014
Real Life!

Ham Boiled and Roasted in Guinness, with Belfast Champ

This has to be one of my favourite dishes. It’s absolutely Irish, but has some twists which come from learning how to cook with Italians, Spaniards and Mexicans. It’s a bit of work (some in prep, mostly in the care you take checking up on your food as it cooks), but it results in a sublime thing which your dinner guests will never forget you introduced them to. Just remember to say you got the recipe from me (I got the inspiration for this from the Guinness site, which has a very decent recipe for the boiled version of this one).


Serves 6

For boiling:

  • 1.5Kg (3lbs) Prime Ham Fillet (get it from your butcher, ask for it on the bone if possible), skin on.
  • Some pork or bacon ribs or other pork bones (butcher will often give you these for free from their bin)
  • Half-dozen cloves
  • 3 Spanish (or any large) Onions, finely chopped
  • 1 Red Onion (optional)
  • 1 Red Bell Pepper, finely chopped
  • 3 medium sized carrots
  • 2 Spring Onions (Scallions)
  • 1 or 2 Bay (or Laurel) leaves
  • A sprig of fresh (or 1 tsp dried) Parsley
  • 1 can of Draught Guinness
  • 3 peeled tomatoes, chopped, or 1 can of chopped tomatoes
  • Grapeseed, Sunflower or Olive Oil
  • Salt and Pepper
  • A pinch of Spanish Pimenton (Paprika) optional

Equipment: 1 Large Saucepan or Pressure Cooker, 1 Sharp Chef’s Knife, Veg cutting board or food processor.

For Roasting:

  • 2-3 tsps of the best honey you can find (or Demerara or brown sugar)
  • More carrots, parsnips, root veg to your taste, roughly chopped.
  • (Optional, I don’t) another can of Guinness

Preheat oven to 180C (350F).
Equipment: 1 Deep Baking Dish

To Serve with Belfast Champ:

  • 12 Medium “Floury” Potatoes (a small bag)
  • A bunch of Spring Onions (Scallions)
  • 1/2 a bunch of Thyme (or 3 tsp dried)
  • 50g (2oz) Irish Butter
  • 125ml (1/4 pint) fresh whole milk
  • 1 organic free range egg

Equipment: Potato Ricer (makes the spuds delicious) or masher.

(Optional, not the Irish way!) a leaf salad made with lemon, olive oil and Rocket leaves.

Method:

In a large saucepan, sweat the onions, scallions, carrot and pepper for 5-10 mins over a gentle heat until the onion is translucent. Add a few pinches of salt and pepper. Remove from heat and place in a bowl.

Wash all the meat under cold running water before use. Score the ham skin and stick the cloves through the slits into the meat. Put a little oil in the saucepan and place the ham in the centre. Slip the pork bones between the ham and the sides of the saucepan. Add the can of Guinness, pouring over the meat. Add back the onion-carrot mix, and the tomatoes, parsley and pimenton. Slide a couple of bay leaves down the sides of the pan and cover. When it boils, lower the heat and gently simmer for as long as possible, but at least 90 minutes. Every 10-15 mins, return to the pan to ensure it’s gently simmering and taste the evolving flavours.

30 minutes before the ham is done, put on the oven at 190C/375F. Place the baking dish in the oven to heat up as well.

You’ve now cooked the ham perfectly well, so if you wish to stop here, work away and serve it as boiled ham (prepare the Belfast Champ as below). But trust me, you really want to roast it very slowly to bring out the great flavours we’ve just infused into the meat. So, take the baking dish out of the oven and smear some oil over it before placing your ham in the centre. Pour the sauce from the saucepan over the meat and put the pork bones/ribs around the ham in the dish. Add any more carrots, parsnips, or other root veg as you like, scattered around the ham. At this point some people would pour another pint of Guinness over the meat, but I’d prefer to drink it! Now, take a couple of teaspoons of the best honey you can find (or brown sugar if you can’t) and smear the top of the ham with it. Sprinkle with freshly picked thyme and place in the oven at the high heat to crisp the fat for 15 minutes. Reduce the oven temperature to 150C/300F and allow to slow-roast for at least another 45 mins. Every 15-30 mins, take it out of the oven and baste the juices over the ham.

15 minutes before the ham is ready, have your potatoes peeled and chunk them into 2cm or 3/4in cubes. Boil gently in a large saucepan for 12-15 minutes (you can tell they’re done when you can stick a fork straight into them without resistance). Drain the spuds in a colander and place in a bowl. In the potato saucepan, heat up the milk and butter, add the spring onions and thyme, and gently bring to the boil. Reduce the heat, simmer for a couple of minutes to poach the onions, and add the raw egg. Whisk until smooth, remove from heat and add the potato using a potato ricer (or else gently mash the potatoes in the bowl using a potato masher or fork). Fold everything together until it has a glossy appearance. If the mixture is too dry, add a little milk at a time until it’s smooth. Keep warm but off the hob (it’ll burn).

When the ham is done, remove from the oven and allow to rest for 10-15 minutes (this distributes the juices in the ham throughout the meat and is essential for a good result). By all means baste the ham one last time using the juices from the dish. If you like thick gravy, you should remove the veg and bones from the dish and place it over a gentle heat on the hob and scrape it to release the burned-in flavours while condensing it. I personally prefer a lighter gravy so I don’t reduce the sauce.

Once rested, carve the ham and plate up a few slices of ham, lots of the champ, the roast veg, and salad if that’s your thing. Pour liberal quantities of gravy over both meat and spuds. Enjoy!

 

  • May 09 / 2014
Cortical Learning Algorithm, General Interest

Proposed Mechanism for Layer 4 Sensorimotor Prediction

Jeff Hawkins has recently talked about a sensorimotor extension for his Cortical Learning Algorithm (CLA). This extension involves Layer 4 cells learning to predict near-future sensorimotor inputs based on the current sensory input and a copy of a related motor instruction. This article briefly describes an idea which can explain both the mechanism, and several useful properties, of this phenomenon. It is both a philosophical and a neuroscientific idea, which serves to explain our experience of cognition, and simultaneously explains an aspect of the functioning of the cortex.

In essence, Jeff’s new idea is based on the observation that Layer 4 cells in a region receive information about a part of the current sensory (afferent, feedforward) inputs to the region, along with a copy of related motor command activity. The idea is that Layer 4 combines these to form a prediction of the next set of sensory inputs, having previously learned the temporal coincidence of the sensory transition and the effect of executing the motor command.

One easily visualised example is that a face recognising region, currently perceiving a right eye, can learn to predict seeing a left eye when a saccade to the right is the motor command, and/or a nose when a saccade to the lower right is made, etc. Jeff proposes that this is used to form a stable representation of the face in Layer 3, which is receiving the output of these Layer 4 cells.
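
As a toy illustration of this mechanism (my own sketch in Python, not Numenta’s CLA code), a region could be caricatured as a memory of transitions keyed by the pair (current sensation, motor command):

from collections import defaultdict

class SensorimotorMemory:
    """Toy model: remembers which sensation followed a (sensation, motor command) pair."""
    def __init__(self):
        self.transitions = defaultdict(set)

    def learn(self, sensation, motor_command, next_sensation):
        self.transitions[(sensation, motor_command)].add(next_sensation)

    def predict(self, sensation, motor_command):
        return self.transitions.get((sensation, motor_command), set())

m = SensorimotorMemory()
m.learn("right eye", "saccade right", "left eye")
m.learn("right eye", "saccade lower right", "nose")
print(m.predict("right eye", "saccade right"))        # {'left eye'}

Real Layer 4 cells would of course operate on sparse distributed representations rather than symbols, but the lookup structure captures the prediction step being described.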

The current article claims that the “motor command” represents either a real motor command to be executed, which will cause the predicted change in sensory input, or else the analogous “change in the world” which would have the same transitional sensory effect. The latter would represent, in the above example, the person whose face is seen, moving her own head in the opposite direction, and presenting an eye or nose to the observer while the observer is passive.

In the case of speech recognition, the listener uses her memory of how to make the next sound to predict which sounds the speaker is likely to make next. At the same time, the speaker is using his memory of the sound he expects to make to perform fine control over his motor behaviour.

Another example is the experience of sitting on a stationary train when another train begins to move out of the station. The stationary observer often gets the feeling that she is in fact moving and that the other train is not (and a person in the other train may have the opposite perception – that he is stationary and the first person’s train is the one which is moving).

The colloquial term for this idea is the “mirror cell”. This article claims that so-called “mirror cells” are pervasive at all levels of cortex and serve to explain exactly why every region of cortex produces “motor commands” in the processing of what is usually considered pure sensory information.

In this way, the cortex is creating a truly integrated sensorimotor model, which not only contains and explains the temporal structure of the world, but also stores and provides the “means of construction” of that temporal structure in terms of how it can be generated (either by the action of the observer interacting with the world, or by the passive observation of the external action of some cause in the world).

This idea also provides an explanation for the learning power of the cortex. In learning to perceive the world, we need to provide – literally – a “motivation” for every observed event in the world, as either the result of our action or by the occurrence of a precisely mirrored action caused externally. At a higher cognitive level, this explains why the best way to learn anything is to “do it yourself” – whether it’s learning a language or proving a theorem. Only when we have constructed both an active and a passive sensorimotor model of something do we possess true understanding of it.

Finally, this idea explains why some notions are hard to “get” at times – this model requires a listener or learner not just to imagine the sensory perception or cognitive “snapshot” of an idea, but the events or actions which are involved in its construction or establishment in the world.

  • May 01 / 2014
Clortex (HTM in Clojure)

Clortex Pre-Alpha Now Public

This is one of a series of posts on my experiences developing Clortex in Clojure, a new dialect of LISP which runs on the Java Virtual Machine. Clortex is a re-implementation of Numenta’s NuPIC, based on Jeff Hawkins’ theories of computational neuroscience. You can read my in-progress book by clicking on the links to the right.

Until today, I’ve been developing Clortex using a private repo on Github. While far from complete, I feel that Clortex is now at the stage where people can take a look at it, give feedback on the design, and help shape the completion of the first alpha release over the coming weeks.

I’ll be hacking on Clortex this weekend (May 3rd-4th) at the NuPIC Spring Hackathon in San José, please join us on the live feeds and stay in touch using the various Social Media tools.

WARNING: Clortex is not even at the alpha stage yet. I’ll post instructions over the next few days which will allow you to get some visualisations running.

You can find Clortex on Github at https://github.com/fergalbyrne/clortex

A new kind of computing requires a new kind of software design.
Hierarchical Temporal Memory (HTM) and the Cortical Learning Algorithm (CLA) represent a new kind of computing, in which many, many millions of tiny, simple, unreliable components interact in a massively parallel, emergent choreography to produce what we would recognise as intelligence.
Jeff Hawkins and his company, Numenta, have built a system called NuPIC using the principles of the neocortex. Clortex is a reimagining of CLA, using modern software design ideas to unleash the potential of the theory.
Clortex’ design is all about turning constraints into synergies, using the expressive power and hygiene of Clojure and its immutable data structures, the unique characteristics of the Datomic database system, and the scaleability and portability characteristics of the Java Virtual Machine. Clortex will run on hosts as small as Raspberry Pi, a version will soon run in browsers and phones, yet it will scale layers and hierarchies across huge clusters to deliver real power and test the limits of HTM and CLA in production use.
How can you get involved?
Clortex is just part of a growing effort to realise the potential of Machine Intelligence based on the principles of the brain.
  • Visit the Numenta.org site for videos, white papers, details of the NuPIC mailing list, wikis, etc.
  • Have a look at (and optionally pre-purchase) my Leanpub.com book: Real Machine Intelligence with Clortex and NuPIC.
  • Join the Clortex Google Group for discussion and updates.
  • We’ll be launching an Indiegogo campaign during May 2014 to fund completion of Clortex; please let us know if you’re interested in supporting us when we launch.

  • Apr 29 / 2014
Clortex (HTM in Clojure), General Interest

Clortex: The Big Ideas

As you might know, I’ve been working away on “yet another implementation of Hierarchical Temporal Memory/Cortical Learning Algorithm” for the last few months. While that would have been nice as a hobby project, I see what I’m doing as something more than that. Working with NuPIC over the last year, I’ve gradually come to a realisation which seems kind of obvious, but remains uncomfortable:

A new kind of computing requires a new kind of software design.

HTM and CLA represent a new kind of computing, in which many, many millions of tiny, simple, unreliable components interact in a massively parallel, emergent choreography to produce what we would recognise as intelligence. This sounds like just another Neural Net theory, which is truly unfortunate, because that comparison has several consequences.

CLA – Not Another Neural Network

The first consequence relates to human comprehension. Most people who are interested in (or work in) Machine Learning and AI are familiar with the various flavours of Neural Nets which have been developed and studied over the decades. We now know that a big enough NN is the equivalent of a Turing Universal Computer, and we have a lot of what I would call “cultural knowledge” about NNs and how they work. Importantly, we have developed a set of mathematical and computational tools to work with NNs and build them inside traditional computing architectures.

This foreknowledge is perhaps the biggest obstacle facing HTM-CLA right now. It is very difficult to shake off the idea that what you’re looking at in CLA is some new twist on a familiar theme. There are neurons, with synapses connecting them; the synapses have settings which seem to resemble “weights” in NNs; the neurons combine their inputs and produce an output, and so on. The problem is not with CLA: it’s that the NN people got to use these names first and have hijacked (and overwritten) their original neuroscientific meanings.

CLA differs from NNs at every single level of granularity. And the differences are not subtle; they are fundamentally different operating concepts. Furthermore, the differences compound to create a whole idea which paradoxically continues to resemble a Neural Net but which becomes increasingly different in what it does as you scale out to layers, regions and hierarchies.

It’s best to start with a simplified idea of what a traditional NN neuron is. Essentially, it’s a pure function of its inputs. Each neuron has a local set of weights which are dot-producted with the inputs, run through some kind of “threshold filter” and produce a single scalar output. A NN is a graph of these neurons, usually organised into a layered structure with a notion of “feedforward” and “feedback” directions. Some NNs have horizontal and cyclical connections, giving rise to some “memory” or “temporal” features; such NNs are known as Recurrent Neural Networks.
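
To make the contrast concrete, here is a minimal sketch of that traditional neuron (the weights and threshold are arbitrary examples, not taken from any particular network):

def nn_neuron(inputs, weights, threshold=0.0):
    # dot product of weights and inputs, then a simple threshold filter
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if activation > threshold else 0.0

# A pure function: the same inputs always give the same scalar output.
print(nn_neuron([0.2, 0.9, 0.4], [0.5, -0.3, 0.8]))   # 1.0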

In order to produce a useful NN, you must train it. This involves using a training set of input data and a learning algorithm which adjusts the weights so as to approach a situation where the collective effect of all the pure functions is the desired transformation from inputs to outputs.

In sharp contrast, CLA models each neuron as a state machine. The “output” of a neuron is not a pure function of its feedforward inputs, but incorporates the past history of its own and other neurons’ activity. A neuron in CLA effectively combines a learned probability distribution of its feedforward inputs (as in a NN) with another learned probability distribution in the temporal domain (i.e. transitions between input states). Further, instead of being independent (as in NNs), the “outputs” of a set of neurons are determined using an online competitive process. Finally, the output of a layer of CLA neurons is a Sparse Distributed Representation, which can only be interpreted as a whole: a collective, holographic encoding of what the layer is representing.
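
A deliberately crude caricature (mine, not Numenta’s implementation) of one of these differences, statefulness, might look like this:

class ClaCellSketch:
    """Caricature of a CLA cell: activity depends on state carried over from earlier steps."""
    def __init__(self):
        self.predictive = False   # was this cell depolarised (predicted) on the last step?

    def compute(self, feedforward_input, predicted_for_next_step):
        # Fires if it receives feedforward input while in the predictive state.
        # The real CLA adds per-column competition and bursting, omitted here.
        active = feedforward_input and self.predictive
        self.predictive = predicted_for_next_step   # temporal state for the next step
        return active

Unlike the pure function above, two calls with identical feedforward input can give different outputs depending on the cell’s history.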

These fundamental differences, along with the unfortunate duplicate use of names for the components, mean that your grandfather’s tools and techniques for reasoning about, working with, and mathematically modelling NNs do not apply at all to CLA.

In fact, Jeff’s work has been criticised because he’s allegedly “thrown away” the key ability to mathematically demonstrate some properties of NNs, which the consensus considers necessary if you want your theory to be admitted as valid (or your paper published). This view would have it that CLA sacrifices validity for a questionable (and in some opinions, vacuous) gain.

Jeff gave a talk a few weeks back to the London Deep Learning Meetup – it’s perhaps the best single overview of the current state of the art for CLA:

The second consequence relates to implementation in software. Numenta began work on what is now NuPIC in 2005, and most of the current theory – itself still in its early stages in 2014 – has only appeared in stages over the intervening years. Each innovation in the theory has had to be translated and incorporated into a pre-existing system, with periodic redesigns of this or that component as understanding developed. It is unfair to expect the resulting software artefacts to magically transmute, Doctor Who-like, into a sequence of fully-formed designs, each perfectly appropriate for a new generation of the theory.

The fact that NuPIC is a highly functional, production-ready, and reasonably faithful implementation of much of CLA is a testament to the people at Numenta and their dedication to bringing Jeff’s theories into an engineered reality. The details of how NuPIC works, and how it can be used, are a consequence of its history, co-evolving with the theory, and the software design and development techniques which were available over that history.

Everything involves tradeoffs, and NuPIC is no exception. I have huge respect for the decisions which have led to the NuPIC of 2014, and I would like to view Clortex as nothing other than NuPIC, metamorphosed for a new phase of Machine Intelligence based on HTM and CLA, with a different set of tradeoffs and the chance to stretch the boundaries yet again.

So, rather than harp on what might be limiting or difficult with NuPIC, I’ll now describe some of the key improvements which are possible when a “new kind of software” is created for HTM and CLA.

Architectural Simplicity and Antifragile Software – Russ Miles

It’s odd, but Clortex’ journey began when I followed a link to a talk Jeff gave last year [free registration required] at GOTO Aarhus 2013, and decided to watch one, then two, and finally all three talks given by Russ Miles at the same event. If you’re only able to watch one, the one to watch is Architectural Simplicity through Events. In that talk, Russ outlines his main Axioms for building adaptable software:

1. Your Software’s First Role is to be Useful

Clearly, NuPIC is already useful, but there is a huge opportunity for Clortex to be useful in several new ways:

a) As a Teaching Tool to help understand the CLA and its power. HTM and CLA are difficult to understand at a deep level, and they’re very different from traditional Neural Networks in every way. A new design is needed to transparently communicate an intuitive view of CLA to layman, machine learning expert, and neuroscientist alike. The resulting understanding should be as clear to an intelligent and interested viewer as it is to Jeff himself.

b) As a Research and Development platform for Machine Intelligence. Jeff has recently added – literally – a whole set of layers to his theory, involving a new kind of temporal pooling, sensorimotor modelling, multilayer regions, behaviour, subcortical connections, and hierarchy. This is all being done with thought experiments, whiteboards, pen and paper, and slides. We’ll see this in software sometime, no doubt, but that process has only begun. A new system which allows many of these ideas to be directly expressed in software and tested in real time will accelerate the development of the theory and allow many more people to work on it.

(Here’s Jeff talking about this in detail recently:)

c) As a Production Platform for new Use Cases. NuPIC is somewhat optimised for a certain class of use cases – producing predictions and detecting anomalies in streaming machine-generated numerical data. It’s also been able to demonstrate capabilities in other areas, but there is a huge opportunity for a new design to allow entirely new types of information to be handled by HTM and CLA techniques. These include vision, natural language, robotics and many other areas to which traditional AI and ML techniques have been applied with mixed results. A new design, which emphasises adaptability, flexibility, scaleability and composability, will allow CLA to be deployed at whatever scale (in terms of hierarchy, region size, input space etc as well as machine resources) is appropriate to the task.

2. The best software is that which is not needed at all

Well, we have our brains, and the whole point of this is to build software which uses the principles of the brain. On the other hand, we can minimise over-production by only building the components we need, once we understand how they work and how they contribute to the overall design. Clortex embraces this using a design centred around immutable data structures, surrounded by a growing set of transforming functions which work on that data.

3. Human Comprehension is King

This axiom is really important for every software project, but so much more so when the thing you’re modelling is so difficult to understand for many. The key with applying this axiom is to recognise that the machine is only the second most important audience for your code – the most important being other humans who will interact with your code as developers, researchers and users. Clortex has as its #1 requirement the need to directly map the domain – Jeff’s theory of the neocortex – and to maintain that mapping at all costs. This alone would justify building Clortex for me.

4. Machine Sympathy is Queen

This would seem to contradict Axiom 3, but the use of the word “Queen” is key. Any usable system must also address the machine environment in which it must run, and machine sympathy is how you do that. Clortex’ design is all about turning constraints into synergies, using the expressive power and hygiene of Clojure and its immutable data structures, the unique characteristics of the Datomic database system, and the scaleability and portability characteristics of the Java Virtual Machine. Clortex will run on Raspberry Pi, a version will run in browsers and phones, yet it will scale layers and hierarchies across huge clusters to deliver real power and test the limits of HTM and CLA in production use.

5. Software is a Process of R&D

This is obviously the case when you’re building software based on an evolving theory of how the brain does it. Russ’ key point here is that our work always involves unknowns, and our software and processes must be designed in such a way as not to slow us down in our R&D work. Clortex is designed as a set of loosely coupled, interchangeable components around a group of core data structures, and communicating using simple, immutable data.

6. Software Development is an Extremely Challenging Intellectual Pursuit

Again, this is so true in this case, but the huge payoff you can derive if you can come up with a design which matches the potential of the CLA is hard to beat. I hope that Clortex can meet this most extreme of challenges.

Stay tuned for Pt II – The Clortex System Design.

 

  • Apr 11 / 2014
Clojure

Doc-driven Development Using lein-midje-doc

This is one of a series of posts on my experiences developing Clortex in Clojure, a new dialect of LISP which runs on the Java Virtual Machine. Clortex is a re-implementation of Numenta’s NuPIC, based on Jeff Hawkins’ theories of computational neuroscience. You can read my in-progress book by clicking on the links to the right. Clortex will become public this month.

One of the great things about Clojure is the vigour and quality of work being done to create the very best tools and libraries for developing in Clojure and harnessing its expressive power. Chris Zheng’s lein-midje-doc is an excellent example. As its name suggests, it uses the comprehensive Midje testing library, but in a literate programming style which produces documentation or tutorials.

Doc-driven Development

Before we get to DDD, let’s review its antecedent, TDD.

Test-driven Development

Test-driven Development (TDD) has become practically a tradition, arising from the Agile development movement. In TDD, you develop your code based on creating small tests first (these specify what your code will have to do); the tests fail at first because you haven’t yet written code to make them pass. You then write some code which makes a test pass, and proceed until all tests pass. Keep writing new tests, failing and coding to pass, until the tests encompass the full spec for your new feature or functionality.

For example, to write some code which finds the sum of two numbers, you might first do the following:

(fact "adding two numbers returns their sum" ; "fact" is a Midje term for a property under test
    (my-sum 7 5) => 12 ; "=>" says the form on the left should return or match the form on the right
)


This will fail, firstly because there is no function my-sum. To fix this, write the following:

(defn my-sum [a b]
  12)


Note that this is the correct way to pass the test (it’s minimal). Midje will go green (all tests pass). Now we need to force the code to generalise:

(fact "adding two numbers returns their sum" 
    (my-sum 7 5) => 12 
    (my-sum 14 10) => 24
)


Which makes us actually write the code in place of 12:

(defn my-sum [a b]
  (+ a b)
)

The great advantage of TDD is that you don’t ever write code unless it is to pass a test you’ve created. As Rich Hickey says: “More code means… more bugs!” so you should strive to write the minimum code which solves your problem, as specified in your tests. The disadvantage of TDD is that it shifts the work into designing a series of tests which (you hope) defines your problem well. This is better than designing by coding, but another level seems to be required. Enter literate programming.

Literate Programming

This style of development was invented by the famous Donald Knuth back in the 70s. Knuth’s argument is that software is a form of communication, on two levels. The first is obvious: a human (the developer) is communicating instructions to the machine in the form of source code. The second is less obvious but perhaps more important: you are communicating your requirements, intentions and design decisions to other humans (including your future self). Knuth designed and built a system for literate programming, and this forms the basis for all similar systems today.

This post is an example of literate programming (although how ‘literate’ it is is left to the reader to decide), in that I am forming a narrative to explain a concrete software idea, using text, and interspersing it with code examples which could be executed by a system processing the document.

Doc-driven Development

DDD is essentially a combination of TDD and Literate Programming. You write a document about some part of your software: a narrative describing what it should do, with examples of how it should work. The examples are framed as “given this, the following should be expected to happen”, which you write as `facts` (a type of test which is easier for humans to read). The DDD system runs the examples and checks the return values to see if they match the expectations in your document.

The big advantage of this is that your documentation is a much higher-level product than a list of unit tests: your text provides the reader (including your future self returning to the code) with much more than a close inspection of test code would yield. In addition, your sample code and docs are guaranteed to stay in sync with your code, because they actually run your code every time it changes.

lein-midje-doc

lein-midje-doc was developed by Chris Zheng as a plugin for the Clojure Leiningen project and build tool. It leverages Midje to convert documents written in the Literate Programming style into suites of tests which can verify the code described.

It’s simple to set up. You have to add dependencies to your project.clj file, and then add an entry for each document you wish to include (instructions are in the README.md, full docs on Chris’ site). Then you use two shells to run things. In one, you run lein midje-doc, which repeatedly regenerates the readable documents as HTML files from your source files, and in the other you run midje.repl’s (autotest) function to actually spin through your tests.

Here’s Chris demonstrating lein-midje-doc:

  • Mar 31 / 2014
Clortex (HTM in Clojure), General Interest, NuPIC

Real Machine Intelligence now Available on Leanpub.com

The first three chapters of my new book, Real Machine Intelligence with Clortex and NuPIC, have just gone live on Leanpub.com. Lean Publishing is a new take on a very old idea – serial publishing – which goes back to Charles Dickens in the 19th century. The idea is to evolve the book based on reader feedback, and to give people a chance to read the whole book before committing to buy.

I’ve also set up a Google Group and Facebook Page. Prior to going live with Clortex, you can read a snapshot of the documentation on its GitHub.io page.

Read online or buy the in-progress book here.

  • Nov 24 / 2013
NuPIC

Book Preview: Chapter 1 – Some Context for Machine Intelligence

The following is the draft of Chapter One of my upcoming book, Real Machine Intelligence with NuPIC – Using Neuroscience to Build Truly Intelligent Machines. The book is intended as an introduction to Jeff Hawkins’ Hierarchical Temporal Memory theory, which seeks to explain in detail the principles underlying the human brain, and the open source software he’s built based on those principles. The book, aimed at the interested non-expert, will be out on Amazon in early December. You might like to read the Introduction first.


This book is about a new theory of how the brain works, and a piece of software which uses this theory to solve real-world problems intelligently in the same way that the brain does. In order to understand both the theory and the software, a little context is useful. That’s the purpose of this chapter.

Before we start, it’s important to scotch a couple of myths which surround both Artificial Intelligence (AI) and Neuroscience.

The first myth is that AI scientists are gradually working towards a future human-style intelligence. They’re not. Despite what they tell us (and they themselves believe), what they are really doing is building computer programs which merely appear to behave in a way which we might consider “smart” or “intelligent” as long as we ignore how they work. Don’t get me wrong, these programs are very important in our understanding of what constitutes intelligence, and they also provide us with huge improvements in understanding the nature and structure of problems solved by brains. The difficulty is that brains simply don’t work the way computer programs do, and there is no reason to believe that human-style intelligence can be approached just by adding more and more complex computer programs.

The other myth is that Neuroscience has figured out how our brains work. Neuroscience has collected an enormous amount of data about the brain, and there is good understanding of some detailed mechanisms here and there. We know (largely) how individual cells in the brain work. We know that certain regions of the brain are responsible for certain functions, for example, because people with damage there exhibit reduced efficiency in particular tasks. And we know to some extent how many of the pieces of the brain are connected together, either by observing damaged brains or by using modern brain-mapping technologies. But there is no systematic understanding which could be called a Theory of Neuroscience, one which explains the working of the brain in detail.

In order to understand how traditional AI does not provide a basis for human-like intelligence, let’s take a look inside a digital computer.

A computer chip contains a few billion very simple components called transistors. Transistors act as a kind of switch, in that they can allow a signal through or not, based on a control signal sent to them. Computer chip, or hardware, designers produce detailed plans for how to combine all these switches to produce the computer you’re reading this on. Some of these transistors are used to produce the logic in the computer, making decisions and performing calculations according to a program written by others: software engineers. The program, along with the data the program uses, is stored in yet more chips – the memory – using transistors which are either on or off. The on or off states of these memory “bits” form a code which stands for data – whether numbers, text, image pixels, or program codes which tell the computer which instruction to perform at a particular time.

If you open up a computer, you can clearly see the different parts. There’s a big chip (usually with a fan on top to cool it), called the Central Processing Unit or CPU, which is where the hardware logic is housed. Separate from this, a bank of many smaller chips houses the Random Access Memory (RAM) which is the fastest kind of memory storage. There will also be either a hard disk (HD) or a solid state disk (SSD, a kind of chip-based long-term memory, faster than a HD, bigger but slower than RAM) which is where all your bulk data (programs, documents, photos, music and video) is stored for use by the computer. When your computer is running, the CPU is constantly fetching data from the memory and disks, doing some work on it, and writing the results back out to storage.

Computers have clearly changed the world. With these magical devices, we can calculate in one second with a spreadsheet program what would have taken months or years to do by hand. We can fly unflyable aircraft. We can predict the weather 10 days ahead. We can create 3D movies in high definition. We can, using other electronic “senses”, observe the oxygen and sugar consumption inside our own brains, and create a “map” of what’s happening when we think.

We write programs for these computers which are so well thought out that they appear to be “smart” in some way. They look like they’re able to out-think us; they look like they can be faster on the draw. But it turns out that they’re only good at certain things, and they can only really beat us at those things. Sure, they can calculate how to fly through the air and get through anti-aircraft artillery defences, or they can react to other computer programs on the stock exchange. They seem to be “superhuman” in some way, yet the truth is that there is no “skill” involved, no “knowledge” or “understanding” of what they’re doing. Computer programs don’t learn to do these amazing things, and we don’t teach them. We must provide exhaustive lists of absolutely precise instructions, detailing exactly what to do at any moment. The programs may appear to behave intelligently, but internally they are blindly following the scripts we have written for them.

The brain, on the other hand, cannot be programmed, and yet we learn a million things and acquire thousands of skills during our lives. We must be doing it some other way. The key to figuring this out is to look in some detail at how the brain is put together and how this structure creates intelligence. And just like we’ve done with a computer, we will examine how information is represented and processed by the structures in the brain. This examination is the subject of Chapter Two. Meanwhile, let’s have a quick look at some of the efforts people have made to create an “artificial brain” over the past few decades.

Artificial Intelligence is a term which was coined in the early 1950s, but people have been thinking about building intelligent machines for over two thousand years. This remained in the realm of fantasy and science fiction until the dawn of the computer age, when machines suddenly became available which could provide the computational power needed to build a truly intelligent machine. It is fitting that some of the main ideas about AI came from the same legendary intellects behind the invention of digital computers themselves: Alan Turing and John von Neumann.

Turing, who famously helped to break the Nazi Enigma codes during WWII, theorised about how a machine could be considered intelligent. As a thought experiment, he suggested a test involving a human investigator who is communicating by text with an unknown entity – either another human or a computer running an AI program. If the investigator is unable to tell whether he is talking to a human or not, then Turing considers the computer to have passed his test and must be regarded as “intelligent” by this definition. This became known as the Turing Test and has unfortunately become a kind of Holy Grail for AI researchers for more than sixty years.

Meanwhile, the burgeoning field of AI attracted some very smart people, who all dreamed of soon becoming the designer of a machine one could talk to and which could help one solve real-world problems. All sorts of possibilities seemed within easy reach, and so the researchers often made grand claims about what was “just around the corner” for their projects. For instance, one of the “milestones” would be a computer which could beat the World Chess Champion, a goal which was promised “within 5 years” every year since the mid-50s, and which was only achieved in 1997 using a huge computer and a mixture of “intelligent” and “brute-force” techniques, none of which resembled how Garry Kasparov’s brain worked.

Everyone recognised early on that intelligence at the level of the Turing Test would have to wait, so they began by trying to break things down into simpler, more achievable tasks. Having no clue about how our brains and minds worked as machines, they decided instead to theorise about how to perform some of the tasks which we can perform. Some of the early products included programs which could play Noughts and Crosses (tic-tac-toe) and Draughts (checkers), programs which could “reason” about placing blocks on top of other blocks (in a so-called micro-world), and a program called Eliza which used clever and entertaining tricks to mimic a psychiatrist interviewing a patient.

Working on these problems, developing all these programs, and thinking about intelligence in general has had profound effects beyond Computer Science in the last sixty years. Our understanding of the mind as a kind of computer or information processor is directly based on the knowledge and understanding gained from AI research. We have AI to thank for Noam Chomsky’s foundational Universal Grammar, and the field of Computational Linguistics is now required for anyone wishing to understand linguistics and human language in general. Brain surgeons use the computational model of the brain to identify and assess birth defects, the effects of disease and brain injuries, all in terms of the “functional modules” which might be affected. Cognitive psychology is now one of the basic ways to understand the way that our perceptions and internal processes operate. And the list goes on. Many, many fields have benefited indirectly from the intense work of AI researchers since 1950.

However, traditional AI has failed to live up to even its own expectations. At every turn, it seems that the “last 10%” of the problem is bigger than the first 90%. A lot of AI systems require vast amounts of programmer intelligence and do not genuinely embody any real intelligence themselves. Many such systems are incapable of flexibly responding to new contexts or situations, and they do not learn of their own accord. When they fail, they do not do so in a graceful way like we do, because they are brittle and capable only of working “on rails” in some way. In short, they are nothing like us.

Yet AI researchers kept on going, hoping that some new program or some new technique would crack the code of intelligent machine design. They have built ever-more-complex systems, accumulated enormous databases of information, and employed some of the most powerful hardware available. The recent triumphs of Deep Blue (beating Kasparov at chess) and Watson (winning at the Jeopardy quiz game) have been the result of combining huge, ultra-fast computers with enormous databases and vast, complex, intricate programs costing tens of millions of dollars. While impressive, neither of these systems can do anything else which could be considered intelligent without reinvesting similar resources in the development of those new programs.

It seems to many that this is leading us away from true machine intelligence, not towards it. Human brains are not running huge, brittle programs, nor consulting vast databases of tabulated information. Our brains are just like those of a mouse, and it seems that we differ from mice only in the size and number of pieces (or regions) of brain tissue, and not in any fundamental way.

It appears very likely that intelligence is produced in the brain by the clever arrangement of brain regions, which appear to organise themselves and learn how to operate intelligently. This can be proven in the lab, when experimenters cut connections, shut down some regions, breed mutants and so on. There is very little argument in Neuroscience that this is how things work. The question then is: how do these regions work in detail? What are they doing with the information they are processing? How do they work together? If we can answer these questions, it is possible that we can both learn how our brains work and build truly intelligent machines.

I believe we can now answer these questions. That’s what this book claims to be about, after all!

  • Nov 13 / 2013
NuPIC

Book Preview: Introduction to “Real Machine Intelligence with NuPIC”

The following is the (draft) Introduction to my upcoming book, Real Machine Intelligence with NuPIC - Using Neuroscience to Build Truly Intelligent Machines. The book is intended as an introduction to Jeff Hawkins’ Hierarchical Temporal Memory theory, which seeks to explain in detail the principles underlying the human brain, and the open source software he’s built based on those principles. The book, aimed at the interested non-expert, will be out on Amazon in early December.

This book is about a true learning machine you can start using today. This is not science fiction, and it’s not some kind of promised technology we’re hoping to see in the near future. It’s already here, ready to download and use. It is already being used commercially to help save energy, predict mechanical breakdowns, and keep computers running on the Internet. It’s also at the centre of a vibrant open source community with growing links to leading-edge academic and industrial research. Based on more than a decade of research and development by Jeff Hawkins and his team at Grok, NuPIC is a system built on the principles of the human brain, a theory called Hierarchical Temporal Memory (or HTM).

NuPIC stands for Numenta Platform for Intelligent Computing. On the face of it, it’s a piece of software you can download for free, do the setup, and start using right away on your own data, to solve your own problems. This book will give you the information you need to do just that. But, as you’ll learn, the software (and its usefulness to you as a product) is only a small part of the story.

NuPIC is, in fact, a working model in software of a developing theory of how the brain works, Hierarchical Temporal Memory. Its design is constrained by what we know of the structure and function of the brain. As with an architect’s miniature model, a spreadsheet in the financial realm, or a CAD system in engineering, we can experiment with and adjust the model in order to gain insights into the system we’re modelling. And, just as with those tools, we can also do useful work, solve real-world problems, and derive value from using them.

And, as with other modelling tools, we can use NuPIC as a touchstone for a growing discussion of the basic theory of what is going on inside the brain. We can compare it with all the facts and understanding from decades of neuroscience research, a body of knowledge which grows daily. We believe that the theories underlying NuPIC are the best candidates for a true understanding of human intelligence, and that NuPIC is already providing compelling evidence that these theories are valid.

This book begins with an overview of how NuPIC fits in to the worlds of Artificial Intelligence and Neuroscience. We’ll then delve a little deeper into the theory of the brain which underlies the project, including the key principles which we believe are both necessary and sufficient for intelligence. In Chapter 3, we’ll see how the design of NuPIC corresponds to these principles, and how it works in detail. Chapter 4 describes the NuPIC software at time of writing, as well as its commercial big brother, Grok. Finally, we’ll describe what the near future holds for HTM, NuPIC and Grok, and how you can get involved in this exciting work. The details of how to download and operate NuPIC are found in the Appendices, along with details of how to join the NuPIC mailing list.

  • Nov 13 / 2013
NuPIC-Dev

Adding Prediction to the NuPIC Spatial Pooler

Jeff’s theory describes an interaction between prediction and feedforward activation, whereby a cell is partially depolarised by coincident activity on its distal (predictive) dendrites. Predictive cells get a head start when they receive feedforward inputs, and are thus more likely to fire than the other cells in their column, as well as non-predictive cells in neighbouring columns.

For some reason, this is not completely implemented in NuPIC. The Spatial Pooler (SP) does not take prediction into account at all, and in fact acts as if each column were one big cell with a single feedforward dendrite and no distal dendrites.

I propose the following changes from the existing (diagram below, left) to the proposed (right) CLA SP design:

1. Each cell now has its own feedforward dendrite.
2. Cells in a column have identical potential inputs (fanout).
3. Permanences initialised identically for all cells in a column.
4. Potential activation for each cell is sum of predictive and feedforward potential.
5. Cell with highest total activation provides column’s activation for inhibition.
6. Same cell is chosen for activation if column becomes active.
7. Feedforward permanences will diverge depending on correlations for each cell.

[Diagram: existing (left) vs. proposed (right) column design]

 

Anticipated Advantages of this Design

1. More Accurate Neural Model

The real neocortex has a feedforward dendrite per cell. The reason the cells share similar feedforward response is that the feedforward axons pass up through the column together, so they will form similar (but not identical) synapses with all the cells in the column.

Cells in a column will all have different histories of activation, so the permanences of their synapses with a given feedforward axon will not be identical. Each cell will learn to match its own activation history with the corresponding inputs.

In the real neocortex, prediction is implemented by a partial depolarisation of the cell membrane. This lowers the amount of feedforward potential needed to fire the cell. The cells with the highest total of predictive and feedforward potential will fire first and be in the SDR for the region.
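
As a rough numerical sketch of that selection (illustrative only; the array sizes and 2% sparsity are my assumptions, and real inhibition is local rather than global):

import numpy as np

def select_active_cells(predictive_potential, feedforward_potential, sparsity=0.02):
    # combine the two sources of potential and keep the top 2% of cells as the SDR
    total = predictive_potential + feedforward_potential
    n_active = max(1, int(len(total) * sparsity))
    return np.argsort(total)[-n_active:]          # indices of the winning cells

winners = select_active_cells(np.random.rand(2048), np.random.rand(2048))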

2. More Informed Spatial Pooler Selection

The current SP ignores prediction, so it cannot use the additional information the region holds about which sequence we are in and where we are within it. That information is a significant factor in reducing ambiguity and constraining the space of likely patterns, and the real neocortex uses it universally.

In addition, each cell now gets to tune its feedforward response more precisely to its actual inputs (i.e. the ones which occur in its sequence memory). Inputs which contribute to patterns found only in other cells’ sequences will be treated as noise and disconnected. This improves the overall noise suppression and spatial recognition.

3. Easier Reconstruction of Inputs

Because each cell has its own feedforward dendrite, the permanences on that dendrite will evolve to become a statistical representation of the bits associated with that cell. This makes it easier to reconstruct the inputs from the activation pattern (or from an SDR imposed on the region from above).

The current per-column dendrite represents the collective statistics of inputs to all the cells, and thus contains noise which confuses reconstruction at present.
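
A sketch of how that reconstruction could work under the proposed per-cell dendrites (my own illustration; the array layout and connection threshold are assumptions, not NuPIC internals):

import numpy as np

def reconstruct_input(active_cells, cell_permanences, connected_threshold=0.2):
    # cell_permanences: (num_cells, input_size) array of feedforward permanences
    connected = cell_permanences[active_cells] >= connected_threshold
    # 1 wherever any active cell has a connected synapse onto that input bit
    return connected.any(axis=0).astype(np.int8)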

4. Better Anomaly Detection

The added precision in reducing ambiguity through the use of sequence information in the SP will also improve anomaly detection. The region will be more sensitive to out-of-sequence events.

Potential Downsides

1. Resource Costs

Clearly, NuPIC will have a bigger memory requirement and a longer cycle time. In return, learning and prediction will improve in both quality and accuracy, so techniques such as swarming may decide which SP to use.

2. Slow Learning

It is possible that learning will slow as a result of this change, since only one cell’s dendrite will be updated per input record, instead of updating them all in parallel as at present.

This may be mitigated by copying updates to all cells in a column for the first N input records (or the first N activations of the column). This will hopefully replicate the current SP’s ability to learn the data. After that, we switch to the new per-cell updating policy to fine-tune the permanences.

I’ve been looking through the (Python) code to find out where all the changes need to be made. Here’s what I’ve found out:

The SP doesn’t know anything about TPs. In fact, it thinks it has only one cell per column, or that a cell and a column are the same thing. That’s why it has one feedforward dendrite per column. The TP only knows about the SDR of columns; it doesn’t see inside the SP, so it can’t see any feedforward dendrites. So, if we want to do this, we have to connect the column and its cells (at least virtually). This would happen in the Region (which owns both).

Here’s how I think we should perform the hack (in Python):

In the SP:

1. Store a value for each column which is the predictive potential for the predicting cell (if any, so usually zero). Call this array _predictivePotentials.
2. Calculate the feedforward potential as usual, store in _overlaps, but then do _overlaps += _predictivePotentials (see the sketch after this list).
3. Everything else is the same (or so the SP thinks!).
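
A minimal sketch of steps 1 and 2 in plain numpy (_overlaps and _predictivePotentials are the names proposed above, not existing NuPIC attributes):

import numpy as np

numColumns = 2048
_predictivePotentials = np.zeros(numColumns)   # step 1: per-column predictive potential,
                                               # usually zero, set by the Region each step

def computeOverlaps(feedforwardOverlaps):
    # step 2: the usual feedforward overlap, plus the predictive contribution
    return feedforwardOverlaps + _predictivePotentials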

In the TP: No change!

In the Region:

For the first N steps, just use the standard SP and TP to learn good per-column dendrites from the data. After that, clone the column dendrites from the SP for the cells. Call this _cellFFDendrites.

1. Take the results from the TP (the predictive cells) and get their FF dendrites from _cellFFDendrites.
2. Overwrite the dendrites in the SP for those columns with the cell dendrites. The SP is now working with the per-cell dendrite.

_saveDendrites = SP._dendrites[TP._predictiveCols]
SP._dendrites[TP._predictiveCols] = _cellFFDendrites[_predictiveCells]

3. Update the SP’s _predictivePotentials:

SP._predictivePotentials = 0
SP._predictivePotentials[TP._predictiveCols] = TP._predictiveActivity[TP._predictiveCols]

4. (On the next step, we do SP before TP) Run SP as above.

5. Copy back out the (possibly refined) dendrites to the cells.

_cellFFDendrites[_predictiveCells] = SP._dendrites[TP._predictiveCols]
SP._dendrites[TP._predictiveCols] = _saveDendrites

  • Nov 13 / 2013
NuPIC

Welcome to InBits.com

Welcome to inbits.com, where I’ll be sharing my thoughts and experiences on technology, and in particular on developments surrounding NuPIC, a very exciting new technology for machine intelligence based on the principles of the brain.