## Implications of the NuPIC Geospatial Encoder

Numenta’s Chetan Surpur recently demoed and explained the details of a new encoder for NuPIC which creates Sparse Distributed Representations (SDRs) from GPS data. Apart altogether from the direct applications which this development immediately suggests, I believe that Chetan’s invention has a number of much more profound implications for NuPIC and even HTM in general. This post will explore a few of the most important of these. Chetans’ demo and a tutorial by Matt Taylor are available on Youtube. First, here is Chetan presenting to, and discussing it with, Numenta people: And here’s Matt with another excellent hands-on tutorial:

### Mechanism

I’ll begin by describing the encoder itself. The Geospatial Encoder takes as input a triple [Lat, Long, Speed] and returns a Sparse Distributed Representation (SDR) which uniquely identifies that position for the given speed. The speed is important because we want the “resolution” of the encoding to vary depending on how quickly the position is changing, and Chetan’s method does this very elegantly. The algorithm is quite simple. First, a 2D space (Lat, Long) is divided up (virtually) into squares of a given *scale* (a parameter provided for each encoder), so each square has an x and y integer co-ordinate (the Lat-Long pair is projected using a given projection scheme for convenient display on mapping software). This co-ordinate pair can then be used as a seed for a pseudorandom number generator (Python and numpy use the cross-platform Mersenne Twister MT19937), which is used to produce a real-valued *order* between 0 and 1, and a *bit* position chosen from the *n* bits in the encoding. These can be generated on demand for each square in the grid, always yielding the same results. To create the SDR for a given *position* and *speed*, the algorithm first converts the *speed* to a *radius* and forms a box of squares surrounding the *position* and calculates the pair [*order*, *bit*] for each square in the box. The top *w* squares (with the highest *order*) are chosen, and their *bit* values are used to choose the *w* active bits in the SDR.

### Initial Interpretation

The first thing to say is that this encoder is an exemplar of transforming real-world data (location in the context of movement) into a very “SDR-like” SDR. It has the key properties we seek in an SDR encoder, in that semantically similar inputs will yield highly overlapping representations. It is robust to noise and measurement error in both space and time, and the representation is both unique (given a set *scale* parameter) and reproducible (given a choice of cross-platform random number generator), independently of the order of presentation of the data. The reason for this “SDR-style” character is that the entire space of squares forms an infinite field of “virtual neurons”, each of which has some activation value (its *order*) and position in the input bit vector (its *bit*). The algorithm first sparsifies this representation by restricting its sampling subspace to a box of squares around the position, and then enforces the exact sparseness by picking the *w* squares using a competitive analogue of local inhibition.

### Random Spatial Neuron Field (Spatial Retina)

This idea can be generalised to produce a “spatial retina” in n-dimensional space which provides a (statistically) unique SDR fingerprint for every point in the space. The SDRs specialise (or zoom in) when you reduce the *radius* factor, and generalise (or zoom out) when *radius* is increased. This provides a distance metric between two points which involves the interplay of spatial zoom and the fuzziness of overlap. Any two points will have identical SDRs (*w* bits of overlap) if you increase the *radius* sufficiently, and entirely disparate SDRs (0 bits overlap) if you zoom in sufficiently (down to the order of *w*scale*). Since the Coordinate Encoder operates in a world of integer-indexed squares, we first need to transform each dimension using its own *scale* parameter (the Geospatial Encoder uses the same *scale* for each direction, but this is not necessary). We thus have a single, efficient, simple mechanism which allows HTM to navigate in any kind of spatial environment. This is, I believe a really significant invention which has implications well beyond HTM and NuPIC. As Jeff and others mentioned during Chetan’s talk, this may be the mechanism underlying some animals’ ability to navigate using the Earth’s magnetic field. It is possible to envisage a (finite, obviously) field of real neurons which each have a unique response to position in the magnetic field. Humans have a similar ability to navigate, using sensory input to provide an activation pattern which varies over space and identifies locations. We combine whichever modalities work best (blind people use sound and memories of movement to compensate for impaired vision), and as long as the pipeline produces SDRs of an appropriate character, we can now see how this just works.

### Comparison with Random Distributed Scalar Encoder (RDSE)

The Geospatial Encoder uses the more general Coordinate Encoder, which takes a n-dimensional integer vector and a radius, and produces the corresponding SDR. It is easy to see how a 1D spatial encoder with a fixed *speed* would produce an SDR for arbitrary scalars, given an initial *scale* which would decide the maximum resolution of the encoder. This encoder would be an improved replacement for the RDSE, with the following advantages:

- When encoding a value, the RDSE needs to encode all the values between existing encodings and the new value (so that the overlap guarantees are honoured). A 1D-Geo encoder can compute each value independently, saving significantly in time and memory footprint.
- In order to produce identical values for all inputs regardless of the order of presentation, the RDSE needs to “precompute” even more values in batches around a fixed “centre” (eg to compute f(23) starting at 0, we might have to compute [f(-30),...,f(30)]). Again, 1D-Geo scalar encoding computes each value uniquely and independently.
- Assuming
*scale*(which decides the max resolution) is fixed, the 1D-Geo scalar encoding can compute encodings of variable resolution with semantic degradation by varying*speed*. The SDR for a value is exactly unique for the same*speed*, but changes gradually as*speed*is increased or decreased. The RDSE has no such property.

This would strongly suggest that we can replace the RDSE with a 1D coordinate spatial encoder in NuPIC, and get all the above benefits without any compromise.

### Combination with Spatially-varying Data

It is clear how you could combine this encoding scheme with data which varies by location, to create a richer idea of “order” in feeding the SDR generation algorithm. For example, you could combine random “order” with altitude or temperature data to choose the top *w* squares. Alternatively, the pure spatial bit signature of a location may be combined in parallel with the encoded values of scalar quantities found at the current location, so that a HTM system associatively learns the spatial structure of the given scalar field.

### Spatially Addressed Memory

The Geospatial Encoder computes a symbolic SDR address for a spatial location, effectively a “name” or “word” for each place. The elements or alphabet of this encoding are simply random *order* activation values of nearby squares, so any more “real” semantic SDR-like activation pattern will do an even better job in computing spatial addresses. We use memories of spatial cues (literally, landmarks), emotional memories, maps, memories of moving within the space, textual directions, and so on to encode and reinforce these representations. This model explains why memory experts often use Memory Palaces (aka the Method of Loci) to remember long sequences of data items. They associate each item (or an imagined, memorable visual proxy) occupying a location in a very familiar spatial environment. It also explains the existence of “place neurons” in rodent hippocampi – these neurons are each participating in generating a spatial encoding similar in character to the Geospatial Encoder.

### Zooming, Panning and Attention

This is a wonderful model for how we “zoom in” or “zoom out” and perceive a continuously but smoothly varying model of the world. It also models how we can perceive gracefully degrading levels of detail depending on how much time or attention we pay for a perception. In this case, the “encoder” detailed here would be a subcortical structure or a thalamus-gated (attention controlled) input or relay between regions. If we could find a mechanism in the brain which controls the size and position of a “window” of signals (akin to our variable box of squares), we would have a candidate for our ability to use attention to control spatial resolution and centre of focus. Such a mechanism may automatically arise from preferentially gating neurons at the edges of a “patch”, by virtue of the inhibition mechanism’s ability to smoothly alter the representation as inputs are added or removed. This mechanism would also explain boundary extension error, in which we “fill out” areas surrounding the physical boundaries of objects and images. As explained in detail in her talk at the Royal Institute, Eleanor Maguire believes that the hippocampus is crucial for both this phenomenon and our ability to navigate in real space. As one of the brain components at the “top” of the hierarchies, the hippocampus may be the place where we can perform the crucial “zooming and panning” operations and where we manipulate spatial SDRs as suggested by the current discovery.

### Implementation Details

The coordinate encoder has a deterministic, O(1), order-independent algorithm for computing both “order” and bit choice. One important issue is that the pseudorandom number is Python-specific, and so a Java encoder (which uses a different pseudorandom number generator) will produce completely different answers. The solution is to use the Python (and numpy) RNG, which is the Mersenne Twister MT19937, also used by default in numerous other languages. I believe it would be worth exploring using Perlin noise to generate the order and bit choice values. This would give you a) identical encodings across platforms, b) pseudorandom, uncorrelated values when the noise samples are far enough apart (eg when the inputs are integers as in this case), and c) smoothly changing values if you use very small step sizes. Just one point about changing radius and its effect on the encoding. I’m very confident that the SDR is very robust to changes in radius, due to the sparsity of the SDRs. In other words, the overlap in an SDR at radius r with that at radius r’ (at the same GPS position) will be high, because you are only adding or removing an annulus around the same position (this will be similar to adding or removing a strip of squares when a small position change occurs).

### Links to the Demo and Encoder Code

Chetan’s demo code (which is really comprehensive) is at https://github.com/numenta/nupic.geospatial. The Geospatial Encoder code is at https://github.com/numenta/nupic/blob/master/nupic/encoders/geospatial_coordinate.py and the Coordinate Encoder is at https://github.com/numenta/nupic/blob/master/nupic/encoders/coordinate.py.