Neurons in a brain area known as the medial superior temporal (MST) area play a major role in visually guided navigation, as they are experts at analyzing the moving patterns of light we see as we travel through the environment (called "optic flow"). Some neurons respond to a specific direction of travel ("heading"), so that we always know where we're going. However, more often than not, MST responses are complex and non-intuitive, making it hard to understand how these neurons operate.
A new study published in the Journal of Neuroscience now challenges the way we think about MST. Rather than serving specific behavioral functions, such as encoding heading, neurons in MST might simply be trying to find a compressed representation of all possible, naturally occurring optic flow patterns—such that self-motion analysis is always both accurate and efficient. Let me explain what that means, and why you should care.
Visual motion perception
As we move through the environment or see something moving, visual motion appears as structured patterns of light on our retinas (called "optic flow"; see figure below) that change over time. It was Hermann von Helmholtz (1925) and James J. Gibson (1950) who first realized that these patterns of optic flow could be used to detect one's own movement through the scene, to enable perception of the shape, distance, and movement of objects in the world, and to aid the control of locomotion. Perhaps the easiest way to visualize optic flow is to imagine driving through a snowstorm at night. When driving straight ahead, all snowflakes seem to emanate from a single point in front of us, expanding radially in all directions (much like the movement depicted in the left panel of the figure below). Of course, the more complicated our movements are, the more complicated the "snowflake pattern" gets. (Also, since snow falls from the sky, the actual pattern we see will not exactly match the one shown in the figure below.) Interestingly, we now have evidence that the brain analyzes these "snowflake patterns" and relates them to the movements that caused them.
As we watch an object move through the environment, the image of the object that is projected onto the back of our eyes (i.e., our retinas) moves as well (Raudies, 2013; used under CC BY). These changing patterns of light are called optic flow. For the purpose of this illustration, we can think of our eyes as cameras that take pictures of the world in short succession. If we look at two subsequent "frames" of a moving object (such as the pixelated face in the figure above), we find that the two frames are slightly different. For each "pixel" in the image, we can draw an arrow from where the pixel just was to where it is now, yielding an optic flow field. The direction of the arrow indicates the direction of motion, and its length indicates the speed of motion. Obviously, the more complex the movement, the more complex the optic flow field. The bottom row shows three different examples for flow fields that result from self-movement: when we walk straight ahead towards a wall (left), when we turn our head to the left (center), and when we both turn our head and move forward over a ground plane (right).
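For the mathematically inclined, the flow fields in the bottom row can be sketched in a few lines of numpy using the translational part of the standard motion-field equations for a pinhole camera. All parameter names and values here are illustrative choices, not specifics from any study:

```python
import numpy as np

def translational_flow(tx, ty, tz, Z=2.0, f=1.0, grid=5):
    """Optic flow for pure observer translation (pinhole camera model).

    (tx, ty, tz): translation in camera coordinates (tz > 0 = forward);
    Z: depth of a frontoparallel wall; f: focal length.
    Illustrative values only -- not taken from the paper.
    """
    xs = np.linspace(-1, 1, grid)
    x, y = np.meshgrid(xs, xs)
    # Standard motion-field equations, translation terms only:
    u = (-f * tx + x * tz) / Z   # horizontal image motion at each pixel
    v = (-f * ty + y * tz) / Z   # vertical image motion at each pixel
    return x, y, u, v

# Walking straight toward a wall: flow expands radially from the center.
x, y, u, v = translational_flow(0.0, 0.0, 1.0)

# Moving sideways to the right: the whole field drifts uniformly left.
_, _, u2, v2 = translational_flow(1.0, 0.0, 0.0)
```

Each (u, v) pair is one arrow of the flow field: zero at the focus of expansion when moving forward, and uniform when moving sideways, matching the left and center panels described above.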
How neurons in the medial superior temporal (MST) area analyze self-motion
One of the brain regions involved in self-motion analysis is the dorsal subregion of what is called the medial superior temporal area (MSTd), where neurons can be found that respond to large, complex patterns of optic flow, such as the ones shown in the image above. We believe that these neurons parse the flow fields we see while moving to infer our current direction of travel—which can be used to plan paths around obstacles and whatnot—and to make it easier for other neurons to distinguish between optic flow caused by ourselves vs. optic flow caused by moving objects in the scene. (Side note: These neurons also get vestibular input from our inner ears, which bestows on us a sense of balance. The brain can infer self-motion from both vestibular and visual input—and whenever these signals don't match up, we experience motion sickness.)
Navigating through a rich natural world produces complex optic flow patterns. These patterns get more complicated in the presence of stationary and independently moving objects.
However, the story is not that simple: Most neurons respond to multiple flow patterns, typically made of distinct flow components such as translations, rotations, and spirals. This has made it hard for researchers to get an intuitive understanding of what exactly these neurons are doing, and how they all work together to help us navigate our daily lives. Some researchers have suggested that neurons in MSTd act as a "heading map", where every neuron prefers a specific direction of travel (Perrone & Stone, 1994, 1998). Then, whenever the neuron that prefers (e.g.) forward motion is active, our brain would know that we are currently traveling forward. However, a problem with this idea is that the brain would require a humongous number of neurons to cover all possible directions of travel, under all possible head positions, eye positions, and scene layouts (Perrone & Stone, 1994). In addition, there are MSTd neurons with much more complicated response properties, which do not necessarily reflect a specific direction of travel (Mineault et al., 2012). These neurons might prefer a seemingly arbitrary combination of translations and rotations—a preference that might change with location (i.e., where in the visual field), head position, and eye position (i.e., whether we look straight ahead or slightly to the side). What a mess...
So how can we make sense of all of this? This is where the new study comes in.
Efficient coding of optic flow can account for MSTd visual response properties
During my PhD studies with Prof. Jeff Krichmar at University of California, Irvine, we wanted to understand how these more complicated response properties of MSTd neurons could be explained. Specifically, we wanted to see if there was a simple fundamental law that could give rise to a whole range of neurophysiological data. The results just got published in the Journal of Neuroscience last week.
The idea behind this paper is that the brain must be able to analyze self-motion both efficiently and accurately. If you think about the number of ways you could walk, run, jump, crawl, somersault, twist, and turn your way through life, and if you think of all the "snowflake patterns" that correspond with these movements, it is astonishing that the brain would always know which movement corresponds to which optic flow pattern. Furthermore, it's astonishing that the brain would be able to do this with a limited number of neurons. Clearly, it would not be feasible to have a single neuron specialized for every possible movement or optic flow pattern. Instead, what if the brain found an efficient encoding of all possible optic flow patterns? What if MST is trying to find a compressed representation of optic flow, such that a small population of neurons would be sufficient to encode any possible flow pattern in a way that is both accurate and efficient?
What if MSTd neurons apply a biological version of non-negative matrix factorization (NMF) to optic flow?
NMF is a linear dimensionality reduction technique that tries to express a data matrix V as the product of two much smaller matrices W (basis vectors) and H (activation values). NMF finds representations that are often sparse and parts-based, much like the intuitive notion of combining parts to form a whole (Lee & Seung, 1999).
In other words, NMF is able to compress a large dataset by finding a small number of very telling features that best describe the data. This is exactly what we were looking for! (Of course, NMF is not the only technique that can do this. Related dimensionality reduction techniques include principal component analysis (PCA) and independent component analysis (ICA). However, NMF has a few properties that make it uniquely suited for the problem at hand.)
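If you'd like to see NMF in action, here is a minimal numpy sketch of Lee and Seung's multiplicative update rules, run on toy data. The data sizes and variable names are my own illustrative choices:

```python
import numpy as np

def nmf(V, rank, n_iter=1000, eps=1e-9, seed=0):
    """Non-negative matrix factorization via Lee & Seung's (1999)
    multiplicative updates: find non-negative W (basis vectors) and
    H (activations) that minimize the reconstruction error ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# Toy data: 100 samples, each a non-negative mixture of 3 hidden "parts".
rng = np.random.default_rng(1)
parts = rng.random((20, 3))            # ground-truth basis vectors
V = parts @ rng.random((3, 100))       # 20 x 100 data matrix
W, H = nmf(V, rank=3)                  # compress 2,000 numbers into 360
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates only ever multiply by non-negative factors, W and H stay non-negative throughout, which is exactly what gives NMF its parts-based flavor.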
NMF learns to represent faces with a set of basis images resembling parts of faces (Lee & Seung, 1999). As shown in the 7x7 montage, NMF has learned a set of 49 basis images. Positive values are illustrated with black pixels and negative values with red pixels. A particular instance of a face, shown at top right, is approximately represented by a linear superposition of basis images. The coefficients of the linear superposition are shown in a 7x7 grid next to the montage, and the resulting superposition is shown on the other side of the equality sign.
As an example, think of all possible faces. If you were a brain, and your job was to recognize faces, it would take forever to memorize every possible face on the planet. Instead, if you're like me, you're inherently lazy. What you want to learn instead is a general representation of a face: You want to learn that whenever you see something that has eyes, a nose, a mouth, and ears—it's probably a face. You could break the problem apart by having a neuron that can recognize noses, and another that can recognize eyes. So, if you're a brain, and you ask the nose neuron: hey did you see a nose there? and the neuron says yes, and you ask the eye neuron, and the mouth neuron, etc. and they all say yes—then you're pretty sure you're looking at a face.
The same idea can be applied to optic flow. The problem is just: What is the "nose" equivalent of a moving pattern of light? How do you know which feature to pick? Perhaps you need a few neurons that respond to forward, backward, and sideways motion. Then, when you ask the forward and the backward neuron, hey did you see stuff coming at you? and the forward neuron says yes, but the backward neuron says no—you're pretty sure you're going forward. The same game can be played with more complicated neurons; maybe a spiral neuron. Turns out you don't even need the forward neuron per se, and this is where previous theories of MSTd got stuck. You just need a small number of very distinctive feature neurons, and NMF can find them. That's it. Now it might be harder to interpret what these neurons are doing, because they don't correspond to your idealized "forward-moving" and "sideways-moving" neurons, but when you do the math, you'll see that the resulting neural code is much more efficient than anything you could have engineered by hand.
This is what MSTd seems to be doing. It's trying to find a compressed representation of optic flow—namely, of all optic flow fields you could ever encounter during self-motion. It doesn't have to experience all these flow fields, it just has to know their statistics. The same was true for the face example: You don't need to see every possible face before you realize that most faces have eyes and noses. Instead, there is a simple, fundamental principle according to which these neurons operate—one that would be able to explain a whole range of neurophysiological data. You don't want a neuron that can only do one thing, because then you need a neuron for every single thing. Instead you want neurons that are fairly general—best at recognizing noses, say, but also responding a little to ears. You make neurons that are generalists, not specialists. This way your neurons are flexible, and they can accomplish pretty much anything when they act jointly in a group.
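To make the compression idea tangible, here is a toy sketch (emphatically not the paper's model): 64 flow patterns, one per heading, get encoded by four half-rectified direction channels standing in for direction-tuned motion sensors, and NMF squeezes them into just 16 "generalist" basis vectors that jointly reconstruct every pattern. All sizes and the encoding scheme are my own illustrative choices:

```python
import numpy as np

def nmf(V, rank, n_iter=1000, eps=1e-9, seed=0):
    """NMF via Lee & Seung's multiplicative update rules."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def flow_features(fx, fy, grid=8):
    """Radial-expansion flow with the focus of expansion (the heading)
    at (fx, fy), encoded as 4 half-rectified direction channels -- a
    crude, hypothetical stand-in for direction-tuned motion sensors."""
    xs = np.linspace(-1, 1, grid)
    x, y = np.meshgrid(xs, xs)
    u, v = x - fx, y - fy
    return np.concatenate([np.maximum(c, 0).ravel()
                           for c in (u, -u, v, -v)])

# 64 headings -> 64 flow patterns, each a 256-dim non-negative vector.
foes = np.linspace(-0.5, 0.5, 8)
V = np.column_stack([flow_features(fx, fy) for fx in foes for fy in foes])
W, H = nmf(V, rank=16)   # 16 generalist units encode all 64 patterns
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

No single basis vector corresponds to one heading; instead, each heading is read out from the joint activity of several generalist units, which is the gist of the compressed code.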
Overall model architecture. A number S of 2D flow fields depicting observer translations and rotations in a 3D world were processed by an array of F MT-like motion sensors, each tuned to a specific direction and speed of motion. MT-like activity values were then arranged into the columns of a data matrix, V, which served as input for NMF. The output of NMF was two reduced-rank matrices, W (containing B non-negative basis vectors) and H (containing hidden coefficients). Columns of W (basis vectors) were then interpreted as weight vectors of MSTd-like model units.
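To make the last step of that pipeline concrete: once NMF has delivered the basis W, the response of an MSTd-like model unit to a new flow field is its coefficient in H. Here is a minimal sketch of that read-out, holding W fixed and iterating only the H-update; the sizes and variable names are illustrative, and the paper's model may compute responses differently:

```python
import numpy as np

def unit_responses(W, v, n_iter=1000, eps=1e-9, seed=0):
    """Activations of MSTd-like model units for one stimulus.

    With the basis W fixed, find a non-negative coefficient vector h
    minimizing ||v - W h||^2, using only the H-update of Lee & Seung's
    multiplicative rules (a convex problem, so this converges reliably)."""
    rng = np.random.default_rng(seed)
    h = rng.random(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

rng = np.random.default_rng(2)
W = rng.random((100, 8))     # 8 basis vectors = 8 model units' weight vectors
h_true = rng.random(8)       # hidden activation that generated the stimulus
v = W @ h_true               # MT-like input pattern for one flow field
h = unit_responses(W, v)     # recovered model-unit activations
```

In other words, each column of W plays the role of one model unit's synaptic weights, and h is the population response that best explains the incoming MT-like activity.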
These findings have a few implications that extend beyond visual motion processing in the brain. They suggest that many of the neuronal response properties we observe might be emergent properties of the nervous system, arising from an underlying computational or organizational principle. A similar story might hold for other brain areas, such as the primary visual cortex (Spratling, 2010). Interestingly, NMF has recently been linked to spike-timing dependent plasticity (STDP), a synaptic learning rule that is ubiquitous in the brain (Carlson et al., 2013). This means that NMF might actually capture something fundamental about how the brain stores complex information with a limited number of synaptic connections.
More information can be found in the following publication:
- Beyeler M, Dutt N, Krichmar JL (2016). 3D Visual Response Properties of MSTd Emerge from an Efficient, Sparse Population Code. Journal of Neuroscience 36(32):8399-8415, doi:10.1523/JNEUROSCI.0396-16.2016.
TL;DR Neurons in brain area MSTd might perform a biological equivalent of non-negative matrix factorization on their inputs, which allows them to represent all naturally occurring optic flow patterns both accurately and efficiently. This is a new way of thinking about self-motion processing in MSTd, and is a computational principle that might apply to other brain regions as well.