Neural networks without matrix math

Accelerating AI systems usually means adding more processing elements and pruning algorithms, but these approaches are not the only way forward.

Almost all commercial machine learning applications depend on artificial neural networks, which are trained on large datasets with a backpropagation algorithm. The network first analyzes a training example, usually assigning it to a classification bin. This result is compared to the known “correct” answer, and the difference between the two is used to adjust the weights applied to the network nodes.

The process repeats for as many training examples as necessary to (hopefully) converge to a stable set of weights that yields acceptable accuracy. This standard algorithm requires two separate computational paths: a forward “inference” path to analyze the data and a backward “gradient descent” path to correct node weights.
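
As a point of reference for what the alternatives discussed below aim to replace, here is a minimal NumPy sketch of that loop: a forward “inference” pass, a comparison with the known answer, and a backward pass that adjusts the weights by gradient descent. The toy dataset, network size, and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 2-D points labeled by which side of a line they fall on.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer with tanh units and a sigmoid output.
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
lr = 0.1

for epoch in range(500):
    # Forward ("inference") path.
    h = np.tanh(X @ W1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2)))

    # Compare with the known answer; the error drives the weight update.
    err = out - y                                   # gradient of cross-entropy loss w.r.t. the pre-sigmoid logits
    grad_W2 = h.T @ err / len(X)
    grad_W1 = X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)

    # Backward ("gradient descent") path: adjust the weights.
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print("training accuracy:", ((out > 0.5) == (y > 0.5)).mean())
```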

In biological brains, the strength of synaptic connections increases and decreases as the associated neurons fire (or fail to fire), but there is no evidence for a separate synaptic updating process. Critics of backpropagation argue that it is biologically implausible for this reason. Jack Kendall, co-founder and CTO of Rain Neuromorphics, stated that errors also accumulate during backpropagation, which hurts overall performance.

Nonetheless, one research and development thread in the AI community is looking to implement backpropagation algorithms more efficiently. This can be done by using lower-precision weights, dedicated accelerator chips, or devices that pack more network nodes into a given circuit footprint.

Another thread argues that the backpropagation approach is inherently limited. Training neural networks is time-consuming and expensive. The need for large training sets of pre-labeled data is particularly problematic for applications such as autonomous vehicles, which must adapt to their environment in real time. From this point of view, new advances require new learning models and new training algorithms.

Spiking neural networks are a frequently discussed alternative, with spike-timing-dependent plasticity (STDP) often proposed as a learning rule. Spike-based approaches seek to model the dynamics of learning in biological brains, with chains of signal spikes corresponding to incoming stimuli.
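
For illustration, a pair-based STDP update can be written in a few lines; the amplitudes and time constant below are assumed values for the sketch, not taken from any particular device or study.

```python
import numpy as np

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: strengthen the synapse if the presynaptic spike precedes
    the postsynaptic spike, weaken it otherwise (times in ms, constants assumed)."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau)     # pre before post -> potentiation
    return -a_minus * np.exp(dt / tau)        # post before pre -> depression

# Example: a presynaptic spike 5 ms before the postsynaptic one strengthens the synapse.
print(stdp_delta_w(t_pre=10.0, t_post=15.0))
```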

Finding balanced answers
Electrical circuits are not biological neurons, however. They have different physics and face different engineering constraints. They can also rely on an existing library of well-characterized circuit elements, both analog and digital.

Kendall explained that his company’s new machine learning paradigm, equilibrium propagation, is based on a reformulation of Kirchhoff’s laws. Equilibrium propagation defines an “energy” function in terms of the nodes of a neural network. Physically, this “total energy,” F, is a measure of the network’s total pseudo-power. It is the sum of two terms, E and C: E measures the internal interactions between nodes, while C measures the difference between the target and actual network output values, weighted by a parameter β, so that F = E + βC.

The variation of the total energy with time is defined by the evolution of a state variable, s, which follows the gradient of F:

ds/dt = -∂F/∂s

The output of the model is given by the components of s at the fixed point, where ∂F/∂s = 0.

“Solving” the model means identifying the components of s, the values of the network nodes, that minimize F. A “good” solution is one in which this configuration of node values also produces the classification bin expected from the training data.
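
To make that concrete, here is a minimal numerical sketch of the relaxation, assuming a simple Hopfield-style energy E with a symmetric weight matrix, a bias term, and tanh units. The specific form of E, the network size, and all constants are illustrative assumptions, not Rain’s actual circuit equations; the point is only that the state s follows ds/dt = -∂F/∂s until it settles at a fixed point.

```python
import numpy as np

def relax(W, b, s, target, out_idx, beta, steps=200, dt=0.1):
    """Settle the state s toward a fixed point of F = E + beta*C by following
    ds/dt = -dF/ds (assumed Hopfield-style energy with tanh units and bias b)."""
    for _ in range(steps):
        rho = np.tanh(s)
        # dE/ds for E(s) = 0.5*s.s - 0.5*rho(s)^T W rho(s) - b.rho(s), W symmetric.
        dE = s - (1 - rho**2) * (W @ rho + b)
        # dC/ds for C = 0.5*||s_out - target||^2, acting only on the output units.
        dC = np.zeros_like(s)
        dC[out_idx] = s[out_idx] - target
        s = s - dt * (dE + beta * dC)
    return s

# Tiny 5-unit network: symmetric random weights, last two units are the outputs.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.3, size=(5, 5))
W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.3, size=5)

# Free phase: beta = 0, the outputs settle wherever the energy takes them.
s_free = relax(W, b, np.zeros(5), target=np.array([1.0, 0.0]),
               out_idx=np.array([3, 4]), beta=0.0)
print("free-phase outputs:", s_free[[3, 4]])
```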

Yoshua Bengio, Turing Award winner and founder of Mila, the Quebec Institute for Artificial Intelligence, said equilibrium propagation does not depend on computation in the sense of the matrix operations that characterize conventional neural networks. Rather, the network “learns” through a series of Ising model-like annealing steps. A solution is found by first setting β to 0, allowing the network to relax to a fixed point, and measuring the resulting “free” output values.

Then, in the second “nudge” phase, a small change in β nudges the observed output values toward the target values. Perturbing the outputs changes the dynamics of the system, which is no longer in equilibrium, and the network is allowed to relax to a new fixed point with new node values. A mathematically rigorous treatment shows that this relaxation corresponds to the propagation of error derivatives in conventional backpropagation, and that repeated adjustments amount to stochastic gradient descent.
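
Building on the earlier sketch (same assumed energy and helper function), the two-phase procedure would look roughly like this: a free phase with β = 0, a nudge phase with a small β, and a weight update proportional to the difference of co-activations between the two fixed points, the contrastive update described in the equilibrium propagation literature. The learning rate, β, and network size here are arbitrary illustrative values.

```python
import numpy as np

def relax(W, b, s, target, out_idx, beta, steps=200, dt=0.1):
    """Settle s to a fixed point of F = E + beta*C (same assumed energy as above)."""
    for _ in range(steps):
        rho = np.tanh(s)
        dE = s - (1 - rho**2) * (W @ rho + b)        # gradient of the internal energy E
        dC = np.zeros_like(s)
        dC[out_idx] = s[out_idx] - target            # gradient of the cost C on the outputs
        s = s - dt * (dE + beta * dC)
    return s

rng = np.random.default_rng(1)
n, out_idx = 5, np.array([3, 4])
W = rng.normal(scale=0.3, size=(n, n))
W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.3, size=n)
target, beta, lr = np.array([1.0, 0.0]), 0.1, 0.1

for step in range(100):
    # Free phase: beta = 0, outputs are unconstrained.
    s_free = relax(W, b, np.zeros(n), target, out_idx, beta=0.0)
    # Nudge phase: a small beta pushes the outputs toward the targets.
    s_nudge = relax(W, b, s_free, target, out_idx, beta=beta)
    # Contrastive update: the difference in co-activations between the two fixed
    # points, scaled by 1/beta, drives W in the direction that lowers the cost.
    r_free, r_nudge = np.tanh(s_free), np.tanh(s_nudge)
    W += lr * (np.outer(r_nudge, r_nudge) - np.outer(r_free, r_free)) / beta
    np.fill_diagonal(W, 0.0)

print("outputs after training:",
      relax(W, b, np.zeros(n), target, out_idx, beta=0.0)[out_idx])
```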

Rather than providing an explicit prediction as conventional algorithms do, the model produces an implicit result, defined by the components of s at the fixed point. Although the theory underlying equilibrium propagation applies to any nonlinear resistive network, implementing it in digital hardware requires additional steps: to obtain an explicit solution, a digital architecture would need to numerically optimize the energy function.

Analog hardware for analog solutions
Instead, Gordon Wilson, CEO of Rain Neuromorphics, pointed to memristor development as the key to implementing equilibrium propagation in commercially interesting analog networks. The architecture proposed by the company stores synaptic weights in arrays of memristor elements, with each device’s conductance acting as a weight. After each “nudge” phase, voltage or current pulses modify the conductances.
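
As a rough, idealized illustration of the idea (made-up conductance ranges, no device noise or non-linearity), reading a memristor crossbar is just Ohm’s and Kirchhoff’s laws, a matrix-vector product carried out by the physics of the array, while a “write” nudges each conductance by a small increment:

```python
import numpy as np

rng = np.random.default_rng(2)

# Idealized memristor crossbar: conductances G (in siemens) act as the weights.
G = rng.uniform(1e-6, 1e-4, size=(4, 3))     # 4 input rows x 3 output columns

def crossbar_read(G, v_in):
    """Column currents I = G^T @ V: Ohm's law gives each branch current and
    Kirchhoff's current law sums each column. This is the analog matrix multiply."""
    return G.T @ v_in

def crossbar_update(G, dG, g_min=1e-6, g_max=1e-4):
    """Programming pulses nudge each conductance; real devices are limited to a
    noisy, bounded conductance range (idealized here as a simple clip)."""
    return np.clip(G + dG, g_min, g_max)

v_in = np.array([0.2, -0.1, 0.3, 0.0])       # input voltages (V)
print("column currents (A):", crossbar_read(G, v_in))
G = crossbar_update(G, dG=rng.normal(scale=1e-6, size=G.shape))
```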

Pairs of diodes, each followed by a linear amplifier, act as “neurons” that transfer values between layers. “Bi-directional amplifiers” use voltage sources to prevent signal decay between input and output nodes, while current sources propagate the backward error-correction signals.

Although simulation results are promising, implementing such a network in actual hardware still poses challenges. In particular, device researchers are still learning how to achieve reliable conductance changes in memristor arrays. Still, Kendall said the equilibrium propagation approach applies mathematical techniques from electronics directly to neural network problems, simplifying both programming and circuit design.
