Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed MADDNESS (Multiply-ADDitioN-lESS), an open-source algorithm that accelerates machine learning using approximate matrix multiplication (AMM). MADDNESS requires no multiply-add operations and runs 10 times faster than other approximate methods and 100 times faster than exact multiplication.

The team described MADDNESS and a set of experiments in a paper presented at the recent International Conference on Machine Learning (ICML). Unlike many other AMM algorithms, MADDNESS uses no multiply-add operations. Instead, it uses a set of efficient learned hash functions that achieve encoding rates of 100 GB/s on a single CPU thread. The hash functions map the input data to an index into a lookup table of precomputed dot products. Although the algorithm introduces some error into its output, the authors prove a theoretical upper bound on that error, which can be traded off against speed. In a series of experiments comparing MADDNESS against other algorithms in an image classifier, the researchers found that MADDNESS offered a better speed-accuracy tradeoff, achieving “virtually the same accuracy as exact multiplication” while being 10 times faster.

Matrix multiplication is a fundamental operation in machine learning and one of the most time-consuming, due to its heavy use of multiply-add instructions. Because GPU and TPU chips can perform many multiply-add operations in parallel, they execute matrix multiplication faster than CPUs, making them attractive for ML applications; however, they can be too expensive, or simply unavailable, for researchers on a budget or in resource-constrained settings such as IoT. Thus, many researchers have studied AMM algorithms, which trade the precision of matrix multiplication for speed.

The MADDNESS algorithm makes several assumptions about the matrices being multiplied: they are “large, relatively dense, and resident in a single machine’s memory,” conditions that hold in a wide variety of machine learning applications. In particular, one matrix contains fixed values; this could represent the weights of an image classifier model. The other matrix represents the input data; for example, the pixels of an image to classify. MADDNESS is based on a vector quantization algorithm called product quantization (PQ). In PQ, a large set of input vectors is analyzed to produce a small number of *prototype* vectors. The product of each prototype vector with the fixed weight matrix is precomputed. Any new input vector is then mapped to its most similar prototype by a *hash function*, which yields an index into the precomputed products.
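The PQ-style lookup scheme described above can be sketched in a few lines of NumPy. This is a simplified illustration with a single codebook (real PQ splits each vector into subspaces and quantizes each separately), and all sizes, names, and the random-sampling “training” step are hypothetical, not the learned procedure from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D-dim inputs, M output columns, K prototypes.
D, M, K = 8, 3, 16
B = rng.normal(size=(D, M))      # fixed weight matrix, known ahead of time

# "Training": derive K prototype vectors from sample inputs. Real PQ uses
# k-means per subspace; here we simply sample K training rows as prototypes.
train = rng.normal(size=(1000, D))
prototypes = train[rng.choice(len(train), K, replace=False)]   # (K, D)

# Precompute the lookup table of prototype-times-weights dot products.
table = prototypes @ B           # (K, M)

def hash_fn(x):
    """Map an input vector to the index of its nearest prototype."""
    return int(np.argmin(((prototypes - x) ** 2).sum(axis=1)))

def approx_matmul(A):
    """Approximate A @ B with one table lookup per row, no new dot products."""
    idx = [hash_fn(row) for row in A]
    return table[idx]

A = rng.normal(size=(5, D))
error = np.abs(approx_matmul(A) - A @ B).mean()   # bounded, nonzero error
```

Note that `hash_fn` here still performs multiplications to find the nearest prototype; MADDNESS’s contribution, described next, is replacing this step with comparison-only hashing.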

The key innovation in MADDNESS is a preprocessing step that produces very fast hash functions requiring no multiply-add operations. The functions are based on binary regression trees; each tree has 16 leaves, which represent hash buckets. Mapping an input vector to a prototype requires only comparisons against the trees’ split thresholds. MADDNESS includes two additional performance enhancements: a prototype optimization that chooses a set of prototypes minimizing the error in reconstructing the original matrix, and a fast 8-bit aggregation that combines multiple products using hardware-specific averaging instructions instead of additions.
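The tree-based hashing above can be sketched as follows: a balanced binary tree of depth 4 has 16 leaves, and descending it takes only four threshold comparisons. The split dimensions and thresholds below are illustrative placeholders; in MADDNESS they are learned from the training data:

```python
import numpy as np

# A balanced depth-4 comparison tree: 16 leaves = 16 hash buckets.
# Each level examines one feature and compares it to a per-node threshold.
split_dims = [0, 2, 1, 3]          # feature index examined at each level (illustrative)
thresholds = np.zeros((4, 8))      # threshold per (level, node); all zero for the sketch

def tree_hash(x):
    """Map x to one of 16 buckets using only comparisons (no multiply-adds on the data)."""
    node = 0
    for level, dim in enumerate(split_dims):
        bit = x[dim] > thresholds[level, node]
        node = 2 * node + int(bit)  # descend left (0) or right (1)
    return node                     # leaf index in [0, 16)

x = np.array([0.5, -1.2, 0.3, 2.0])
bucket = tree_hash(x)
```

Because the per-level comparisons are independent of the data values’ magnitudes, this style of hashing vectorizes well, which is how MADDNESS reaches its reported encoding throughput.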

The researchers compared MADDNESS against six other AMM algorithms, including principal component analysis (PCA) and Bolt, their own previous AMM algorithm, as well as exact matrix multiplication using BLAS. The AMM algorithms were used as part of an image classifier trained and evaluated on the CIFAR datasets. MADDNESS outperformed all the other methods, achieving a “much better speed-accuracy tradeoff.” The team also compared MADDNESS to the other algorithms using kernel classifiers trained on UCR Time Series Archive datasets; in these experiments MADDNESS was “significantly faster than the alternatives at a given level of accuracy.”

Lead author Davis Blalock joined a discussion on Reddit to answer questions about the MADDNESS paper. Asked about extending the algorithm to nonlinear functions, Blalock clarified:

Working on extending it to other linear functions (e.g., convolution) and on intelligently swapping out the linear operations within an overall neural network. So in the sense that neural networks are nonlinear functions, yes. Not working on directly approximating the nonlinearities, since they are cheap to simply apply to the output of the linear operations (especially if you just write a fused kernel that does both operations at once).

The source code for MADDNESS is available on GitHub.