Related papers: NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

URL: http://arxiv.org/abs/2305.14405v4
Date: Tue, 20 Aug 2024 11:45:34 GMT
Title: NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
Authors: Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou,
Abstract summary: We introduce NeuralMatrix, which elastically transforms the computations of entire deep neural network (DNN) models into linear matrix operations. Experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models. This level of efficiency is usually only attainable with the accelerator designed for a specific neural network.
Score: 20.404864470321897
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power consumption, especially when the hardware processor needs to support and execute different neural networks. In this study, we introduce NeuralMatrix, which elastically transforms the computations of entire DNNs into linear matrix operations. This transformation allows seamless execution of various DNN models all with matrix operations and paves the way for running versatile DNN models with a single General Matrix Multiplication (GEMM) accelerator.Extensive experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models, achieving 2.17-38.72 times computation efficiency (i.e., throughput per power) compared to CPUs, GPUs, and SoC platforms. This level of efficiency is usually only attainable with the accelerator designed for a specific neural network.

Related papers

OMENN: One Matrix to Explain Neural Networks [2.397390211883228]
One Matrix to Explain Neural Networks (OMENN) is a novel post-hoc method that represents a neural network as a single, interpretable matrix for each specific input. We present a theoretical analysis of OMENN based on dynamic linearity property and validate its effectiveness with extensive tests on two XAI benchmarks.
arXiv Detail & Related papers (2024-12-03T11:49:01Z)
Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN) Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions. The proposed method enables RNN-based language models to run on edge devices with $2times$ improvement in runtime.
arXiv Detail & Related papers (2022-12-22T15:22:36Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN) Our iRNN maintains similar performance as its full-precision counterpart, their deployment on smartphones improves the runtime performance by $2times$, and reduces the model size by $4times$.
arXiv Detail & Related papers (2021-09-20T20:17:40Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
An Alternative Practice of Tropical Convolution to Traditional Convolutional Neural Networks [0.5837881923712392]
We propose a new type of CNNs called Tropical Convolutional Neural Networks (TCNNs) TCNNs are built on tropical convolutions in which the multiplications and additions in conventional convolutional layers are replaced by additions and min/max operations respectively. We show that TCNN can achieve higher expressive power than ordinary convolutional layers on the MNIST and CIFAR10 image data set.
arXiv Detail & Related papers (2021-03-03T00:13:30Z)
Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data. In this paper, we present and evaluate different strategies for the binarization of graph neural networks. We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network. It leads to both energy-efficient inference and training, without compromising expressive capacity. ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs. BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z)
Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data. We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.