A$^3$: Accelerating Attention Mechanisms in Neural Networks with
Approximation
- URL: http://arxiv.org/abs/2002.10941v1
- Date: Sat, 22 Feb 2020 02:09:21 GMT
- Title: A$^3$: Accelerating Attention Mechanisms in Neural Networks with
Approximation
- Authors: Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park,
Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, Deog-Kyoon
Jeong
- Abstract summary: We design and architect A3, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization.
Our proposed accelerator achieves multiple orders of magnitude improvement in energy efficiency (performance/watt) as well as substantial speedup over the state-of-the-art conventional hardware.
- Score: 3.5217810503607896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing computational demands of neural networks, many hardware
accelerators for neural networks have been proposed. Such existing neural
network accelerators often focus on popular neural network types such as
convolutional neural networks (CNNs) and recurrent neural networks (RNNs);
however, not much attention has been paid to attention mechanisms, an emerging
neural network primitive that enables neural networks to retrieve the most relevant
information from a knowledge-base, external memory, or past states. The
attention mechanism is widely adopted by many state-of-the-art neural networks
for computer vision, natural language processing, and machine translation, and
accounts for a large portion of total execution time. We observe that today's
practice of implementing this mechanism using matrix-vector multiplication is
suboptimal as the attention mechanism is semantically a content-based search
where a large portion of computations ends up not being used. Based on this
observation, we design and architect A3, which accelerates attention mechanisms
in neural networks with algorithmic approximation and hardware specialization.
Our proposed accelerator achieves multiple orders of magnitude improvement in
energy efficiency (performance/watt) as well as substantial speedup over the
state-of-the-art conventional hardware.
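The abstract frames attention as a content-based search whose full matrix-vector scoring is largely wasted on keys that end up with near-zero softmax weight. A minimal NumPy sketch of that intuition follows; the top-k candidate selection shown here is only an illustrative stand-in, since the abstract does not specify A3's actual approximation, which is realized with specialized hardware rather than a software argsort.

```python
import numpy as np

def exact_attention(query, keys, values):
    # Score every stored key against the query (the matrix-vector multiply
    # the abstract refers to), then take a softmax-weighted sum of values.
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def approx_attention(query, keys, values, k=32):
    # Hypothetical top-k approximation: softmax weights of low-scoring keys
    # are nearly zero, so restricting the softmax and the weighted sum to the
    # k best-matching keys changes the output little while skipping most of
    # the arithmetic. This is a stand-in for A3's candidate selection, which
    # the abstract does not detail.
    scores = keys @ query
    top = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    return w @ values[top]

# Toy usage: 4096 stored key/value pairs, 64-dimensional query.
rng = np.random.default_rng(0)
keys = rng.standard_normal((4096, 64))
values = rng.standard_normal((4096, 64))
query = rng.standard_normal(64)
exact = exact_attention(query, keys, values)
approx = approx_attention(query, keys, values, k=32)
print("max deviation of top-32 approximation:", np.abs(exact - approx).max())
```

With 4096 keys and k=32, the weighted sum touches under 1% of the value rows, which is the kind of skipped computation the abstract alludes to.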
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Design and development of opto-neural processors for simulation of neural networks trained in image detection for potential implementation in hybrid robotics [0.0]
Living neural networks offer advantages of lower power consumption, faster processing, and biological realism.
This work proposes a simulated living neural network trained indirectly by backpropagating STDP-based algorithms using precision activation by optogenetics.
arXiv Detail & Related papers (2024-01-17T04:42:49Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Neural Network Quantization for Efficient Inference: A Survey [0.0]
Neural network quantization has recently arisen to meet the demand for reducing the size and complexity of neural networks.
This paper surveys the many neural network quantization techniques that have been developed in the last decade (a minimal illustrative sketch of the basic idea follows this list).
arXiv Detail & Related papers (2021-12-08T22:49:39Z)
- WaveSense: Efficient Temporal Convolutions with Spiking Neural Networks for Keyword Spotting [1.0152838128195467]
We propose spiking neural dynamics as a natural alternative to dilated temporal convolutions.
We extend this idea to WaveSense, a spiking neural network inspired by the WaveNet architecture.
arXiv Detail & Related papers (2021-11-02T09:38:22Z)
- Deep physical neural networks enabled by a backpropagation algorithm for arbitrary physical systems [3.7785805908699803]
We propose a radical alternative for implementing deep neural network models: Physical Neural Networks.
We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently train sequences of controllable physical systems to act as deep neural networks.
arXiv Detail & Related papers (2021-04-27T18:00:02Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Effective and Efficient Computation with Multiple-timescale Spiking Recurrent Neural Networks [0.9790524827475205]
We show how a novel type of adaptive spiking recurrent neural network (SRNN) is able to achieve state-of-the-art performance.
We calculate a >100x energy improvement for our SRNNs over classical RNNs on the harder tasks.
arXiv Detail & Related papers (2020-05-24T01:04:53Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
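As a point of reference for the quantization survey entry above (not taken from that survey), a minimal symmetric per-tensor int8 weight-quantization sketch is shown below: the tensor is stored as 8-bit integers plus one float scale, giving roughly a 4x memory reduction over float32 at the cost of rounding error.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127].
    scale = np.abs(weights).max() / 127.0     # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the int8 representation.
    return q.astype(np.float32) * scale

# Usage: float32 -> int8 storage, then measure the rounding error introduced.
w = np.random.default_rng(1).standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
print("max quantization error:", np.abs(w - dequantize(q, s)).max())
```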
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.