ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency
Transformation
- URL: http://arxiv.org/abs/2309.01771v1
- Date: Mon, 4 Sep 2023 19:19:39 GMT
- Title: ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency
Transformation
- Authors: Nastaran Darabi, Maeesha Binte Hashem, Hongyi Pan, Ahmet Cetin,
Wilfred Gomes, and Amit Ranjan Trivedi
- Abstract summary: This paper proposes a novel approach to energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations.
Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix.
On a 16$\times$16 crossbar, for 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera operations per second per watt.
- Score: 2.7488316163114823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The edge processing of deep neural networks (DNNs) is becoming increasingly
important due to its ability to extract valuable information directly at the
data source to minimize latency and energy consumption. Frequency-domain model
compression, such as with the Walsh-Hadamard transform (WHT), has been
identified as an efficient alternative. However, the benefits of
frequency-domain processing are often offset by the increased
multiply-accumulate (MAC) operations required. This paper proposes a novel
approach to energy-efficient acceleration of frequency-domain neural
networks by utilizing analog-domain frequency-based tensor transformations. Our
approach offers unique opportunities to enhance computational efficiency,
resulting in several high-level advantages, including array micro-architecture
with parallelism, ADC/DAC-free analog computations, and increased output
sparsity. Our approach achieves more compact cells by eliminating the need for
trainable parameters in the transformation matrix. Moreover, our novel array
micro-architecture enables adaptive stitching of cells column-wise and
row-wise, thereby facilitating perfect parallelism in computations.
Additionally, our scheme enables ADC/DAC-free computations by training against
highly quantized matrix-vector products, leveraging the parameter-free nature
of matrix multiplications. Another crucial aspect of our design is its ability
to handle signed-bit processing for frequency-based transformations. This leads
to increased output sparsity and reduced digitization workload. On a
16$\times$16 crossbar, for 8-bit input processing, the proposed approach
achieves an energy efficiency of 1602 tera operations per second per watt
(TOPS/W) without an early termination strategy and 5311 TOPS/W with early
termination at VDD = 0.8 V.
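To make the two central ideas concrete, the sketch below is a software illustration only, assuming a PyTorch-style implementation; the class and function names are hypothetical and it is not the paper's analog crossbar design. It pairs a parameter-free fast Walsh-Hadamard transform with signed-bit output quantization trained through a straight-through estimator, mirroring the abstract's claims of a trainable-parameter-free transformation matrix and training against heavily quantized matrix-vector products.
```python
# Illustrative sketch only (assumed PyTorch implementation, not the authors' code):
# a parameter-free Walsh-Hadamard transform (WHT) whose outputs are quantized to
# signed bits, approximating training against heavily quantized matrix-vector products.
import torch
import torch.nn as nn


def fast_walsh_hadamard(x: torch.Tensor) -> torch.Tensor:
    """Orthonormal fast WHT over the last dimension (length must be a power of two)."""
    n = x.shape[-1]
    assert n > 0 and (n & (n - 1)) == 0, "WHT length must be a power of two"
    y = x.clone()
    h = 1
    while h < n:
        y = y.view(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)  # butterfly stage, no weights involved
        h *= 2
    return y.reshape(x.shape) / n ** 0.5


class SignSTE(torch.autograd.Function):
    """Signed-bit quantizer with a straight-through estimator, so the surrounding
    network can be trained against 1-bit transform outputs."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped identity gradient


class WHTBlock(nn.Module):
    """Parameter-free WHT, signed-bit quantization, then a small trainable mixing layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Linear(dim, dim)  # the only trainable parameters in the block

    def forward(self, x):
        z = fast_walsh_hadamard(x)  # the transform itself carries no trainable parameters
        z = SignSTE.apply(z)        # signed-bit outputs: sparse and ADC/DAC-friendly
        return self.mix(z)
```
In this reading, the sign quantizer stands in for a thresholded analog readout and only the mixing layer carries weights; whether the actual design trains exactly this way is an assumption of the sketch.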
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators [5.245727758971415]
Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs).
arXiv Detail & Related papers (2024-07-17T07:56:43Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Reliability-Aware Deployment of DNNs on In-Memory Analog Computing Architectures [0.0]
In-Memory Analog Computing (IMAC) circuits remove the need for signal converters by realizing both MVM and NLV operations in the analog domain.
We introduce a practical approach to deploy large matrices in deep neural networks (DNNs) onto multiple smaller IMAC subarrays to alleviate the impacts of noise and parasitics.
arXiv Detail & Related papers (2022-10-02T01:43:35Z)
- Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
The architecture minimizes the required A/D conversions through analog accumulation and neural-approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
arXiv Detail & Related papers (2022-01-30T16:14:49Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations and the lower-frequency part is assigned cheap operations to relieve the computation burden (see the DCT-split sketch after this list).
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed with various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization in multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
- Non-Volatile Memory Array Based Quantization- and Noise-Resilient LSTM Neural Networks [1.5332481598232224]
This paper focuses on the application of a quantization-aware training algorithm to LSTM models.
We show that only 4-bit NVM weights and 4-bit ADCs/DACs are needed to match the LSTM network performance of a floating-point baseline.
Benchmark analysis of our proposed LSTM accelerator for inference has shown at least 2.4x better computing efficiency and 40x higher area efficiency than traditional digital approaches.
arXiv Detail & Related papers (2020-02-25T02:59:45Z)
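The minimal sketch below (hypothetical NumPy/SciPy code, not taken from any of the papers above) illustrates the kind of DCT-domain split described in the frequency-aware dynamic network entry: coefficients are partitioned into low- and high-frequency parts so that a cheap branch and an expensive branch can each process the part they are suited to. The cutoff fraction and the two stand-in branches are illustrative assumptions.
```python
# Minimal sketch (illustrative only): split a 2-D patch into low- and high-frequency
# parts in the DCT domain, so each part can be routed to a branch of matching cost.
import numpy as np
from scipy.fft import dctn, idctn


def split_by_frequency(patch: np.ndarray, low_frac: float = 0.25):
    """Return (low_freq_part, high_freq_part) of a 2-D patch via a DCT-domain mask."""
    coeffs = dctn(patch, norm="ortho")
    h, w = coeffs.shape
    mask = np.zeros_like(coeffs, dtype=bool)
    mask[: max(1, int(h * low_frac)), : max(1, int(w * low_frac))] = True  # top-left block = low frequencies
    low = idctn(np.where(mask, coeffs, 0.0), norm="ortho")
    high = idctn(np.where(mask, 0.0, coeffs), norm="ortho")
    return low, high


# Hypothetical usage: the low-frequency part goes to a cheap branch, the
# high-frequency part to an expensive one, as in the summary above.
patch = np.random.rand(16, 16).astype(np.float32)
low, high = split_by_frequency(patch)
cheap_out = 0.5 * low                # stand-in for a lightweight branch
expensive_out = np.tanh(high) * 2.0  # stand-in for a heavier branch
output = cheap_out + expensive_out
```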