ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency
Transformation
- URL: http://arxiv.org/abs/2309.01771v1
- Date: Mon, 4 Sep 2023 19:19:39 GMT
- Title: ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency
Transformation
- Authors: Nastaran Darabi, Maeesha Binte Hashem, Hongyi Pan, Ahmet Cetin,
Wilfred Gomes, and Amit Ranjan Trivedi
- Abstract summary: This paper proposes a novel approach to energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations.
Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix.
On a 16$\times$16 crossbar, for 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera operations per second per watt.
- Score: 2.7488316163114823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The edge processing of deep neural networks (DNNs) is becoming increasingly
important due to its ability to extract valuable information directly at the
data source to minimize latency and energy consumption. Frequency-domain model
compression, such as with the Walsh-Hadamard transform (WHT), has been
identified as an efficient alternative. However, the benefits of
frequency-domain processing are often offset by the increased
multiply-accumulate (MAC) operations required. This paper proposes a novel
approach to energy-efficient acceleration of frequency-domain neural
networks by utilizing analog-domain frequency-based tensor transformations. Our
approach offers unique opportunities to enhance computational efficiency,
resulting in several high-level advantages, including array micro-architecture
with parallelism, ADC/DAC-free analog computations, and increased output
sparsity. Our approach achieves more compact cells by eliminating the need for
trainable parameters in the transformation matrix. Moreover, our novel array
micro-architecture enables adaptive stitching of cells column-wise and
row-wise, thereby facilitating perfect parallelism in computations.
Additionally, our scheme enables ADC/DAC-free computations by training against
highly quantized matrix-vector products, leveraging the parameter-free nature
of matrix multiplications. Another crucial aspect of our design is its ability
to handle signed-bit processing for frequency-based transformations. This leads
to increased output sparsity and reduced digitization workload. On a
16$\times$16 crossbar, for 8-bit input processing, the proposed approach
achieves an energy efficiency of 1602 tera operations per second per watt
(TOPS/W) without an early termination strategy and 5311 TOPS/W with early
termination at VDD = 0.8 V.
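To make the two central ideas concrete, the sketch below is a software illustration only, assuming a PyTorch-style implementation; the class and function names are hypothetical and it is not the paper's analog crossbar design. It pairs a parameter-free fast Walsh-Hadamard transform with signed-bit output quantization trained through a straight-through estimator, mirroring the abstract's claims of a trainable-parameter-free transformation matrix and training against heavily quantized matrix-vector products.
```python
# Illustrative sketch only (assumed PyTorch implementation, not the authors' code):
# a parameter-free Walsh-Hadamard transform (WHT) whose outputs are quantized to
# signed bits, approximating training against heavily quantized matrix-vector products.
import torch
import torch.nn as nn


def fast_walsh_hadamard(x: torch.Tensor) -> torch.Tensor:
    """Orthonormal fast WHT over the last dimension (length must be a power of two)."""
    n = x.shape[-1]
    assert n > 0 and (n & (n - 1)) == 0, "WHT length must be a power of two"
    y = x.clone()
    h = 1
    while h < n:
        y = y.view(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)  # butterfly stage, no weights involved
        h *= 2
    return y.reshape(x.shape) / n ** 0.5


class SignSTE(torch.autograd.Function):
    """Signed-bit quantizer with a straight-through estimator, so the surrounding
    network can be trained against 1-bit transform outputs."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped identity gradient


class WHTBlock(nn.Module):
    """Parameter-free WHT, signed-bit quantization, then a small trainable mixing layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Linear(dim, dim)  # the only trainable parameters in the block

    def forward(self, x):
        z = fast_walsh_hadamard(x)  # the transform itself carries no trainable parameters
        z = SignSTE.apply(z)        # signed-bit outputs: sparse and ADC/DAC-friendly
        return self.mix(z)
```
In this reading, the sign quantizer stands in for a thresholded analog readout and only the mixing layer carries weights; whether the actual design trains exactly this way is an assumption of the sketch.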
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators [5.245727758971415]
Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs).
arXiv Detail & Related papers (2024-07-17T07:56:43Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Reliability-Aware Deployment of DNNs on In-Memory Analog Computing Architectures [0.0]
In-Memory Analog Computing (IMAC) circuits remove the need for signal converters by realizing both MVM and NLV operations in the analog domain.
We introduce a practical approach to deploy large matrices in deep neural networks (DNNs) onto multiple smaller IMAC subarrays to alleviate the impacts of noise and parasitics.
arXiv Detail & Related papers (2022-10-02T01:43:35Z)
- Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
The architecture minimizes the required A/D conversions through analog accumulation and neural-approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
arXiv Detail & Related papers (2022-01-30T16:14:49Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations and the lower-frequency part is assigned cheap operations to relieve the computation burden (see the DCT-split sketch after this list).
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed with various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization in multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
- Non-Volatile Memory Array Based Quantization- and Noise-Resilient LSTM Neural Networks [1.5332481598232224]
This paper focuses on the application of a quantization-aware training algorithm to LSTM models.
We show that only 4-bit NVM weights and 4-bit ADCs/DACs are needed to match the LSTM network performance of a floating-point baseline.
Benchmark analysis of our proposed LSTM accelerator for inference has shown at least 2.4x better computing efficiency and 40x higher area efficiency than traditional digital approaches.
arXiv Detail & Related papers (2020-02-25T02:59:45Z)
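The minimal sketch below (hypothetical NumPy/SciPy code, not taken from any of the papers above) illustrates the kind of DCT-domain split described in the frequency-aware dynamic network entry: coefficients are partitioned into low- and high-frequency parts so that a cheap branch and an expensive branch can each process the part they are suited to. The cutoff fraction and the two stand-in branches are illustrative assumptions.
```python
# Minimal sketch (illustrative only): split a 2-D patch into low- and high-frequency
# parts in the DCT domain, so each part can be routed to a branch of matching cost.
import numpy as np
from scipy.fft import dctn, idctn


def split_by_frequency(patch: np.ndarray, low_frac: float = 0.25):
    """Return (low_freq_part, high_freq_part) of a 2-D patch via a DCT-domain mask."""
    coeffs = dctn(patch, norm="ortho")
    h, w = coeffs.shape
    mask = np.zeros_like(coeffs, dtype=bool)
    mask[: max(1, int(h * low_frac)), : max(1, int(w * low_frac))] = True  # top-left block = low frequencies
    low = idctn(np.where(mask, coeffs, 0.0), norm="ortho")
    high = idctn(np.where(mask, 0.0, coeffs), norm="ortho")
    return low, high


# Hypothetical usage: the low-frequency part goes to a cheap branch, the
# high-frequency part to an expensive one, as in the summary above.
patch = np.random.rand(16, 16).astype(np.float32)
low, high = split_by_frequency(patch)
cheap_out = 0.5 * low                # stand-in for a lightweight branch
expensive_out = np.tanh(high) * 2.0  # stand-in for a heavier branch
output = cheap_out + expensive_out
```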