Containing Analog Data Deluge at Edge through Frequency-Domain
Compression in Collaborative Compute-in-Memory Networks
- URL: http://arxiv.org/abs/2309.11048v1
- Date: Wed, 20 Sep 2023 03:52:04 GMT
- Title: Containing Analog Data Deluge at Edge through Frequency-Domain
Compression in Collaborative Compute-in-Memory Networks
- Authors: Nastaran Darabi and Amit R. Trivedi
- Abstract summary: This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edge computing is a promising solution for handling high-dimensional,
multispectral analog data from sensors and IoT devices for applications such as
autonomous drones. However, edge devices' limited storage and computing
resources make it challenging to perform complex predictive modeling at the
edge. Compute-in-memory (CiM) has emerged as a principal paradigm to minimize
energy for deep learning-based inference at the edge. Nevertheless, integrating
storage and processing complicates memory cells and/or memory peripherals,
essentially trading off area efficiency for energy efficiency. This paper
proposes a novel solution to improve area efficiency in deep learning inference
tasks. The proposed method employs two key strategies. First, a
frequency-domain learning approach uses binarized Walsh-Hadamard transforms,
reducing the parameters a DNN requires (by 87% in MobileNetV2) and enabling
compute-in-SRAM, which better exploits parallelism during inference. Second, a
memory-immersed collaborative digitization method among CiM arrays is
described to reduce the area overheads of conventional ADCs. This fits more
CiM arrays into a limited footprint, improving parallelism and reducing
external memory accesses. Different networking configurations are explored in
which Flash, successive-approximation (SA), and hybrid Flash-SA digitization
can be implemented using the memory-immersed scheme. The results are
demonstrated
using a 65 nm CMOS test chip, exhibiting significant area and energy savings
compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing
analog data more efficiently, it is possible to selectively retain valuable
data from sensors and alleviate the challenges posed by the analog data deluge.
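To make the first strategy concrete: a Walsh-Hadamard transform is an orthogonal transform whose matrix contains only +1/-1 entries, so it can stand in for a dense trainable projection with a parameter-free, binarized operation. The sketch below is a minimal NumPy illustration of this idea, not the paper's implementation; the function `fwht` and the coefficient-retention step are illustrative assumptions.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform along the last axis.
    # The transform matrix holds only +1/-1 entries, so it stores no
    # trainable parameters and maps naturally onto binary in-SRAM compute.
    x = x.astype(np.float64).copy()
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling

# Illustrative use: map activations into the transform domain and keep
# only a subset of coefficients, compressing what must be stored and
# computed on downstream.
acts = np.random.randn(4, 64)   # a batch of activation vectors
spec = fwht(acts)               # parameter-free transform
compressed = spec[..., :16]     # retain a fixed subset of coefficients
print(compressed.shape)         # (4, 16)
```

Because the transform itself is fixed and binary, only the small layers around it carry trainable weights, which is the sense in which such frequency-domain layers can cut a network's parameter count sharply.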
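The second strategy can likewise be illustrated at the behavioral level. Successive-approximation (SA) digitization resolves one bit per step by comparing the input against a binary-weighted trial voltage; in the memory-immersed scheme, that trial voltage is produced by a neighboring CiM array's parasitic bit-line capacitance rather than a dedicated DAC. The model below is a hedged sketch of the SA recurrence only (`sar_digitize` is an illustrative name; no circuit behavior is modeled).

```python
def sar_digitize(v_in, v_ref=1.0, bits=5):
    # Behavioral model of successive-approximation (SA) digitization:
    # one comparator decision per bit, most significant bit first.
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)              # tentatively set the next bit
        v_dac = v_ref * trial / (1 << bits)  # binary-weighted trial voltage
        if v_in >= v_dac:                    # comparator decision
            code = trial                     # keep the bit
    return code

# Example: a 5-bit conversion, matching the bit width of the ADCs the
# paper compares against.
print(sar_digitize(0.63))  # -> 20, since floor(0.63 * 32) = 20
```

A Flash ADC instead resolves all thresholds in parallel with a comparator bank, which is why the hybrid Flash-SA configurations mentioned in the abstract trade conversion speed against area.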
Related papers
- SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
Spiking Neural Networks (SNNs) offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS).
Existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes.
We propose SpiDR, a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator with a set of key features.
arXiv Detail & Related papers (2024-11-05T06:59:02Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges for energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without large computational overhead; a sketch of the mechanism follows this list.
We demonstrate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation [2.7488316163114823]
This paper proposes a novel approach to energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations.
Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix.
On a 16×16 crossbar, for 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera-operations per second per watt.
arXiv Detail & Related papers (2023-09-04T19:19:39Z)
- Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning [2.9812721676061127]
This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overheads of a conventional analog-to-digital converter (ADC) for deep learning inference.
Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization.
arXiv Detail & Related papers (2023-07-07T23:33:22Z)
- DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference [4.718504401468233]
PIM solutions rely either on novel memory technologies that have yet to mature or on bit-serial computations that carry significant performance overhead and scalability issues.
Our work proposes an in-SRAM digital multiplier that uses conventional memory to perform bit-parallel computations by activating multiple wordlines; see the sketch after this list.
We then introduce DAISM, an architecture leveraging this multiplier, which achieves up to two orders of magnitude higher area efficiency than SOTA counterparts, with competitive energy efficiency.
arXiv Detail & Related papers (2023-05-12T10:58:21Z)
- Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
It minimizes the required A/D conversions through analog accumulation and neurally approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
arXiv Detail & Related papers (2022-01-30T16:14:49Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a subset of network parameters for inputs of diverse difficulty levels; a minimal sketch follows this list.
We present the dynamic slimmable network (DS-Net) and the dynamic sliceable network (DS-Net++), which input-dependently adjust the filter counts of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of Boston house-price prediction and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
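As promised above, a sketch of the memory-token mechanism from the heterogeneous memory augmentation entry. This is an assumption-laden single-head illustration (function names, shapes, and the plain-NumPy attention are ours, not the paper's code): learnable memory tokens are concatenated to the input tokens as extra keys and values, so every query can consult stored patterns.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(x, memory):
    # Single-head attention in which memory tokens act as extra
    # keys/values; queries come from the input tokens only.
    d = x.shape[-1]
    kv = np.concatenate([x, memory], axis=0)  # (n + m, d)
    scores = x @ kv.T / np.sqrt(d)            # (n, n + m)
    return softmax(scores) @ kv               # (n, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))  # 8 input tokens
memory = rng.standard_normal((4, 16))  # 4 memory tokens (learned in training)
print(attend_with_memory(tokens, memory).shape)  # (8, 16)
```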
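The DAISM entry's bit-parallel in-SRAM multiply can also be stated arithmetically. Activating multiple wordlines at once exposes several partial products per step, which the periphery shift-adds; the sketch below models only that arithmetic (the function and bit widths are illustrative, and no SRAM circuit behavior is captured).

```python
def in_sram_multiply(activation, weight, bits=8):
    # Shift-add view of a bit-parallel multiply: each set bit of the
    # activation selects a wordline group, and the partial products it
    # exposes are accumulated with the appropriate binary weight.
    acc = 0
    for i in range(bits):
        if (activation >> i) & 1:
            acc += weight << i  # partial product read out in parallel
    return acc

print(in_sram_multiply(13, 11))  # -> 143 = 13 * 11
```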
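Finally, the dynamic weight slicing idea from the DS-Net++ entry, as a minimal sketch under assumed shapes (the real method slices convolution filters and transformer dimensions; the difficulty gate here is a stand-in):

```python
import numpy as np

def gated_dense(x, W, b, ratio):
    # Compute with only the leading `ratio` fraction of the layer's
    # output units; weights are shared in place, so easy inputs take a
    # thin slice and hard inputs the full width.
    k = max(1, int(W.shape[0] * ratio))
    return W[:k] @ x + b[:k]

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))  # full layer: 64 output units
b = np.zeros(64)
x = rng.standard_normal(32)

difficulty = float(np.tanh(np.abs(x).mean()))  # stand-in difficulty gate
ratio = 0.25 if difficulty < 0.7 else 1.0
print(gated_dense(x, W, b, ratio).shape)  # (16,) if easy, (64,) if hard
```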
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.