A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End
Inference of Real-World Deep Neural Networks
- URL: http://arxiv.org/abs/2201.01089v1
- Date: Tue, 4 Jan 2022 11:12:01 GMT
- Title: A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End
Inference of Real-World Deep Neural Networks
- Authors: Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan
Karunaratne, Irem Boybat, Luca Benini and Davide Rossi
- Abstract summary: Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency.
Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference.
We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
- Score: 12.361842554233558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deployment of modern TinyML tasks on small battery-constrained IoT devices
requires high computational energy efficiency. Analog In-Memory Computing (IMC)
using non-volatile memory (NVM) promises major efficiency improvements in deep
neural network (DNN) inference and serves as on-chip memory storage for DNN
weights. However, IMC's functional flexibility limitations and their impact on
performance, energy, and area efficiency are not yet fully understood at the
system level. To target practical end-to-end IoT applications, IMC arrays must
be enclosed in heterogeneous programmable systems, introducing new system-level
challenges which we aim at addressing in this work. We present a heterogeneous
tightly-coupled clustered architecture integrating 8 RISC-V cores, an in-memory
computing accelerator (IMA), and digital accelerators. We benchmark the system
on a highly heterogeneous workload such as the Bottleneck layer from a
MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements,
compared to highly optimized parallel execution on the cores. Furthermore, we
explore the requirements for end-to-end inference of a full mobile-grade DNN
(MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous
architecture to a multi-array accelerator. Our results show that our solution,
on the end-to-end inference of the MobileNetV2, is one order of magnitude
better in terms of execution latency than existing programmable architectures
and two orders of magnitude better than state-of-the-art heterogeneous
solutions integrating in-memory computing analog cores.
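
The Bottleneck (inverted residual) block used as the benchmark mixes operators with very
different compute profiles, which is what makes the workload heterogeneous. Below is a
minimal PyTorch sketch of such a block; the channel counts, expansion factor, and the
comments on which engine (IMC array vs. digital datapath) would run each operator are
illustrative assumptions, not the authors' exact mapping.

    # Minimal sketch of a MobileNetV2 Bottleneck (inverted residual) block.
    # Assumption: the 1x1 pointwise convolutions reduce to per-pixel matrix-vector
    # products, the natural fit for an analog IMC array, while the 3x3 depthwise
    # convolution and the residual add are better served by digital datapaths.
    import torch
    import torch.nn as nn

    class Bottleneck(nn.Module):
        def __init__(self, in_ch=32, out_ch=32, stride=1, expansion=6):
            super().__init__()
            hid = in_ch * expansion
            self.use_residual = (stride == 1 and in_ch == out_ch)
            self.block = nn.Sequential(
                # 1x1 expansion: dense matmul per pixel -> IMC-friendly
                nn.Conv2d(in_ch, hid, 1, bias=False), nn.BatchNorm2d(hid), nn.ReLU6(),
                # 3x3 depthwise: one filter per channel -> digital accelerator
                nn.Conv2d(hid, hid, 3, stride=stride, padding=1, groups=hid, bias=False),
                nn.BatchNorm2d(hid), nn.ReLU6(),
                # 1x1 projection: dense matmul per pixel -> IMC-friendly
                nn.Conv2d(hid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
            )

        def forward(self, x):
            y = self.block(x)
            return x + y if self.use_residual else y

    x = torch.randn(1, 32, 56, 56)        # hypothetical feature-map size
    print(Bottleneck()(x).shape)          # torch.Size([1, 32, 56, 56])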
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Full-Stack Optimization for CAM-Only DNN Inference [2.0837295518447934]
This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors.
We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity.
Our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators.
arXiv Detail & Related papers (2024-01-23T10:27:38Z)
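
For context on the ternary-weight networks targeted in the paper above, here is a minimal
threshold-based ternarization sketch in the spirit of Ternary Weight Networks; the 0.7 x
mean-absolute-weight threshold and the single scale factor are common heuristics used for
illustration, not that paper's compilation flow.

    # Minimal ternary weight quantization sketch (threshold heuristic from TWN).
    # Assumption: per-tensor threshold delta = 0.7 * mean(|W|); weights become
    # {-1, 0, +1} times a single positive scale alpha.
    import numpy as np

    def ternarize(w):
        delta = 0.7 * np.mean(np.abs(w))                         # sparsifying threshold
        t = np.where(np.abs(w) > delta, np.sign(w), 0.0)         # ternary codes {-1, 0, +1}
        alpha = np.abs(w[t != 0]).mean() if np.any(t) else 0.0   # scale factor
        return alpha * t, t                                      # dequantized weights, codes

    w = np.random.randn(64, 64).astype(np.float32)
    wq, codes = ternarize(w)
    print("sparsity:", float(np.mean(codes == 0)))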
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
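
To make the "shared backbone with multiple prediction heads" structure above concrete, a
minimal sketch follows; the layer sizes and the simple averaging used to ensemble the heads
are illustrative assumptions, not MEMTL's actual training or inference procedure.

    # Minimal multi-head model sketch: one shared backbone, several prediction
    # heads whose outputs are ensembled by averaging (an illustrative choice).
    import torch
    import torch.nn as nn

    class MultiHeadEnsemble(nn.Module):
        def __init__(self, in_dim=16, hidden=64, out_dim=4, num_heads=3):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.heads = nn.ModuleList([nn.Linear(hidden, out_dim) for _ in range(num_heads)])

        def forward(self, x):
            z = self.backbone(x)                                  # shared features
            preds = torch.stack([h(z) for h in self.heads], dim=0)
            return preds.mean(dim=0)                              # ensemble by averaging

    print(MultiHeadEnsemble()(torch.randn(8, 16)).shape)          # torch.Size([8, 4])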
Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand
Edge Resource [25.274288063300844]
Deep Neural Networks (DNNs) have significantly improved the accuracy of intelligent applications on mobile devices.
DNN surgery can enable real-time inference despite the computational limitations of mobile devices.
This paper introduces a novel Decentralized DNN Surgery (DDS) framework.
arXiv Detail & Related papers (2023-06-21T11:32:28Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Heterogeneous Data-Centric Architectures for Modern Data-Intensive
Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
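
The split-point idea above can be made concrete with a small sketch: given per-layer device
and server latencies and the size of each layer's output, pick the split that minimizes
device compute plus transfer plus server compute for the current data rate. All numbers
below are hypothetical profiling values, not taken from that paper.

    # Minimal dynamic split-point selection sketch. Hypothetical per-layer
    # latencies (ms) and activation sizes (bits); a real system would profile
    # these and re-evaluate as the channel rate changes.
    def best_split(device_ms, server_ms, act_bits, rate_bps):
        """Return split index k: layers [0..k) run on device, [k..N) on server.
        act_bits[k] is the size of the tensor that must be sent for split k."""
        n = len(device_ms)
        costs = []
        for k in range(n + 1):
            compute = sum(device_ms[:k]) + sum(server_ms[k:])
            transfer = 1e3 * act_bits[k] / rate_bps          # ms to ship the activation
            costs.append(compute + transfer)
        return min(range(n + 1), key=costs.__getitem__)

    device_ms = [4.0, 6.0, 6.0, 2.0]                 # slow edge device
    server_ms = [0.5, 0.8, 0.8, 0.3]                 # fast server
    act_bits  = [1.6e6, 8.0e5, 4.0e5, 2.0e5, 8.0e4]  # input + each layer's output
    print(best_split(device_ms, server_ms, act_bits, rate_bps=2e6))  # best split for this rate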
An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Towards Memory-Efficient Neural Networks via Multi-Level in situ
Generation [10.563649948220371]
Deep neural networks (DNN) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z)
Efficiency-driven Hardware Optimization for Adversarially Robust Neural
Networks [3.125321230840342]
We will focus on how to address adversarial robustness for Deep Neural Networks (DNNs) through efficiency-driven hardware optimizations.
One such approach is approximate digital CMOS memories with hybrid 6T-8T cells that enable supply voltage (Vdd) scaling, yielding low-power operation.
Another memory optimization approach involves the creation of memristive crossbars that perform Matrix-Vector Multiplications (MVMs) efficiently with low energy and area requirements.
arXiv Detail & Related papers (2021-05-09T19:26:25Z)
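
As a concrete picture of the crossbar MVM mentioned above: weights are stored as
conductances, inputs are applied as voltages, and each output line accumulates currents. A
minimal numerical sketch with quantized conductances and additive read noise follows; the
bit-width and noise level are illustrative assumptions, not values from that paper.

    # Minimal model of an analog crossbar Matrix-Vector Multiplication (MVM):
    # weights -> quantized conductances, inputs -> voltages, outputs -> summed
    # column currents. Bit-width and noise sigma are illustrative assumptions;
    # signed weights would map to differential device pairs in practice.
    import numpy as np

    def crossbar_mvm(w, x, bits=4, noise_sigma=0.02, seed=0):
        rng = np.random.default_rng(seed)
        levels = 2 ** bits - 1
        scale = np.abs(w).max() / levels
        g = np.round(w / scale) * scale              # conductance quantization
        i_out = g @ x                                # ideal analog accumulation
        noise = noise_sigma * np.abs(i_out).max() * rng.standard_normal(i_out.shape)
        return i_out + noise

    w = np.random.randn(16, 64)                      # one crossbar tile of weights
    x = np.random.randn(64)                          # input voltages
    print(np.max(np.abs(crossbar_mvm(w, x) - w @ x)))  # deviation vs. exact MVM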
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
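
To illustrate the pattern-based pruning idea from PatDNN: each 3x3 kernel keeps a small
fixed set of nonzero positions chosen from a pattern library, so the compiler can generate
regular, efficient code. The 4-entry patterns and the pick-by-magnitude rule below are
illustrative assumptions, not the paper's exact pattern set or selection algorithm.

    # Minimal pattern-based kernel pruning sketch: every 3x3 kernel keeps exactly
    # 4 weights, at positions taken from a small library of candidate patterns.
    # The library and the keep-the-most-magnitude rule are illustrative only.
    import numpy as np

    PATTERNS = [  # boolean 3x3 masks with 4 nonzero positions each (hypothetical)
        np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], bool),
        np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], bool),
        np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], bool),
        np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]], bool),
    ]

    def prune_kernel(k):
        """Apply the pattern that preserves the most weight magnitude."""
        scores = [np.abs(k[m]).sum() for m in PATTERNS]
        best = PATTERNS[int(np.argmax(scores))]
        return k * best

    kernels = np.random.randn(8, 3, 3)               # e.g. one layer's 3x3 kernels
    pruned = np.stack([prune_kernel(k) for k in kernels])
    print((pruned != 0).sum(axis=(1, 2)))            # each kernel keeps 4 weights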