Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications
- URL: http://arxiv.org/abs/2206.06780v3
- Date: Tue, 28 Mar 2023 07:13:06 GMT
- Title: Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications
- Authors: Vivek Parmar, Syed Shakib Sarwar, Ziyun Li, Hsien-Hsin S. Lee, Barbara De Salvo, Manan Suri
- Abstract summary: Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse.
In this work, we investigate two representative XR workloads: (i) Hand detection and (ii) Eye segmentation, for hardware design space exploration.
For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks.
The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated.
- Score: 5.529817156718514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse. In this work, we investigate two representative XR workloads: (i) Hand detection and (ii) Eye segmentation, for hardware design-space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions at advanced technology nodes. The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated. We found that significant energy benefits (>=24%) can be achieved for hand detection (IPS=10) and eye segmentation (IPS=0.1) by introducing non-volatile memory into the memory hierarchy for designs at the 7nm node while meeting the minimum IPS (inferences per second) requirement. Moreover, we can realize a substantial area reduction (>=30%) owing to the small form factor of MRAM compared to traditional SRAM.
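The energy win at these low inference rates comes mostly from standby power: SRAM leaks continuously, while MRAM is non-volatile and can be power-gated between inferences, at the price of costlier writes. A minimal first-order sketch of that trade-off follows; every number in it (access energies, leakage, bitcell area, access counts) is a hypothetical placeholder rather than a value from the paper.

```python
# First-order energy/area model for the SRAM-vs-MRAM trade-off sketched in
# the abstract. All constants below are illustrative placeholders.

SRAM = {"read_pj": 5.0, "write_pj": 5.0, "leak_mw": 2.0, "um2_per_kb": 150.0}
# MRAM: near-zero standby power (power-gated between inferences), denser
# bitcell, but more expensive writes.
MRAM = {"read_pj": 4.0, "write_pj": 30.0, "leak_mw": 0.1, "um2_per_kb": 60.0}

def energy_mj_per_s(mem, reads, writes, ips):
    """Dynamic access energy for `ips` inferences plus 1 s of leakage."""
    dynamic_pj = (reads * mem["read_pj"] + writes * mem["write_pj"]) * ips
    return dynamic_pj * 1e-9 + mem["leak_mw"]  # pJ -> mJ; mW * 1 s = mJ

# Hand detection at the abstract's minimum rate (IPS = 10), with an invented
# access profile of 2e6 reads and 1e5 writes per inference.
reads, writes, ips = 2_000_000, 100_000, 10
e_sram = energy_mj_per_s(SRAM, reads, writes, ips)
e_mram = energy_mj_per_s(MRAM, reads, writes, ips)
print(f"energy saving: {1 - e_mram / e_sram:.0%}")  # leakage dominates at low IPS

# Area tracks the bitcell footprint; the denser MRAM cell is what drives the
# reported >=30% area reduction.
print(f"area saving:   {1 - MRAM['um2_per_kb'] / SRAM['um2_per_kb']:.0%}")
```

With these placeholder numbers the saving is far larger than the paper's >=24% because leakage utterly dominates at IPS=10; the qualitative point, that low duty cycles favor non-volatile memory, is the same.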
Related papers
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud [9.927754948343326]
A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
The processing-in-memory (PIM) paradigm is a viable solution to accelerate memory-bound NNs.
We analyze three state-of-the-art PIM architectures for NN performance and energy efficiency.
arXiv Detail & Related papers (2022-09-19T11:46:05Z)
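Whether a layer is compute- or memory-bound, the distinction this paper builds on, is commonly decided with a roofline-style check: compare arithmetic intensity (FLOPs per byte of memory traffic) against the machine's flop-to-byte ridge point. A small sketch under invented hardware numbers:

```python
# Roofline-style classification of NN layers as compute- or memory-bound.
# Peak throughput and bandwidth are placeholders, not figures from the paper.

PEAK_GFLOPS = 1000.0   # hypothetical accelerator peak, GFLOP/s
PEAK_GBS = 50.0        # hypothetical DRAM bandwidth, GB/s
RIDGE = PEAK_GFLOPS / PEAK_GBS  # flop:byte ratio at the roofline ridge

def classify(flops, bytes_moved):
    intensity = flops / bytes_moved
    bound = "compute" if intensity >= RIDGE else "memory"
    attainable = min(PEAK_GFLOPS, intensity * PEAK_GBS)
    return bound, attainable

# A 3x3 conv reuses each weight many times (high intensity); a
# fully-connected layer streams its weights once (low intensity).
for name, flops, nbytes in [("conv3x3", 2e9, 2e7), ("fc", 2e7, 1e7)]:
    bound, gflops = classify(flops, nbytes)
    print(f"{name}: {bound}-bound, attainable {gflops:.0f} GFLOP/s")
```

Layers that land below the ridge are bandwidth-limited, which is exactly where moving compute into DRAM (PIM) pays off.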
- Efficient Deep Learning Using Non-Volatile Memory Technology [12.866655564742889]
We present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in architectures for deep learning (DL) applications.
In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional caches.
DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in DL applications.
arXiv Detail & Related papers (2022-06-27T19:27:57Z)
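For reference, energy-delay product multiplies energy by latency, so a cache that improves both compounds the gain. The inputs below are invented, chosen only so the arithmetic lands near the 3.8x STT-MRAM figure quoted above:

```python
# Energy-delay product: EDP = energy * delay (lower is better).
# Cache numbers are illustrative; only the metric comes from the abstract.
def edp(energy_nj: float, delay_ns: float) -> float:
    return energy_nj * delay_ns

sram_edp = edp(energy_nj=10.0, delay_ns=2.0)  # hypothetical SRAM last-level cache
mram_edp = edp(energy_nj=3.5, delay_ns=1.5)   # hypothetical STT-MRAM equivalent
print(f"EDP reduction: {sram_edp / mram_edp:.1f}x")  # -> 3.8x
```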
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
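The idea of replacing per-candidate latency measurement with a learned estimator can be illustrated with an ordinary least-squares stand-in: fit on a few measured architectures, then predict the rest. Features, data, and model here are invented for illustration and are not MAPLE's actual descriptors or regressor.

```python
# Generic latency-predictor sketch: fit a regressor on measured
# (architecture features -> latency) pairs, then estimate unmeasured ones.
import numpy as np

rng = np.random.default_rng(0)
# Per-architecture features: [depth, width, MACs (G), params (M)] - invented.
X = rng.uniform([10, 32, 0.1, 1], [50, 512, 5.0, 50], size=(200, 4))
# Synthetic "measured" latencies: MAC-dominated with a per-layer overhead.
latency_ms = 2.0 * X[:, 2] + 0.05 * X[:, 0] + rng.normal(0, 0.1, 200)

# Ordinary least squares with a bias term stands in for the learned model.
A = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(A, latency_ms, rcond=None)

candidate = np.array([30, 256, 2.5, 20, 1.0])  # unmeasured architecture + bias
print(f"predicted latency: {candidate @ w:.2f} ms")
```

Once fitted, each NAS candidate costs one dot product instead of an on-device measurement run.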
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
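The peak-memory saving behind patch-based inference is simple arithmetic: the first CNN stage dominates activation memory, and executing it on one small output patch at a time (plus receptive-field halo pixels) keeps only that patch's activations live. A rough sketch with invented layer shapes:

```python
# Peak activation memory: whole-layer vs. patch-by-patch execution of the
# first (memory-heavy) stage of a CNN. Sizes are invented for illustration.

def act_bytes(h, w, c, bytes_per_elem=1):  # int8 activations
    return h * w * c * bytes_per_elem

# Whole-layer: input and output of the largest early layer are live at once.
full = act_bytes(224, 224, 3) + act_bytes(112, 112, 32)

# Patch-by-patch: run the early stage on 8x8 output patches, reading a
# 19x19 input window (8x8 upsampled by stride 2, plus halo), so only one
# patch is live at a time.
patch = act_bytes(19, 19, 3) + act_bytes(8, 8, 32)

print(f"whole-layer peak : {full / 1024:.0f} KiB")   # ~539 KiB
print(f"patch-based peak : {patch / 1024:.0f} KiB")  # ~3 KiB
```

The halo pixels are recomputed across neighboring patches, which is the compute-for-memory trade that the joint architecture/scheduling search in MCUNetV2 optimizes.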
- Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential for the implementation of area- and power-efficient inference.
In this survey, we review state-of-the-art ReRAM-based many-core accelerators for Deep Neural Networks (DNNs).
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
- Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation [10.563649948220371]
Deep neural networks (DNN) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z)
- DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning [11.228806840123084]
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional technologies.
In this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in deep learning (DL) applications.
arXiv Detail & Related papers (2020-12-08T16:53:25Z)