Hardware-Robust In-RRAM-Computing for Object Detection
- URL: http://arxiv.org/abs/2205.03996v1
- Date: Mon, 9 May 2022 01:46:24 GMT
- Title: Hardware-Robust In-RRAM-Computing for Object Detection
- Authors: Yu-Hsiang Chiang, Cheng En Ni, Yun Sung, Tuo-Hung Hou, Tian-Sheuan
Chang, and Shyh Jye Jou
- Abstract summary: In-RRAM computing suffers from large device variation and numerous nonideal effects in hardware.
This paper proposes a joint hardware and software optimization strategy to design a hardware-robust IRC macro for object detection.
The proposed approach has been successfully applied to a complex object detection task with only 3.85% mAP drop.
- Score: 0.15113576014047125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-memory computing has recently become a popular architecture for
deep-learning hardware accelerators due to its highly parallel computation, low
power, and low area cost. However, in-RRAM computing (IRC) suffers from large
device variation and numerous nonideal effects in hardware. Although previous
approaches that incorporate these effects into model training successfully
improved variation tolerance, they considered only a subset of the nonideal
effects and relatively simple classification tasks. This paper proposes a joint hardware
and software optimization strategy to design a hardware-robust IRC macro for
object detection. We lower the cell current by using a low word-line voltage to
enable a complete convolution calculation in one operation that minimizes the
impact of nonlinear addition. We also implement ternary weight mapping and
remove batch normalization for better tolerance against device variation,
sense-amplifier variation, and the IR drop problem. An extra bias is included to overcome
the limitation of the current sensing range. The proposed approach has been
successfully applied to a complex object detection task with only 3.85\% mAP
drop, whereas a naive design suffers catastrophic failure under these nonideal
effects.
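The ternary weight mapping mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the threshold rule (a fraction of the mean absolute weight) and the two-cell differential mapping are common conventions in ternary-weight and RRAM work assumed here, not details taken from the paper itself.

```python
import numpy as np

def ternarize(weights: np.ndarray, threshold_ratio: float = 0.7) -> np.ndarray:
    """Map full-precision weights to {-1, 0, +1}.

    The threshold heuristic (a fraction of the mean absolute weight) follows
    common ternary-weight-network practice; the paper's exact mapping rule
    is not specified in the abstract.
    """
    delta = threshold_ratio * np.mean(np.abs(weights))
    t = np.zeros_like(weights)
    t[weights > delta] = 1.0
    t[weights < -delta] = -1.0
    return t

def to_differential_cells(ternary: np.ndarray):
    """Split ternary weights into positive and negative cell arrays, as in a
    typical differential RRAM mapping where w = g_pos - g_neg."""
    g_pos = (ternary > 0).astype(np.float32)
    g_neg = (ternary < 0).astype(np.float32)
    return g_pos, g_neg
```

Restricting weights to three levels means each cell only needs to store an on/off conductance state, which is what makes the mapping tolerant to the device and sense-amplifier variation the abstract describes.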
Related papers
- SLaNC: Static LayerNorm Calibration [1.2016264781280588]
Quantization to lower precision formats naturally poses a number of challenges caused by the limited range of the available value representations.
In this article, we propose a computationally-efficient scaling technique that can be easily applied to Transformer models during inference.
Our method suggests a straightforward way of scaling the LayerNorm inputs based on the static weights of the immediately preceding linear layers.
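The property such static scaling relies on is that LayerNorm is (up to its small epsilon) invariant to a uniform positive rescaling of its input, so dividing the input by a weight-derived constant keeps intermediate values inside a low-precision format's range without changing the normalized output. A minimal sketch, where the max row L2 norm of the preceding linear layer is a stand-in for SLaNC's actual formula (which the summary does not give):

```python
import numpy as np

def layernorm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Plain LayerNorm over the last axis (no learned affine parameters)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def static_scale(linear_weight: np.ndarray) -> float:
    """Hypothetical static scale derived from the preceding linear layer's
    weights; the max row L2 norm is used here as an illustrative proxy."""
    return float(np.max(np.linalg.norm(linear_weight, axis=1)))
```

Because the scale depends only on static weights, it can be computed once offline, adding no comparison or calibration cost at inference time.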
arXiv Detail & Related papers (2024-10-14T14:32:55Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference [1.0919012968294923]
We introduce a novel algorithm-architecture co-design approach that accelerates transformers using head sparsity, block sparsity and approximation opportunities to reduce computations in attention and reduce memory access.
With the observation of the huge redundancy in attention scores and attention heads, we propose a novel integer-based row-balanced block pruning to prune unimportant blocks in the attention matrix at run time.
We also propose integer-based head pruning to detect and remove unimportant heads at an early stage at run time.
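The row-balanced block pruning idea can be sketched as follows. The importance measure (a fixed-point sum of absolute block values) and the rounding are assumptions for illustration; the paper's exact scoring rule is not given in the summary.

```python
import numpy as np

def prune_attention_blocks(scores: np.ndarray, block: int, keep: int) -> np.ndarray:
    """Row-balanced block pruning sketch: split each row of the attention
    matrix into fixed-size blocks, score each block with an integer
    (fixed-point) magnitude sum, and keep only the `keep` highest-scoring
    blocks per row, zeroing the rest."""
    n, m = scores.shape
    assert m % block == 0, "row length must be a multiple of the block size"
    blocks = scores.reshape(n, m // block, block)
    # Integer-based importance: fixed-point sum of absolute values per block.
    importance = np.rint(np.abs(blocks).sum(axis=-1) * 16).astype(np.int64)
    mask = np.zeros(importance.shape, dtype=bool)
    top = np.argsort(-importance, axis=1, kind="stable")[:, :keep]
    np.put_along_axis(mask, top, True, axis=1)
    return np.where(mask[:, :, None], blocks, 0.0).reshape(n, m)
```

Keeping the same number of blocks in every row ("row-balanced") is what makes the resulting sparsity pattern hardware-friendly: each row has identical work, so compute units stay load-balanced.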
arXiv Detail & Related papers (2024-07-17T11:15:16Z) - Accelerating ViT Inference on FPGA through Static and Dynamic Pruning [2.8595179027282907]
Vision Transformers (ViTs) have achieved state-of-the-art accuracy on various computer vision tasks.
Weight and token pruning are two well-known methods for reducing complexity.
We propose an algorithm-hardware codesign for accelerating ViT on FPGA through simultaneous pruning.
arXiv Detail & Related papers (2024-03-21T00:09:04Z) - Task-Oriented Over-the-Air Computation for Multi-Device Edge AI [57.50247872182593]
6G networks supporting edge AI feature task-oriented techniques that focus on the effective and efficient execution of AI tasks.
A task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for a multi-device split-inference system.
arXiv Detail & Related papers (2022-11-02T16:35:14Z) - PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party
Computation Based Private Inference [23.795457990555878]
Secure multi-party computation (MPC) has been studied as a way to enable privacy-preserving deep learning (DL) computation.
However, MPC often comes with very high computation overhead, which can limit its adoption in large-scale systems.
In this work, we develop a systematic framework, PolyMPCNet, of joint overhead reduction of MPC comparison protocol and hardware acceleration.
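A common MPC-friendly trick in this line of work is replacing ReLU with a low-degree polynomial, since secure comparisons dominate MPC cost while multiplications are comparatively cheap. The sketch below uses an illustrative quadratic with textbook coefficients, not PolyMPCNet's actual polynomial, which the summary does not specify.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def poly_act(x: np.ndarray) -> np.ndarray:
    """Degree-2 polynomial substitute for ReLU, 0.125*x^2 + 0.5*x + 0.25.

    It uses only additions and multiplications (cheap under MPC) and avoids
    the comparison in max(x, 0); the coefficients here are a simple
    quadratic fit on [-1, 1], chosen for illustration."""
    return 0.125 * x * x + 0.5 * x + 0.25
```

On inputs kept in a bounded range (e.g. via normalization), the polynomial tracks ReLU closely enough that the network can be retrained to absorb the remaining error.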
arXiv Detail & Related papers (2022-09-20T02:47:37Z) - Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for
5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete variables (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
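Eliminating floating-point computation in this style of work means quantizing both operands to integers and accumulating in integer arithmetic, dequantizing only once at the end. The sketch below is a generic uniform symmetric quantizer, not AQD's exact scheme, which the summary does not describe.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    """Uniform symmetric quantization to `bits`-bit signed integers.

    A generic low-bit scheme for illustration: one scale per tensor,
    round-to-nearest, clipped to the signed integer range."""
    qmax = 2 ** (bits - 1) - 1
    amax = float(np.max(np.abs(x)))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.rint(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def int_matmul(qa: np.ndarray, sa: float, qb: np.ndarray, sb: float) -> np.ndarray:
    """Integer-only accumulation with a single dequantization at the end,
    so no floating-point arithmetic occurs inside the matrix multiply."""
    acc = qa.astype(np.int64) @ qb.astype(np.int64)
    return acc * (sa * sb)
```

Because the scales factor out of the dot product, the inner loop is pure integer multiply-accumulate, which is what makes such schemes attractive for low-power accelerators.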
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - AVAC: A Machine Learning based Adaptive RRAM Variability-Aware
Controller for Edge Devices [3.7346292069282643]
We propose an Adaptive RRAM Variability-Aware Controller, AVAC, which periodically updates the Wait Buffer and batch sizes.
AVAC allows Edge devices to adapt to different applications and their stages, to improve performance and reduce energy consumption.
arXiv Detail & Related papers (2020-05-06T19:06:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.