RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing
- URL: http://arxiv.org/abs/2407.02622v1
- Date: Tue, 2 Jul 2024 19:25:05 GMT
- Title: RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing
- Authors: Won Hyeok Kim, Hyeong Jin Kim, Tae Hee Han
- Abstract summary: This paper introduces the RISC-V R-extension, a novel approach to enhancing deep neural network (DNN) process efficiency on edge devices.
The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency.
- Score: 0.8192907805418583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternative. This paper introduces the RISC-V R-extension, a novel approach to enhancing DNN process efficiency on edge devices. The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency. Furthermore, this extension includes new custom instructions to support these architectural improvements. Through comprehensive analysis, this study demonstrates the boost of R-extension in edge device processing, setting the stage for more responsive and intelligent edge applications.
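To make the latency and memory-access claim concrete, the sketch below models in plain C the general idea behind an architectural pipeline register (APR): partial sums of a multiply-accumulate loop stay in a pipeline-resident accumulator instead of round-tripping through the register file or memory on every tap. The instruction names (mac.apr, apr.clear, apr.read), the single-accumulator semantics, and the int8/int32 widths are illustrative assumptions for this sketch, not the encoding or instruction set defined in the paper; on real hardware such custom instructions would be emitted against a custom opcode (for example via the GNU assembler's .insn directive) rather than modeled as C functions.

```c
/*
 * Behavioral sketch of a hypothetical R-extension-style MAC instruction.
 * The names (mac.apr, apr.clear, apr.read) and semantics are illustrative
 * assumptions, not the paper's actual ISA encoding.
 */
#include <stdint.h>
#include <stdio.h>

/* Software model of one architectural pipeline register (APR): an
 * accumulator that stays inside the pipeline across consecutive MACs,
 * so partial sums never round-trip through the register file or memory. */
static int32_t apr_acc = 0;

/* Behavioral model of a hypothetical "mac.apr rs1, rs2" custom instruction:
 * apr_acc += rs1 * rs2. */
static inline void mac_apr(int8_t rs1, int8_t rs2)
{
    apr_acc += (int32_t)rs1 * (int32_t)rs2;
}

/* Dot product of two int8 vectors: with the APR, only one read-out of the
 * final sum is needed instead of a partial-sum load/store per element. */
static int32_t dot_int8(const int8_t *a, const int8_t *b, int n)
{
    apr_acc = 0;                 /* hypothetical "apr.clear" */
    for (int i = 0; i < n; i++)
        mac_apr(a[i], b[i]);
    return apr_acc;              /* hypothetical "apr.read rd" */
}

int main(void)
{
    int8_t w[4] = {1, -2, 3, 4};
    int8_t x[4] = {5, 6, -7, 8};
    printf("dot = %d\n", dot_int8(w, x, 4));  /* 1*5 - 2*6 - 3*7 + 4*8 = 4 */
    return 0;
}
```

Compiling and running this on any host prints dot = 4; the model only illustrates where memory traffic is saved, namely one accumulator read-out per dot product rather than a partial-sum load/store per element.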
Related papers
- On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations.
We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z) - USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic.
An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption.
Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z) - Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered by Reconfigurable Intelligent Surfaces [30.1512069754603]
We propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge.
We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system.
arXiv Detail & Related papers (2023-05-18T12:46:42Z) - ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution.
We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z) - IMDeception: Grouped Information Distilling Super-Resolution Network [7.6146285961466]
Single-Image-Super-Resolution (SISR) is a classical computer vision problem that has benefited from the recent advancements in deep learning methods.
In this work, we propose the Global Progressive Refinement Module (GPRM) as a less parameter-demanding alternative to the IIC module for feature aggregation.
We also propose Grouped Information Distilling Blocks (GIDB) to further decrease the number of parameters and floating point operations per second (FLOPS).
Experiments reveal that the proposed network performs on par with state-of-the-art models despite having a limited number of parameters and FLOPS.
arXiv Detail & Related papers (2022-04-25T06:43:45Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation framework can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network [40.817290717344534]
Reconfigurable intelligent surface (RIS) is an emerging technology for wireless communication systems.
We propose to apply a fully convolutional network (WSNFC) to solve this problem.
We design a set of channel features that includes both cascaded channels via the RIS and the direct channel.
arXiv Detail & Related papers (2022-01-08T14:16:00Z) - From DNNs to GANs: Review of efficient hardware architectures for deep learning [0.0]
Neural networks and deep learning have begun to impact the present research paradigm.
DSP processors are incapable of performing neural network, activation function, convolutional neural network, and generative adversarial network operations.
Different algorithms have been adapted to design a DSP processor capable of fast performance on neural network, activation function, convolutional neural network, and generative adversarial network workloads.
arXiv Detail & Related papers (2021-06-06T13:23:06Z) - Phase Configuration Learning in Wireless Networks with Multiple Reconfigurable Intelligent Surfaces [50.622375361505824]
Reconfigurable Intelligent Surfaces (RISs) are a highly scalable technology capable of offering dynamic control of electromagnetic wave propagation.
One of the major challenges with RIS-empowered wireless communications is the low-overhead dynamic configuration of multiple RISs.
We devise low-complexity supervised learning approaches for the RISs' phase configurations.
arXiv Detail & Related papers (2020-10-09T05:35:27Z) - RIS Enhanced Massive Non-orthogonal Multiple Access Networks: Deployment and Passive Beamforming Design [116.88396201197533]
A novel framework is proposed for the deployment and passive beamforming design of a reconfigurable intelligent surface (RIS).
The problem of joint deployment, phase shift design, as well as power allocation is formulated for maximizing the energy efficiency.
A novel long short-term memory (LSTM) based echo state network (ESN) algorithm is proposed to predict users' tele-traffic demand by leveraging a real dataset.
A decaying double deep Q-network (D3QN) based position-acquisition and phase-control algorithm is proposed to solve the joint problem of deployment and design of the RIS.
arXiv Detail & Related papers (2020-01-28T14:37:38Z) - SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network [17.928105470385614]
We propose an intelligent tile-based mechanism for increasing the adaptiveness of RNN computation, in order to efficiently handle data dependencies.
SHARP achieves 2x, 2.8x, and 82x speedups on average, considering different RNN models and resource budgets.
arXiv Detail & Related papers (2019-11-04T14:51:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.