PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy
- URL: http://arxiv.org/abs/2408.02412v1
- Date: Mon, 5 Aug 2024 12:11:09 GMT
- Title: PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy
- Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique
- Abstract summary: Convolutional Neural Networks (CNNs) have emerged as a state-of-the-art solution for machine learning tasks.
CNN accelerators face performance- and energy-efficiency challenges due to high off-chip memory (DRAM) access latency and energy.
We present PENDRAM, a novel design space exploration methodology that enables high-performance and energy-efficient CNN acceleration.
- Score: 6.85785397160228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for machine learning tasks. To improve the performance and energy efficiency of CNN inference, the employment of specialized hardware accelerators is prevalent. However, CNN accelerators still face performance- and energy-efficiency challenges due to high off-chip memory (DRAM) access latency and energy, which are especially crucial for latency- and energy-constrained embedded applications. Moreover, different DRAM architectures have different profiles of access latency and energy, making it challenging to optimize DRAM accesses for high-performance and energy-efficient CNN accelerators. To address this, we present PENDRAM, a novel design space exploration methodology that enables high-performance and energy-efficient CNN acceleration through a generalized DRAM data mapping policy. Specifically, it explores the impact of different DRAM data mapping policies and DRAM architectures, across different CNN partitioning and scheduling schemes, on DRAM access latency and energy, and then identifies the Pareto-optimal design choices. The experimental results show that our DRAM data mapping policy improves the energy-delay-product of DRAM accesses in the CNN accelerator over other mapping policies by up to 96%. In this manner, our PENDRAM methodology offers high-performance and energy-efficient CNN acceleration under any given DRAM architecture for diverse embedded AI applications.
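As a rough illustration of the kind of design space exploration the abstract describes, the Python sketch below enumerates combinations of DRAM data mapping policy, DRAM architecture, and CNN scheduling scheme, scores each with a stubbed latency/energy model, and keeps the Pareto-optimal points ranked by energy-delay-product (EDP = latency x energy). All policy names, architectures, and numbers here are illustrative assumptions, not values or APIs from the paper; a real evaluation would replace the stub with a DRAM simulator.

```python
from itertools import product

# Stub cost model: in the paper these numbers would come from a DRAM
# simulator; everything below is an illustrative assumption.
def estimate_costs(policy, dram_arch, schedule):
    """Return (latency, energy) for one design point (made-up units)."""
    base = {"DDR3": (100.0, 50.0), "DDR4": (80.0, 40.0), "HBM2": (60.0, 70.0)}
    latency, energy = base[dram_arch]
    if policy == "row-buffer-first":   # row-buffer-friendly mapping (assumed gain)
        latency, energy = latency * 0.6, energy * 0.7
    if schedule == "output-stationary":
        latency *= 0.9
    return latency, energy

policies = ["row-buffer-first", "bank-first", "channel-first"]
architectures = ["DDR3", "DDR4", "HBM2"]
schedules = ["output-stationary", "weight-stationary"]

# Evaluate every design point in the (policy, architecture, schedule) space.
points = []
for pol, arch, sched in product(policies, architectures, schedules):
    lat, en = estimate_costs(pol, arch, sched)
    points.append(((pol, arch, sched), lat, en))

# Keep the Pareto front: points that no other point beats in both
# latency and energy simultaneously.
pareto = [p for p in points
          if not any(q[1] <= p[1] and q[2] <= p[2] and (q[1], q[2]) != (p[1], p[2])
                     for q in points)]

# Rank the surviving design choices by energy-delay-product.
for cfg, lat, en in sorted(pareto, key=lambda p: p[1] * p[2]):
    print(f"{cfg}  latency={lat:.1f}  energy={en:.1f}  EDP={lat * en:.1f}")
```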
Related papers
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves substantial energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems [15.115813664357436]
Spiking Neural Networks (SNNs) have shown the capability to achieve high accuracy under unsupervised settings with low operational power/energy.
We propose EnforceSNN, a novel design framework that provides a solution for resilient and energy-efficient SNN inference using reduced-voltage DRAM.
arXiv Detail & Related papers (2023-04-08T15:15:11Z)
- Dynamics-aware Adversarial Attack of Adaptive Neural Networks [75.50214601278455]
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
Our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods.
arXiv Detail & Related papers (2022-10-15T01:32:08Z)
- Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network [75.1236305913734]
We investigate the dynamics-aware adversarial attack problem in deep neural networks.
Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
arXiv Detail & Related papers (2021-12-17T10:53:35Z)
- ATRIA: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing [0.5257115841810257]
ATRIA is a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for high-speed inference of CNNs.
We show that ATRIA exhibits only a 3.5% drop in CNN inference accuracy while still achieving improvements of up to 3.2x in frames-per-second (FPS) and up to 10x in efficiency.
arXiv Detail & Related papers (2021-05-26T18:36:01Z)
- SparkXD: A Framework for Resilient and Energy-Efficient Spiking Neural Network Inference using Approximate DRAM [15.115813664357436]
Spiking Neural Networks (SNNs) have the potential for achieving low energy consumption due to their biologically sparse computation.
Several studies have shown that the off-chip memory (DRAM) accesses are the most energy-consuming operations in SNN processing.
We propose SparkXD, a novel framework that provides a comprehensive conjoint solution for resilient and energy-efficient SNN inference.
arXiv Detail & Related papers (2021-02-28T08:12:26Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to the use of external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reductions in storage and training energy, respectively, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- DRMap: A Generic DRAM Data Mapping Policy for Energy-Efficient Processing of Convolutional Neural Networks [15.115813664357436]
We study the latency and energy of different mapping policies on different DRAM architectures.
The results show that energy-efficient DRAM accesses can be achieved by a mapping policy that prioritizes, in order, maximizing row buffer hits, bank-level parallelism, and subarray-level parallelism (a sketch of this priority order appears after this list).
arXiv Detail & Related papers (2020-04-21T23:26:23Z)
- Data-Driven Neuromorphic DRAM-based CNN and RNN Accelerators [13.47462920292399]
The energy consumed by running large deep neural networks (DNNs) on hardware accelerators is dominated by the need for large amounts of fast memory to store both states and weights.
Although DRAM is a high-throughput, low-cost memory (costing 20X less than SRAM), its long random access latency is bad for the unpredictable access patterns in spiking neural networks (SNNs).
This paper reports on our developments over the last 5 years of convolutional and recurrent deep neural network hardware accelerators that exploit either spatial or temporal sparsity similar to SNNs but achieve state-of-the-art (SOA) throughput, power efficiency, and latency even with the use of DRAM.
arXiv Detail & Related papers (2020-03-29T11:45:53Z)
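To make the priority order described in the DRMap entry above concrete, here is a minimal Python sketch of such a data-to-DRAM mapping, using a made-up DRAM geometry (block-sized columns, two banks, two subarrays per bank). The constants and the function are illustrative assumptions for exposition, not the paper's actual mapping.

```python
# Made-up DRAM geometry for illustration only.
COLS_PER_ROW = 4        # data blocks that fit in one DRAM row
NUM_BANKS = 2
SUBARRAYS_PER_BANK = 2

def map_block(i):
    """Map the i-th consecutive data block to (bank, subarray, row, column).

    Priority order from the summary above:
      1) fill columns of the open row first  -> row buffer hits
      2) then rotate across banks            -> bank-level parallelism
      3) then rotate across subarrays        -> subarray-level parallelism
      4) only then open a new row            -> costly row activations last
    """
    col = i % COLS_PER_ROW
    bank = (i // COLS_PER_ROW) % NUM_BANKS
    subarray = (i // (COLS_PER_ROW * NUM_BANKS)) % SUBARRAYS_PER_BANK
    row = i // (COLS_PER_ROW * NUM_BANKS * SUBARRAYS_PER_BANK)
    return bank, subarray, row, col

# First 18 blocks: the row index changes only after all columns,
# banks, and subarrays have been used, keeping row activations rare.
for i in range(18):
    print(i, map_block(i))
```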