Evolutionary Bin Packing for Memory-Efficient Dataflow Inference
Acceleration on FPGA
- URL: http://arxiv.org/abs/2003.12449v1
- Date: Tue, 24 Mar 2020 09:55:08 GMT
- Title: Evolutionary Bin Packing for Memory-Efficient Dataflow Inference
Acceleration on FPGA
- Authors: Mairin Kroes, Lucian Petrica, Sorin Cotofana, Michaela Blott
- Abstract summary: Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency.
However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM).
We present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM.
- Score: 2.3395728784538767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural network (CNN) dataflow inference accelerators
implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated
increased energy efficiency and lower latency compared to CNN execution on CPUs
or GPUs. However, the complex shapes of CNN parameter memories do not typically
map well to FPGA on-chip memories (OCM), which results in poor OCM utilization
and ultimately limits the size and types of CNNs which can be effectively
accelerated on FPGAs. In this work, we present a design methodology that
improves the mapping efficiency of CNN parameters to FPGA OCM. We frame the
mapping as a bin packing problem and determine that traditional bin packing
algorithms are not well suited to solve the problem within FPGA- and
CNN-specific constraints. We hybridize genetic algorithms and simulated
annealing with traditional bin packing heuristics to create flexible mappers
capable of grouping parameter memories such that each group optimally fits FPGA
on-chip memories. We evaluate these algorithms on a variety of FPGA inference
accelerators. Our hybrid mappers converge to optimal solutions in a matter of
seconds for all CNN use-cases, achieve an increase of up to 65% in OCM
utilization efficiency for deep CNNs, and are up to 200$\times$ faster than
current state-of-the-art simulated annealing approaches.
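To make the mapping idea concrete, the sketch below illustrates the general approach of hybridizing evolutionary search with a classic bin packing heuristic. It is only an illustration, not the authors' mapper: the BRAM capacity, the parameter-memory sizes, the swap-mutation search, and the fitness function are assumptions chosen to keep the example self-contained.

```python
import random

# Illustrative constants (assumptions, not taken from the paper): model each
# FPGA on-chip memory block as a bin of fixed capacity, and each CNN parameter
# memory as an item with an integer size in the same units.
BRAM_CAPACITY = 1024
PARAM_MEMORIES = [700, 320, 512, 256, 130, 900, 64, 410, 380, 220]

def first_fit(order):
    """Pack items, in the given order, into bins with a first-fit heuristic."""
    bins = []
    for size in order:
        for b in bins:
            if sum(b) + size <= BRAM_CAPACITY:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

def fitness(order):
    """Fewer bins and fuller bins are better (higher OCM utilization)."""
    bins = first_fit(order)
    utilization = sum(sum(b) for b in bins) / (len(bins) * BRAM_CAPACITY)
    return len(bins) - utilization  # lower is better

def hybrid_search(items, generations=500, population=20, seed=0):
    """Toy evolutionary search over packing orders, decoded by first-fit.

    The permutation genome plus heuristic decoder is the 'hybrid' idea:
    the evolutionary search explores groupings while the packing heuristic
    guarantees every candidate is a feasible mapping.
    """
    rng = random.Random(seed)
    pop = [rng.sample(items, len(items)) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: population // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(len(child)), rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    best = min(pop, key=fitness)
    return first_fit(best)

if __name__ == "__main__":
    for idx, group in enumerate(hybrid_search(PARAM_MEMORIES)):
        print(f"BRAM {idx}: {group} ({sum(group)}/{BRAM_CAPACITY} used)")
```

The design point this sketch tries to capture is that the evolutionary search operates on packing orders while a traditional heuristic decodes each candidate into a grouping, so every individual in the population corresponds to a feasible assignment of parameter memories to on-chip memory blocks.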
Related papers
- H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory [1.0056445773367833]
Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access.
This work augments a state-of-the-art dataflow accelerator to leverage both High-Bandwidth Memory (HBM) and on-chip storage.
Compared to the best prior work we obtain speed-ups of at least 19.4x, 5.1x and 10.5x on ResNet-18, ResNet-50 and VGG-16 respectively.
arXiv Detail & Related papers (2024-08-17T14:25:32Z) - Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, an autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights
Generation [13.681095158525514]
unzipFPGA is a novel CNN inference system that counteracts the limitations of existing CNN engines.
We introduce a weights generator module that enables the on-chip on-the-fly generation of weights.
We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair.
arXiv Detail & Related papers (2023-07-25T11:19:21Z) - SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast,
Energy-Efficient Inference of Integer-Quantized CNNs [0.0]
A CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations.
Several photonic microring resonator (MRR)-based hardware architectures have been proposed to accelerate integer-quantized CNNs.
Existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size.
arXiv Detail & Related papers (2023-02-14T13:35:15Z) - RTFormer: Efficient Design for Real-Time Semantic Segmentation with
Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z) - Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z) - Receptive Field-based Segmentation for Distributed CNN Inference
Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing approach that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With its latency- and accuracy-aware reward design, the approach adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights
Generation [17.142094527372993]
Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs).
In this work, we investigate the implications for CNN engine design of a class of models that introduce a pre-convolution stage to decompress the weights at run time.
To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-the-fly generation of weights.
arXiv Detail & Related papers (2021-03-09T18:19:41Z)