EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network
Accelerators
- URL: http://arxiv.org/abs/2202.02310v1
- Date: Fri, 4 Feb 2022 18:48:36 GMT
- Title: EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network
Accelerators
- Authors: Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos
Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu
- Abstract summary: Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
- Score: 12.223778147172107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dilated and transposed convolutions are widely used in modern convolutional
neural networks (CNNs). These kernels are used extensively during CNN training
and inference of applications such as image segmentation and high-resolution
image generation. Although these kernels have grown in popularity, they stress
current compute systems due to their high memory intensity, exascale compute
demands, and large energy consumption.
We find that commonly-used low-power CNN inference accelerators based on
spatial architectures are not optimized for either of these convolutional
kernels. Dilated and transposed convolutions introduce significant zero padding
when mapped to the underlying spatial architecture, severely degrading
performance and energy efficiency. Existing approaches that address this issue
require significant design changes to the otherwise simple, efficient, and
well-adopted architectures used to compute direct convolutions.
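To make the zero-padding overhead concrete, the short sketch below (a minimal NumPy illustration with made-up sizes, not code from the paper) lowers a stride-2 transposed convolution to a direct convolution by inserting zeros into its input and reports the fraction of operands that become zeros; a dilated convolution similarly inserts zeros between filter taps.

```python
import numpy as np

def zero_upsample(x, stride):
    """Insert (stride - 1) zeros between elements, as happens when a
    transposed convolution is lowered to a direct convolution."""
    h, w = x.shape
    up = np.zeros((h * stride - stride + 1, w * stride - stride + 1), x.dtype)
    up[::stride, ::stride] = x
    return up

x = np.random.rand(16, 16).astype(np.float32)   # made-up feature-map size
up = zero_upsample(x, stride=2)                  # 31x31 input, mostly zeros
print(f"zero operands: {1 - x.size / up.size:.0%}")  # ~73% wasted MAC inputs
```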
To address this challenge, we propose EcoFlow, a new set of dataflows and
mapping algorithms for dilated and transposed convolutions. These algorithms
are tailored to execute efficiently on existing low-cost, small-scale spatial
architectures and require minimal changes to the network-on-chip of existing
accelerators. EcoFlow eliminates zero padding through careful dataflow
orchestration and data mapping tailored to the spatial architecture. EcoFlow
enables flexible and high-performance transposed and dilated convolutions on
architectures that are otherwise optimized for CNN inference.
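The abstract does not detail EcoFlow's dataflow itself; as a generic 1-D sketch of how a transposed convolution can be computed without ever fetching inserted zeros (an illustration of the general idea, not EcoFlow's actual mapping algorithm), each kernel tap can multiply the dense input directly and scatter its products at the output stride, matching the zero-insertion lowering exactly:

```python
import numpy as np

def tconv1d_zero_insert(x, w, s):
    # Reference lowering: upsample the input with zeros, then run a direct
    # (full) convolution -- the mapping that wastes MACs on zero operands.
    up = np.zeros((len(x) - 1) * s + 1, dtype=x.dtype)
    up[::s] = x
    return np.convolve(up, w)                  # length (len(x)-1)*s + len(w)

def tconv1d_zero_free(x, w, s):
    # Zero-free reorganization: each kernel tap multiplies the dense input
    # and scatters its products at stride s, so no zero operand is fetched.
    out = np.zeros((len(x) - 1) * s + len(w), dtype=x.dtype)
    for k, wk in enumerate(w):
        out[k:k + len(x) * s:s] += wk * x
    return out

x = np.random.rand(8).astype(np.float32)        # made-up sizes
w = np.random.rand(5).astype(np.float32)
assert np.allclose(tconv1d_zero_insert(x, w, 2),
                   tconv1d_zero_free(x, w, 2), atol=1e-5)
```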
We evaluate the efficiency of EcoFlow on CNN training workloads and
Generative Adversarial Network (GAN) training workloads. Experiments in our new
cycle-accurate simulator show that EcoFlow 1) reduces end-to-end CNN training
time by 7-85%, and 2) improves end-to-end GAN training performance by
29-42%, compared to state-of-the-art CNN inference accelerators.
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method, which performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
- A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
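As a rough, hypothetical sketch of this kind of decomposition (not necessarily the paper's exact scheme), an M-bit unsigned quantized code can be rewritten as a weighted sum of {-1, +1} tensors plus a constant:

```python
import numpy as np

# Hypothetical M-bit example; values and shapes are made up for illustration.
M = 4
q = np.random.randint(0, 2 ** M, size=(3, 3))           # quantized codes
bits = [(q >> i) & 1 for i in range(M)]                  # c_i in {0, 1}
branches = [2 * c - 1 for c in bits]                     # b_i in {-1, +1}
# q = sum_i 2^i * c_i = sum_i 2^(i-1) * b_i + (2^M - 1) / 2
recon = sum(2.0 ** (i - 1) * b for i, b in enumerate(branches)) + (2 ** M - 1) / 2
assert np.allclose(recon, q)
```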
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks [5.417507302691321]
S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow during convolution.
Compared to the naive systolic array, S2Engine achieves about 3.2x and 3.0x improvements in speed and energy efficiency, respectively.
arXiv Detail & Related papers (2021-06-15T06:08:37Z)
- cuConv: A CUDA Implementation of Convolution for CNN Inference [0.0]
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs).
We propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses, without requiring prior data transformations.
Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward propagation convolution configurations.
arXiv Detail & Related papers (2021-03-30T10:33:53Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training [34.657942518465575]
Training Convolutional Neural Networks (CNNs) usually requires a large amount of computational resources.
In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity.
We have built a simple compiler to map CNNs onto SparseTrain, and a cycle-accurate architecture simulator to evaluate the performance and efficiency.
arXiv Detail & Related papers (2020-07-21T11:01:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.