EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network
Accelerators
- URL: http://arxiv.org/abs/2202.02310v1
- Date: Fri, 4 Feb 2022 18:48:36 GMT
- Title: EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network
Accelerators
- Authors: Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos
Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu
- Abstract summary: Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
- Score: 12.223778147172107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dilated and transposed convolutions are widely used in modern convolutional
neural networks (CNNs). These kernels are used extensively during CNN training
and inference of applications such as image segmentation and high-resolution
image generation. Although these kernels have grown in popularity, they stress
current compute systems due to their high memory intensity, exascale compute
demands, and large energy consumption.
We find that commonly-used low-power CNN inference accelerators based on
spatial architectures are not optimized for either of these convolutional
kernels. Dilated and transposed convolutions introduce significant zero padding
when mapped to the underlying spatial architecture, severely degrading
performance and energy efficiency. Existing approaches that address this issue
require significant design changes to the otherwise simple, efficient, and
well-adopted architectures used to compute direct convolutions.
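To make the zero-padding overhead concrete, the short sketch below (a minimal NumPy illustration with made-up sizes, not code from the paper) lowers a stride-2 transposed convolution to a direct convolution by inserting zeros into its input and reports the fraction of operands that become zeros; a dilated convolution similarly inserts zeros between filter taps.

```python
import numpy as np

def zero_upsample(x, stride):
    """Insert (stride - 1) zeros between elements, as happens when a
    transposed convolution is lowered to a direct convolution."""
    h, w = x.shape
    up = np.zeros((h * stride - stride + 1, w * stride - stride + 1), x.dtype)
    up[::stride, ::stride] = x
    return up

x = np.random.rand(16, 16).astype(np.float32)   # made-up feature-map size
up = zero_upsample(x, stride=2)                  # 31x31 input, mostly zeros
print(f"zero operands: {1 - x.size / up.size:.0%}")  # ~73% wasted MAC inputs
```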
To address this challenge, we propose EcoFlow, a new set of dataflows and
mapping algorithms for dilated and transposed convolutions. These algorithms
are tailored to execute efficiently on existing low-cost, small-scale spatial
architectures and require minimal changes to the network-on-chip of existing
accelerators. EcoFlow eliminates zero padding through careful dataflow
orchestration and data mapping tailored to the spatial architecture. EcoFlow
enables flexible and high-performance transposed and dilated convolutions on
architectures that are otherwise optimized for CNN inference.
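The abstract does not detail EcoFlow's dataflow itself; as a generic 1-D sketch of how a transposed convolution can be computed without ever fetching inserted zeros (an illustration of the general idea, not EcoFlow's actual mapping algorithm), each kernel tap can multiply the dense input directly and scatter its products at the output stride, matching the zero-insertion lowering exactly:

```python
import numpy as np

def tconv1d_zero_insert(x, w, s):
    # Reference lowering: upsample the input with zeros, then run a direct
    # (full) convolution -- the mapping that wastes MACs on zero operands.
    up = np.zeros((len(x) - 1) * s + 1, dtype=x.dtype)
    up[::s] = x
    return np.convolve(up, w)                  # length (len(x)-1)*s + len(w)

def tconv1d_zero_free(x, w, s):
    # Zero-free reorganization: each kernel tap multiplies the dense input
    # and scatters its products at stride s, so no zero operand is fetched.
    out = np.zeros((len(x) - 1) * s + len(w), dtype=x.dtype)
    for k, wk in enumerate(w):
        out[k:k + len(x) * s:s] += wk * x
    return out

x = np.random.rand(8).astype(np.float32)        # made-up sizes
w = np.random.rand(5).astype(np.float32)
assert np.allclose(tconv1d_zero_insert(x, w, 2),
                   tconv1d_zero_free(x, w, 2), atol=1e-5)
```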
We evaluate the efficiency of EcoFlow on CNN training workloads and
Generative Adversarial Network (GAN) training workloads. Experiments in our new
cycle-accurate simulator show that EcoFlow 1) reduces end-to-end CNN training
time by 7-85%, and 2) improves end-to-end GAN training performance by
29-42%, compared to state-of-the-art CNN inference accelerators.
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method, which performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
- A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
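As a rough, hypothetical sketch of this kind of decomposition (not necessarily the paper's exact scheme), an M-bit unsigned quantized code can be rewritten as a weighted sum of {-1, +1} tensors plus a constant:

```python
import numpy as np

# Hypothetical M-bit example; values and shapes are made up for illustration.
M = 4
q = np.random.randint(0, 2 ** M, size=(3, 3))           # quantized codes
bits = [(q >> i) & 1 for i in range(M)]                  # c_i in {0, 1}
branches = [2 * c - 1 for c in bits]                     # b_i in {-1, +1}
# q = sum_i 2^i * c_i = sum_i 2^(i-1) * b_i + (2^M - 1) / 2
recon = sum(2.0 ** (i - 1) * b for i, b in enumerate(branches)) + (2 ** M - 1) / 2
assert np.allclose(recon, q)
```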
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks [5.417507302691321]
S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow during convolution.
Compared to the naive systolic array, S2Engine achieves about 3.2x and 3.0x improvements in speed and energy efficiency, respectively.
arXiv Detail & Related papers (2021-06-15T06:08:37Z)
- cuConv: A CUDA Implementation of Convolution for CNN Inference [0.0]
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs).
We propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses, without requiring prior data transformations.
Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward propagation convolution configurations.
arXiv Detail & Related papers (2021-03-30T10:33:53Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training [34.657942518465575]
Training Convolutional Neural Networks (CNNs) usually requires a large amount of computational resources.
In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity.
We have built a simple compiler to map CNNs onto SparseTrain, and a cycle-accurate architecture simulator to evaluate the performance and efficiency.
arXiv Detail & Related papers (2020-07-21T11:01:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.