DNNFusion: Accelerating Deep Neural Networks Execution with Advanced
Operator Fusion
- URL: http://arxiv.org/abs/2108.13342v1
- Date: Mon, 30 Aug 2021 16:11:38 GMT
- Title: DNNFusion: Accelerating Deep Neural Networks Execution with Advanced
Operator Fusion
- Authors: Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren
- Abstract summary: This paper proposes a novel and extensive loop fusion framework called DNNFusion.
DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with up to 9.3x speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
- Score: 28.03712082540713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have
become increasingly deep with hundreds or even thousands of operator layers,
leading to high memory and computational requirements for inference. Operator
fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art
DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these
frameworks usually adopt fusion approaches based on certain patterns that are
too restrictive to cover the diversity of operators and layer connections.
Polyhedral-based loop fusion techniques, on the other hand, work on a low-level
view of the computation without operator-level information, and can also miss
potential fusion opportunities. To address this challenge, this paper proposes
a novel and extensive loop fusion framework called DNNFusion. The core idea is
to work at an operator-level view of DNNs while expanding fusion opportunities
by classifying both individual operators and
their combinations. In addition, DNNFusion includes 1) a novel
mathematical-property-based graph rewriting framework to reduce evaluation
costs and facilitate subsequent operator fusion, 2) an integrated fusion plan
generation that leverages high-level analysis and accurate lightweight
profiling, and 3) additional optimizations during fusion code generation.
DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks,
model sizes, and layer counts. The evaluation results demonstrate that
DNNFusion finds up to 8.8x more fusion opportunities and outperforms four
state-of-the-art DNN execution frameworks with up to 9.3x speedup. The memory
requirement reduction and speedups can enable the execution of many of the
target models on mobile devices and even make them part of a real-time
application.
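The following is a minimal sketch, not DNNFusion's actual implementation, of the two ideas the abstract names: fusing a chain of element-wise operators into a single pass to avoid materializing intermediates, and a mathematical-property-based rewrite that removes an operator before fusion. It assumes only NumPy; every function name below is hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: toy operator fusion and a toy algebraic rewrite.
# None of these names or routines come from DNNFusion itself.
import numpy as np

def unfused_chain(x, scale, bias):
    # Unfused execution: every operator materializes a full intermediate
    # tensor, so memory traffic grows with the length of the operator chain.
    t1 = x * scale            # Mul
    t2 = t1 + bias            # Add
    return np.maximum(t2, 0)  # ReLU

def fused_chain(x, scale, bias):
    # Fused execution: a single pass over the data with no materialized
    # intermediates. A fusion framework can emit one kernel like this whenever
    # every operator in the chain is element-wise ("one-to-one" mapping).
    out = np.empty_like(x)
    x_flat, out_flat = x.ravel(), out.ravel()
    for i in range(x_flat.size):
        out_flat[i] = max(x_flat[i] * scale + bias, 0.0)
    return out

def fold_consecutive_scales(s1, s2):
    # Toy mathematical-property-based rewrite: (x * s1) * s2 is rewritten to
    # x * (s1 * s2) by associativity, removing one operator before fusion.
    return s1 * s2

if __name__ == "__main__":
    x = np.random.rand(2, 3, 8, 8).astype(np.float32)
    assert np.allclose(unfused_chain(x, 0.5, 0.1),
                       fused_chain(x, 0.5, 0.1), atol=1e-5)
    print("fused and unfused results match")
```

In a real framework the fused kernel would be generated and compiled rather than interpreted, but the sketch shows the intent: fewer intermediate tensors and fewer passes over memory, which is where the reported memory and speed gains come from.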
Related papers
- Fusion Matters: Learning Fusion in Deep Click-through Rate Prediction Models [27.477136474888564]
We introduce OptFusion, a method that automates the learning of fusion, encompassing both connection learning and operation selection.
Our experiments are conducted over three large-scale datasets.
arXiv Detail & Related papers (2024-11-24T06:21:59Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - MGDCF: Distance Learning via Markov Graph Diffusion for Neural
Collaborative Filtering [96.65234340724237]
We show the equivalence between some state-of-the-art GNN-based CF models and a traditional 1-layer NRL model based on context encoding.
We present Markov Graph Diffusion Collaborative Filtering (MGDCF) to generalize some state-of-the-art GNN-based CF models.
arXiv Detail & Related papers (2022-04-05T17:24:32Z) - Hardware Approximate Techniques for Deep Neural Network Accelerators: A
Survey [4.856755747052137]
Deep Neural Networks (DNNs) are very popular because of their high performance in various cognitive tasks in Machine Learning (ML).
Recent advancements in DNNs have achieved beyond-human accuracy in many tasks, but at the cost of high computational complexity.
This article provides a comprehensive survey and analysis of hardware approximation techniques for DNN accelerators.
arXiv Detail & Related papers (2022-03-16T16:33:13Z) - Designing the Topology of Graph Neural Networks: A Novel Feature Fusion
Perspective [12.363386808994079]
We learn to design the topology of GNNs from a novel feature fusion perspective, dubbed F$^2$GNN.
We develop a neural architecture search method on top of the unified framework which contains a set of selection and fusion operations.
The performance gains on eight real-world datasets demonstrate the effectiveness of F$^2$GNN.
arXiv Detail & Related papers (2021-12-29T13:06:12Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural
Networks [52.32646357164739]
We propose a sensitivity-informed deep neural network (SIDNN) to solve the AC optimal power flow (AC-OPF) problem.
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - Efficient Algorithms for Device Placement of DNN Graph Operators [12.871398348743591]
Modern machine learning workloads use large models, with complex structures, that are very expensive to execute.
The devices that execute complex models are becoming increasingly heterogeneous, as a flourishing of domain-specific hardware accelerators is now offered in addition to CPUs.
Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices.
In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings.
arXiv Detail & Related papers (2020-06-29T22:45:01Z) - Fusion Recurrent Neural Network [88.5550074808201]
We propose a novel, succinct and promising RNN: the Fusion Recurrent Neural Network (Fusion RNN).
Fusion RNN is composed of a Fusion module and a Transport module at every time step.
To evaluate Fusion RNN's sequence feature extraction capability, we choose a representative data mining task for sequence data, estimated time of arrival (ETA) prediction, and present a novel model based on Fusion RNN.
arXiv Detail & Related papers (2020-06-07T07:39:49Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)