Applying Graph Explanation to Operator Fusion
- URL: http://arxiv.org/abs/2501.00636v1
- Date: Tue, 31 Dec 2024 20:22:10 GMT
- Title: Applying Graph Explanation to Operator Fusion
- Authors: Keith G. Mills, Muhammad Fetrat Qharabagh, Weichen Qiu, Fred X. Han, Mohammad Salameh, Wei Lu, Shangling Jui, Di Niu
- Abstract summary: Fusion aims to lower inference costs by reducing data transactions between an accelerator's on-chip buffer and DRAM.
This is accomplished by grouped execution of multiple operations like convolution and activations together into single execution units - fusion groups.
Finding the optimal groups is a complex problem where the presence of invalid solutions hampers traditional search algorithms and demands robust approaches.
- Score: 25.28963706415794
- License:
- Abstract: Layer fusion techniques are critical to improving the inference efficiency of deep neural networks (DNN) for deployment. Fusion aims to lower inference costs by reducing data transactions between an accelerator's on-chip buffer and DRAM. This is accomplished by grouped execution of multiple operations like convolution and activations together into single execution units - fusion groups. However, on-chip buffer capacity limits fusion group size and optimizing fusion on whole DNNs requires partitioning into multiple fusion groups. Finding the optimal groups is a complex problem where the presence of invalid solutions hampers traditional search algorithms and demands robust approaches. In this paper we incorporate Explainable AI, specifically Graph Explanation Techniques (GET), into layer fusion. Given an invalid fusion group, we identify the operations most responsible for group invalidity, then use this knowledge to recursively split the original fusion group via a greedy tree-based algorithm to minimize DRAM access. We pair our scheme with common algorithms and optimize DNNs on two types of layer fusion: Line-Buffer Depth First (LBDF) and Branch Requirement Reduction (BRR). Experiments demonstrate the efficacy of our scheme on several popular and classical convolutional neural networks like ResNets and MobileNets. Our scheme achieves over 20% DRAM Access reduction on EfficientNet-B3.
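The abstract describes a greedy, explanation-guided splitting procedure: score each operation's contribution to a fusion group's invalidity, cut at the most responsible operation, and recurse until every subgroup fits the on-chip buffer. Below is a minimal Python sketch of that idea under simplifying assumptions (a linear chain of operations, a footprint-based validity check, and a footprint-based stand-in for the graph explanation scores); all identifiers are hypothetical and this is not the authors' implementation.

```python
# Hedged sketch of GET-guided recursive fusion-group splitting.
# All names (Op, explain_invalidity, BUFFER_BYTES, ...) are illustrative
# assumptions, not the paper's code.
from dataclasses import dataclass

BUFFER_BYTES = 1 << 20  # hypothetical on-chip buffer capacity (1 MiB)

@dataclass
class Op:
    name: str
    footprint: int   # bytes this op keeps resident in the on-chip buffer
    out_bytes: int   # bytes of its output tensor

def is_valid(group):
    """A fusion group is valid if its combined working set fits the buffer."""
    return sum(op.footprint for op in group) <= BUFFER_BYTES

def explain_invalidity(group):
    """Stand-in for a Graph Explanation Technique (GET): score how much each
    operation contributes to the group's invalidity (here, by footprint)."""
    return [op.footprint for op in group]

def dram_cost(groups):
    """Toy DRAM model: only tensors crossing group boundaries hit DRAM."""
    return sum(g[-1].out_bytes for g in groups)

def split_group(group):
    """Greedy tree-based splitting: cut right before the most 'blamed'
    operation and recurse until every subgroup is valid."""
    if len(group) <= 1 or is_valid(group):
        return [group]
    scores = explain_invalidity(group)
    cut = max(range(1, len(group)), key=lambda i: scores[i])
    return split_group(group[:cut]) + split_group(group[cut:])

if __name__ == "__main__":
    net = [Op(f"op{i}", footprint=400_000, out_bytes=200_000) for i in range(6)]
    groups = split_group(net)
    print([[op.name for op in g] for g in groups], "DRAM bytes:", dram_cost(groups))
```

In the paper the split point is chosen to minimize DRAM access across the resulting subgroups; the toy `dram_cost` above only counts output tensors crossing group boundaries, and a real validity check would model the target's LBDF or BRR buffer constraints.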
Related papers
- GDSG: Graph Diffusion-based Solution Generator for Optimization Problems in MEC Networks [109.17835015018532]
We present a Graph Diffusion-based Solution Generation (GDSG) method.
This approach is designed to work with suboptimal datasets while converging to the optimal solution with high probability.
We build GDSG as a multi-task diffusion model utilizing a Graph Neural Network (GNN) to acquire the distribution of high-quality solutions.
arXiv Detail & Related papers (2024-12-11T11:13:43Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference [19.5669231249754]
We present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE-based GCN inference.
Remarkably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN, while preserving an inference accuracy of 75% and notably reducing multiplication depth.
arXiv Detail & Related papers (2023-09-25T17:56:54Z)
- DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion [28.03712082540713]
This paper proposes a novel and extensive loop fusion framework called DNNFusion.
DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with a 9.3x speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
arXiv Detail & Related papers (2021-08-30T16:11:38Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT)
HANT replaces inefficient operations with more efficient alternatives using a neural architecture search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- Partitioning sparse deep neural networks for scalable training and inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z)
- Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of our method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
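Both verification entries above (BaDNB and Lagrangian Decomposition) bound a network's outputs by relaxing its layer constraints. As a hedged sketch of the general relax-and-bound idea, not the exact decomposition used in either paper: relaxing the layer-consistency constraints with multipliers $\rho_k$ gives a dual function $q(\rho)$ that lower-bounds the true verification objective for any $\rho$ (weak duality), so the bound can be tightened by supergradient ascent on $\rho$:

$$
p^{*} \;=\; \min_{\substack{z_0 \in \mathcal{X} \\ z_{k+1} = f_k(z_k)}} c^{\top} z_L
\;\;\ge\;\;
q(\rho) \;=\; \min_{z_0 \in \mathcal{X},\; z_k \in \mathcal{B}_k} \Big[ c^{\top} z_L + \sum_{k=0}^{L-1} \rho_{k+1}^{\top} \big( z_{k+1} - f_k(z_k) \big) \Big],
$$

where $f_k$ is the $k$-th layer, $\mathcal{X}$ the input region, and $\mathcal{B}_k$ are box constraints from precomputed intermediate bounds. Any point feasible for the left-hand problem is also feasible on the right with zero penalty, which yields the inequality.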