AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
- URL: http://arxiv.org/abs/2212.01005v1
- Date: Fri, 2 Dec 2022 07:16:49 GMT
- Title: AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
- Authors: Zhiying Xu, Hongding Peng, Wei Wang
- Abstract summary: AGO is a framework for graph optimization with arbitrary structures to boost the inference performance of deep models.
We propose intensive operator fusion to stitch multiple complex operators together for better performance.
We show that our system can improve the inference performance by up to 3.3x when compared with state-of-the-art deep compilers.
- Score: 6.4284258345779435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional deep learning compilers rely on heuristics for subgraph
generation, which impose extra constraints on graph optimization, e.g., each
subgraph can only contain at most one complex operator. In this paper, we
propose AGO, a framework for graph optimization with arbitrary structures to
boost the inference performance of deep models by removing such constraints. To
create new optimization opportunities for complicated subgraphs, we propose
intensive operator fusion, which can effectively stitch multiple complex
operators together for better performance. Further, we design a graph
partitioning scheme that allows an arbitrary structure for each subgraph while
guaranteeing the acyclic property among all generated subgraphs. Additionally,
to enable efficient performance tuning on complicated subgraphs, we devise a
novel divide-and-conquer tuning mechanism to orchestrate different system
components. Through extensive experiments on various neural networks and mobile
devices, we show that our system can improve the inference performance by up to
3.3x when compared with state-of-the-art deep compilers.
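To make the acyclicity constraint from the abstract concrete, below is a minimal sketch (not AGO's actual partitioning algorithm; the function name, operator names, and the partition encoding are illustrative assumptions) of the invariant the partitioner must preserve: after operators are grouped into subgraphs, the subgraph-level "condensation" graph must still be a DAG, otherwise the subgraphs cannot be executed in any valid order.

```python
# Illustrative sketch only (not from the paper): verify that a partition of a
# computation DAG keeps the subgraph-level condensation graph acyclic.
from collections import defaultdict

def condensation_is_acyclic(edges, partition):
    """edges: (producer, consumer) operator pairs; partition: operator -> subgraph id."""
    succ = defaultdict(set)
    for u, v in edges:
        su, sv = partition[u], partition[v]
        if su != sv:                      # edges inside one subgraph are irrelevant
            succ[su].add(sv)
    nodes = set(partition.values())
    indeg = {n: 0 for n in nodes}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:                          # Kahn's algorithm: count orderable nodes
        n = ready.pop()
        seen += 1
        for v in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return seen == len(nodes)             # all ordered <=> no subgraph-level cycle

# conv feeds relu and (via a skip connection) matmul.
edges = [("conv", "relu"), ("relu", "matmul"), ("conv", "matmul")]
print(condensation_is_acyclic(edges, {"conv": 0, "relu": 1, "matmul": 1}))  # True
print(condensation_is_acyclic(edges, {"conv": 0, "relu": 1, "matmul": 0}))  # False
```

The second partition fails because subgraph 0 both feeds and depends on subgraph 1, creating a subgraph-level cycle; the first remains valid even though it stitches two operators into one subgraph, which is exactly the kind of grouping the one-complex-operator heuristic would forbid.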
Related papers
- GraphCroc: Cross-Correlation Autoencoder for Graph Structural Reconstruction [6.817416560637197]
Graph autoencoders (GAEs) reconstruct graph structures from node embeddings.
We introduce a cross-correlation mechanism that significantly enhances the representational capabilities of GAEs.
We also propose GraphCroc, a new GAE that supports flexible encoder architectures tailored for various downstream tasks.
arXiv Detail & Related papers (2024-10-04T12:59:45Z)
- Bayesian Optimization of Functions over Node Subsets in Graphs [14.670181702535825]
We propose a novel framework for optimization on graphs.
We map each $k$-node subset in the original graph to a node in a new graph.
Experiments under both synthetic and real-world setups demonstrate the effectiveness of the proposed Bayesian optimization (BO) framework.
arXiv Detail & Related papers (2024-05-24T00:24:55Z)
- A structure-aware framework for learning device placements on computation graphs [15.282882425920064]
We propose a novel framework for the task of device placement, relying on smaller graphs extracted from the OpenVINO toolkit.
The framework consists of five steps, including graph coarsening, node representation learning and policy optimization.
We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models.
arXiv Detail & Related papers (2024-05-23T05:29:29Z)
- From Hypergraph Energy Functions to Hypergraph Neural Networks [94.88564151540459]
We present an expressive family of parameterized, hypergraph-regularized energy functions.
We then demonstrate how minimizers of these energies effectively serve as node embeddings.
We draw parallels between the proposed bilevel hypergraph optimization, and existing GNN architectures in common use.
arXiv Detail & Related papers (2023-06-16T04:40:59Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures [24.841128441671234]
RGNNs are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs.
We propose Hector, a novel two-level intermediate representation and its code generator framework, to capture the key properties of RGNN models.
Hector achieves up to 9.9x speed-up in inference and 43.7x speed-up in training compared with the state-of-the-art public systems.
arXiv Detail & Related papers (2023-01-16T06:53:18Z)
- ALT: Breaking the Wall between Graph and Operator Level Optimizations for Deep Learning Compilation [38.8918502461244]
ALT is a compiler that performs joint graph- and operator-level optimizations for deep models.
ALT significantly outperforms state-of-the-art compilers (e.g., Ansor) in terms of both single operator performance and end-to-end inference performance.
arXiv Detail & Related papers (2022-10-22T11:09:36Z)
- Graph Contrastive Learning Automated [94.41860307845812]
Graph contrastive learning (GraphCL) has emerged with promising representation learning performance.
The effectiveness of GraphCL hinges on ad-hoc data augmentations, which have to be manually picked per dataset.
This paper proposes a unified bi-level optimization framework to automatically, adaptively and dynamically select data augmentations when performing GraphCL on specific graph data.
arXiv Detail & Related papers (2021-06-10T16:35:27Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- Counting Substructures with Higher-Order Graph Neural Networks: Possibility and Impossibility Results [58.277290855841976]
We study tradeoffs between the computational cost and the expressive power of Graph Neural Networks (GNNs).
We show that a new model can count subgraphs of size $k$, thereby overcoming a known limitation of low-order GNNs.
In several cases, the proposed algorithm can greatly reduce computational complexity compared to the existing higher-order $k$-GNNs.
arXiv Detail & Related papers (2020-12-06T03:42:54Z)
- Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z)