Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems
- URL: http://arxiv.org/abs/2504.20198v1
- Date: Mon, 28 Apr 2025 19:02:16 GMT
- Title: Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems
- Authors: Alireza Furutanpey, Carmen Walser, Philipp Raith, Pantelis A. Frangoudis, Schahram Dustdar
- Abstract summary: This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms. Our systematic analysis reveals that graph compilers exhibit performance patterns highly dependent on both neural architecture and batch sizes. We introduce novel metrics to quantify a compiler's ability to mitigate performance friction as batch size increases.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We demonstrate how vendor-specific optimizations can invalidate relative performance comparisons between architectural archetypes, with performance advantages sometimes completely reversing after compilation. Our systematic analysis reveals that graph compilers exhibit performance patterns highly dependent on both neural architecture and batch sizes. Through fine-grained block-level experimentation, we establish that vendor-specific compilers can leverage repeated patterns in simple architectures, yielding disproportionate throughput gains as model depth increases. We introduce novel metrics to quantify a compiler's ability to mitigate performance friction as batch size increases. Our methodology bridges the gap between academic research and practical deployment by incorporating compiler effects throughout the research process, providing actionable insights for practitioners navigating complex optimization landscapes across heterogeneous hardware environments.
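To ground the methodology, below is a minimal sketch of the kind of before/after measurement the paper describes, assuming PyTorch 2.x (torch.compile) and a torchvision ResNet-18 as the workload; the friction score here (throughput lost relative to the model's best batch size) is an illustrative stand-in, not the paper's actual metric or harness.

```python
import time
import torch
import torchvision.models as models

def throughput(model, batch_size, iters=10):
    """Samples per second for one model at one batch size."""
    x = torch.randn(batch_size, 3, 224, 224)
    with torch.no_grad():
        model(x)                                # warm-up (triggers compilation)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

def friction(model, batch_sizes):
    """Illustrative (assumed) friction score: fraction of throughput lost
    at each batch size relative to the model's best batch size."""
    tps = {b: throughput(model, b) for b in batch_sizes}
    best = max(tps.values())
    return {b: round(1.0 - tps[b] / best, 3) for b in batch_sizes}

eager = models.resnet18().eval()
compiled = torch.compile(eager)                 # graph-compiled, same weights
for name, m in [("eager", eager), ("compiled", compiled)]:
    print(name, friction(m, [1, 4, 16]))
```

Comparing the two friction dictionaries shows whether compilation changes how gracefully throughput scales with batch size, which is the effect the paper's metrics are designed to capture.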
Related papers
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
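For context, a minimal sketch of the conditional computation that makes MoE inference hard to optimize: a router selects the top-k experts per token, so each expert sees an irregular, data-dependent workload. The layer sizes and two-expert routing are illustrative assumptions, not a specific system from the survey.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy MoE layer: a router picks the top-k experts per token, so only
    a data-dependent subset of the network runs for any given input."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Sparse dispatch: each expert processes only its routed tokens.
        # This irregular workload is what MoE inference optimizations target.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(1) * expert(x[token_ids])
        return out

print(TinyMoE()(torch.randn(16, 64)).shape)      # torch.Size([16, 64])
```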
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the bandwidth and latency constraints of cloud-centric inference by shifting data analysis to the edge.
Existing methods, however, struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
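A minimal sketch of the matching idea under simplifying assumptions: each point is paired with its nearest neighbor that has a higher property value, and a least-squares map from points to their matches yields a displacement field that points "uphill" in the property while staying near the data. The synthetic data and linear map are illustrative choices, not PropEn's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # hypothetical designs
y = X[:, 0] + 0.1 * rng.normal(size=200)         # property to improve

# Match each design to its nearest neighbor with a higher property value.
pairs = []
for i in range(len(X)):
    d = np.linalg.norm(X - X[i], axis=1)
    better = (y > y[i]) & (d > 0)
    if better.any():
        j = np.where(better)[0][np.argmin(d[better])]
        pairs.append((i, j))

src = X[[i for i, _ in pairs]]
dst = X[[j for _, j in pairs]]
# Fit x -> matched x'. The learned displacement field approximates the
# property gradient while remaining within the data distribution.
W, *_ = np.linalg.lstsq(src, dst, rcond=None)
step = src @ W - src
print("mean step along the true property axis:", step[:, 0].mean())  # typically > 0
```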
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD (mechanistic architecture design) synthetic tasks to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
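As a toy illustration of transfer tuning, the sketch below represents each kernel as a feature vector and reuses the tuning configuration of its nearest neighbor in that space; the hand-made features and stored configurations are hypothetical stand-ins for the paper's learned embeddings.

```python
import numpy as np

# (features, best-known config) for already-tuned kernels; the features are
# hypothetical, e.g. [loop depth, arithmetic intensity, working-set MB].
tuned = {
    "gemm":    (np.array([3.0, 8.0, 16.0]), {"tile": 64, "unroll": 4}),
    "stencil": (np.array([2.0, 1.5, 4.0]),  {"tile": 32, "unroll": 2}),
    "spmv":    (np.array([1.0, 0.5, 8.0]),  {"tile": 16, "unroll": 1}),
}

def transfer_tune(features):
    """Return the config of the closest tuned kernel in embedding space."""
    name = min(tuned, key=lambda k: np.linalg.norm(tuned[k][0] - features))
    return name, tuned[name][1]

# A new, untuned kernel that looks like a dense matrix multiply:
print(transfer_tune(np.array([3.0, 7.0, 12.0])))
# ('gemm', {'tile': 64, 'unroll': 4})
```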
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation [8.64220475114214]
oneDNN Graph Compiler employs a hybrid approach, combining compiler optimization techniques with expert-tuned kernels for high-performance code generation.
Experimental results demonstrate significant performance gains over existing tensor compilers and primitives libraries for performance-critical computation graphs.
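A minimal sketch of the hybrid strategy, assuming a toy graph encoding: subgraphs that match a known pattern are lowered to an expert-tuned kernel, and everything else falls through to generic code generation. The pattern table and kernel names are illustrative, not oneDNN's actual IR or kernels.

```python
graph = ["conv", "bias", "relu", "reduce_mean"]   # toy computation graph
PATTERNS = {("conv", "bias", "relu"): "fused_conv_bias_relu_kernel"}  # expert-tuned

def lower(graph):
    plan, i = [], 0
    while i < len(graph):
        for pat, kernel in PATTERNS.items():      # expert-tuned kernel path
            if tuple(graph[i:i + len(pat)]) == pat:
                plan.append(kernel)
                i += len(pat)
                break
        else:                                     # generic compiled path
            plan.append(f"codegen({graph[i]})")
            i += 1
    return plan

print(lower(graph))
# ['fused_conv_bias_relu_kernel', 'codegen(reduce_mean)']
```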
arXiv Detail & Related papers (2023-01-03T19:52:17Z)
- AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization [6.4284258345779435]
AGO is a framework for graph optimization with arbitrary structures to boost the inference performance of deep models.
We propose intensive operator fusion to stitch multiple complex operators together for better performance.
We show that our system can improve inference performance by up to 3.3x compared with state-of-the-art deep learning compilers.
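For intuition, the sketch below contrasts an unfused elementwise chain, which materializes an intermediate array after every operator, with a fused single pass over the data; AGO's contribution is performing this kind of fusion for complex operators and arbitrary graph structures, which this toy does not attempt.

```python
import numpy as np

ops = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(v, 0.0)]

def unfused(x):
    for op in ops:                       # one full intermediate per operator
        x = np.array([op(v) for v in x])
    return x

def fused(x):
    out = np.empty_like(x)
    for i, v in enumerate(x):            # single pass, no intermediates;
        for op in ops:                   # a real compiler emits this as
            v = op(v)                    # one generated kernel
        out[i] = v
    return out

x = np.random.rand(1000)
assert np.allclose(unfused(x), fused(x))
```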
arXiv Detail & Related papers (2022-12-02T07:16:49Z)
- Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures whose task performance remains within error of baselines.
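A minimal sketch of the multi-objective selection this implies, assuming made-up candidate scores: an architecture is kept only if no other candidate is at least as good on both task error and the introspectability surrogate (Pareto dominance).

```python
def pareto_front(cands):
    """cands: list of (name, task_error, introspectability)."""
    front = []
    for name, err, intro in cands:
        dominated = any(e <= err and i >= intro and (e, i) != (err, intro)
                        for _, e, i in cands)
        if not dominated:
            front.append(name)
    return front

candidates = [("A", 0.10, 0.40), ("B", 0.12, 0.70),
              ("C", 0.15, 0.65), ("D", 0.09, 0.20)]
print(pareto_front(candidates))  # ['A', 'B', 'D'] -- C is dominated by B
```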
arXiv Detail & Related papers (2021-12-16T05:50:55Z)
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
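In the same spirit, here is a minimal two-headed graph network that reads a toy dataflow graph and emits one performance estimate and one hardware-cost estimate; the single mean-aggregation layer and random graph are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class TwoHeadGNN(nn.Module):
    def __init__(self, feat=8, hidden=32):
        super().__init__()
        self.msg = nn.Linear(feat, hidden)
        self.perf_head = nn.Linear(hidden, 1)    # predicted latency/speedup
        self.cost_head = nn.Linear(hidden, 1)    # predicted area/resources

    def forward(self, x, adj):
        # One round of message passing: average neighbor features.
        h = torch.relu(self.msg(adj @ x))
        g = h.mean(dim=0)                        # graph-level readout
        return self.perf_head(g), self.cost_head(g)

n = 10                                           # nodes in a toy dataflow graph
x = torch.randn(n, 8)                            # per-operation features
adj = (torch.rand(n, n) < 0.3).float()
adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)  # row-normalize
perf, cost = TwoHeadGNN()(x, adj)
print(perf.item(), cost.item())
```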
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations, achieving up to 84% accuracy (individual problems) and 73% (combined dataset with multiple problems) in predicting the change in performance.
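To illustrate the purely static signal involved, the sketch below parses two versions of a function with Python's ast module and diffs simple node counts; a tree-LSTM would consume the tree structure itself, so the flat counts here are a deliberately crude stand-in.

```python
import ast
from collections import Counter

before = "def f(xs):\n    out = []\n    for x in xs:\n        out.append(x * x)\n    return out\n"
after  = "def f(xs):\n    return [x * x for x in xs]\n"

def ast_features(src):
    """Count AST node types: a flat summary of code structure."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(src)))

fa, fb = ast_features(before), ast_features(after)
changed = {k: fb.get(k, 0) - fa.get(k, 0) for k in set(fa) | set(fb)}
print({k: v for k, v in changed.items() if v})
# e.g. {'For': -1, 'ListComp': 1, ...} -- the structural change a learned
# model would map to a predicted performance delta.
```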
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
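A minimal sketch of the two components named above, with plain linear layers standing in for the paper's auto-encoder and graph convolutional network: reconstruction is trained on all (randomly generated, illustrative) architecture encodings, while the performance head is trained only on the labeled subset.

```python
import torch
import torch.nn as nn

enc = nn.Linear(32, 8)                    # architecture encoding -> latent
dec = nn.Linear(8, 32)                    # latent -> reconstruction
pred = nn.Linear(8, 1)                    # latent -> predicted accuracy

archs = torch.randn(100, 32)              # illustrative architecture encodings
acc = torch.rand(20, 1)                   # accuracy labels for only 20 of them

opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *pred.parameters()])
for _ in range(200):
    z = enc(archs)
    # Reconstruction uses every architecture (the semi-supervised part);
    # the performance loss sees only the small labeled subset.
    loss = (nn.functional.mse_loss(dec(z), archs)
            + nn.functional.mse_loss(pred(z[:20]), acc))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(pred(enc(torch.randn(1, 32))).item())   # prediction for a new architecture
```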
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.