Performance Optimization using Multimodal Modeling and Heterogeneous GNN
- URL: http://arxiv.org/abs/2304.12568v2
- Date: Thu, 27 Apr 2023 15:34:39 GMT
- Title: Performance Optimization using Multimodal Modeling and Heterogeneous GNN
- Authors: Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Eduardo Cesar, Anna
Sikora, Ali Jannesari
- Abstract summary: We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks.
In this paper, we analyze IR-based programming models to make task-specific performance optimizations.
Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.
- Score: 1.304892050913381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Growing heterogeneity and configurability in HPC architectures have made
auto-tuning applications and runtime parameters on these systems very complex.
Users are presented with a multitude of options to configure parameters. In
addition to application-specific solutions, a common approach is to use
general-purpose search strategies, which often fail to identify the best
configurations or converge too slowly to be practical. There is,
thus, a need for a general purpose and efficient tuning approach that can be
easily scaled and adapted to various tuning tasks. We propose a technique for
tuning parallel code regions that is general enough to be adapted to multiple
tasks. In this paper, we analyze IR-based programming models to make
task-specific performance optimizations. To this end, we propose the Multimodal
Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning
based approach that adapts Heterogeneous Graph Neural Networks and Denoising
Autoencoders for modeling IR-based code representations that serve as separate
modalities. This approach is used as part of our pipeline to model a syntax-,
semantics-, and structure-aware IR-based code representation for tuning parallel
code regions/kernels. We extensively experiment on OpenMP and OpenCL code
regions/kernels obtained from PolyBench, Rodinia, STREAM, DataRaceBench, AMD
SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our
multimodal learning techniques to the tasks of i) optimizing the number of
threads, scheduling policy, and chunk size in OpenMP loops, and ii) identifying
the best device for heterogeneous device mapping of OpenCL kernels. Our
experiments show that this multimodal learning based approach outperforms the
state-of-the-art in all experiments.
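To make the first tuning task concrete, the sketch below enumerates an OpenMP configuration space (thread count, scheduling policy, chunk size) and picks the configuration with the lowest predicted runtime. It is only an illustration and not the MGA tuner itself: `OmpConfig`, `search_space`, `pick_best`, and `toy_cost` are hypothetical names, and the cost function is a stand-in for whatever model or measurement supplies the runtime estimate. `OMP_NUM_THREADS` and `OMP_SCHEDULE` are the standard OpenMP environment variables.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class OmpConfig:
    """One candidate configuration for an OpenMP loop (hypothetical helper)."""
    threads: int
    schedule: str  # "static", "dynamic", or "guided"
    chunk: int

    def to_env(self) -> Dict[str, str]:
        # OMP_SCHEDULE only affects loops declared with schedule(runtime).
        return {
            "OMP_NUM_THREADS": str(self.threads),
            "OMP_SCHEDULE": f"{self.schedule},{self.chunk}",
        }


def search_space(threads: Iterable[int],
                 schedules: Iterable[str],
                 chunks: Iterable[int]) -> List[OmpConfig]:
    """Cartesian product of all tunable parameters."""
    return [OmpConfig(t, s, c) for t, s, c in product(threads, schedules, chunks)]


def pick_best(configs: Iterable[OmpConfig],
              predicted_runtime: Callable[[OmpConfig], float]) -> OmpConfig:
    """Return the configuration with the smallest predicted runtime."""
    return min(configs, key=predicted_runtime)


def toy_cost(cfg: OmpConfig) -> float:
    # Assumption: a made-up cost surface used only to exercise the code.
    penalty = 0.0 if cfg.schedule == "dynamic" else 1.0
    return abs(cfg.threads - 8) + penalty + abs(cfg.chunk - 64) / 64


configs = search_space(
    threads=(1, 2, 4, 8, 16),
    schedules=("static", "dynamic", "guided"),
    chunks=(1, 16, 64, 256),
)
best = pick_best(configs, toy_cost)
print(len(configs), best.to_env())
```

Even this small grid has 60 candidates, so measuring each one on real hardware gets expensive quickly; that cost is the motivation the abstract gives for replacing exhaustive or generic search with a learned predictor.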
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models obtained via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of lightweight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z)
- Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is indeed effective and has normalized RMSE as low as 0.004 to at most 0.01 in its runtime predictions.
arXiv Detail & Related papers (2023-04-07T05:52:59Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems [1.885335997132172]
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters.
This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation.
Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z)
- Towards making the most of NLP-based device mapping optimization for OpenCL kernels [5.6596607119831575]
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels.
We propose four different models that provide enhanced contextual information of source codes.
Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv Detail & Related papers (2022-08-30T10:20:55Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is more efficient to speed up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- A Framework for Interdomain and Multioutput Gaussian Processes [22.62911488724047]
We present a mathematical and software framework for scalable approximate inference in GPs.
Our framework, implemented in GPflow, provides a unified interface for many existing multioutput models.
arXiv Detail & Related papers (2020-03-02T16:24:59Z)