MLGOPerf: An ML Guided Inliner to Optimize Performance
- URL: http://arxiv.org/abs/2207.08389v2
- Date: Tue, 19 Jul 2022 15:07:17 GMT
- Title: MLGOPerf: An ML Guided Inliner to Optimize Performance
- Authors: Amir H. Ashouri, Mostafa Elhoushi, Yuzhe Hua, Xiang Wang, Muhammad
Asif Manzoor, Bryan Chan and Yaoqing Gao
- Abstract summary: This paper presents the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner.
It employs a secondary ML model to generate the rewards used for training a retargeted reinforcement learning agent.
It does so by predicting the post-inlining speedup of a function under analysis, which enables a fast training framework for the primary model.
- Score: 7.314201117946244
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For the past 25 years, we have witnessed an extensive application of Machine
Learning to the compiler space, most notably to the optimization-selection and
phase-ordering problems. However, few of these works have been upstreamed into
state-of-the-art compilers such as LLVM, where they would be seamlessly
integrated into the optimization pipeline and readily deployable by users. MLGO
was among the first such projects, and it strives only to reduce the code size
of a binary with an ML-based inliner trained using Reinforcement Learning.
This paper presents MLGOPerf, the first end-to-end framework capable of
optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model
to generate the rewards used for training a retargeted Reinforcement Learning
agent, previously used as the primary model by MLGO. The secondary model
predicts the post-inlining speedup of a function under analysis, which enables
a fast training framework for the primary model that would otherwise be
impractical.
The experimental results show that MLGOPerf gains up to 1.8% and 2.2% over
LLVM's optimization at O3 when trained for performance on the SPEC CPU2006 and
Cbench benchmarks, respectively. Furthermore, the proposed approach provides up
to 26% more opportunities to autotune code regions in our benchmarks, which
translates into an additional 3.7% speedup.
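To make the reward-substitution idea concrete, the following is a minimal Python sketch of how a secondary speedup predictor can stand in for expensive benchmark runs when training an RL inlining policy. It illustrates the structure only, not the authors' implementation: the class names, feature layout, and the simple REINFORCE policy are all hypothetical stand-ins.

```python
import numpy as np

class SpeedupPredictor:
    """Secondary model (hypothetical stand-in): predicts the post-inlining
    speedup of a function from its features, replacing costly benchmark runs."""
    def predict(self, feats: np.ndarray) -> float:
        # Pretend this is a trained regressor; returns e.g. 1.03 for +3%.
        return 1.0 + 0.05 * float(np.tanh(feats.mean()))

class InlinePolicy:
    """Primary RL policy: per call site, decide inline (1) or not (0)."""
    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = np.zeros(n_features)
        self.lr = lr

    def act(self, x: np.ndarray) -> tuple[int, float]:
        p = 1.0 / (1.0 + np.exp(-self.w @ x))   # P(inline | features)
        return int(np.random.rand() < p), p

    def reinforce(self, x: np.ndarray, a: int, p: float, reward: float) -> None:
        # REINFORCE update: d log pi(a|x) / dw = (a - p) * x
        self.w += self.lr * reward * (a - p) * x

def train(policy, predictor, corpus, epochs=10):
    for _ in range(epochs):
        for callsite_feats, function_feats in corpus:
            a, p = policy.act(callsite_feats)
            # Reward = predicted speedup of the enclosing function if we
            # inline, baseline 0 otherwise (illustrative shaping only).
            reward = predictor.predict(function_feats) - 1.0 if a else 0.0
            policy.reinforce(callsite_feats, a, p, reward)

# Example: random features standing in for MLGO's call-site descriptors.
rng = np.random.default_rng(0)
corpus = [(rng.standard_normal(10), rng.standard_normal(10)) for _ in range(100)]
train(InlinePolicy(10), SpeedupPredictor(), corpus)
```

In the real system the secondary model is trained on measured runtimes and the trained policy is queried from inside LLVM's inlining pass; the sketch only captures why a cheap predicted reward makes large-scale RL training practical.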
Related papers
- CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has proven effective at improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We show that CubicML can effectively optimize the training speed of in-house recommendation models with 73 billion parameters and of large language models with up to 405 billion parameters at Meta ads.
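Under one plausible reading of the summary, the mechanism is a learned performance predictor that screens candidate system configurations so only the most promising are actually benchmarked. A minimal sketch of that pattern follows; the tuning knobs and the measurement function are purely illustrative, not CubicML's actual search space.

```python
import random
from sklearn.ensemble import RandomForestRegressor

def sample_config():
    return [random.choice([2, 4, 8]),          # pipeline-parallel stages
            random.choice([1, 2, 4, 8]),       # tensor-parallel degree
            random.choice([256, 512, 1024])]   # micro-batch size

def measure_throughput(cfg):                   # stand-in for a real benchmark
    pp, tp, mb = cfg
    return mb / (pp * 1.3 + tp * 0.7) + random.gauss(0.0, 5.0)

# Fit a cheap predictor on a handful of real measurements...
observed = [sample_config() for _ in range(20)]
model = RandomForestRegressor(n_estimators=100).fit(
    observed, [measure_throughput(c) for c in observed])

# ...then let the predictor rank a large candidate pool and spend
# expensive benchmark time only on the predicted top few.
pool = [sample_config() for _ in range(1000)]
scores = model.predict(pool)
top = [cfg for _, cfg in sorted(zip(scores, pool), reverse=True)[:5]]
best = max(top, key=measure_throughput)
print("chosen config:", best)
```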
arXiv Detail & Related papers (2024-09-06T19:55:21Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose optimization-based structural pruning for large language models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method runs for 2.7 hours with around 35 GB of memory for the 13B models on a single A100 GPU.
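A toy sketch of the stated idea (not the paper's exact method): treat each prunable structure as a Bernoulli "keep" variable, score sampled masks with forward passes only, and update the keep-probabilities with a policy gradient, so no backpropagation through the model is needed. The eval_loss function below is a stand-in for actually running the pruned model on calibration data.

```python
import numpy as np

def eval_loss(mask: np.ndarray) -> float:
    important = mask[:5].sum()                 # pretend structures 0..4 matter
    return 5.0 - important + 0.1 * mask.sum()  # task loss + size penalty

n, lr = 32, 0.1
logits = np.zeros(n)                           # keep-probability logits
baseline = 0.0                                 # running reward baseline

for step in range(3000):
    p = 1.0 / (1.0 + np.exp(-logits))
    mask = (np.random.rand(n) < p).astype(float)     # sample a pruning mask
    reward = -eval_loss(mask)                  # lower loss -> higher reward
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE: d log P(mask) / d logits = mask - p
    logits += lr * (reward - baseline) * (mask - p)

keep = np.where(1.0 / (1.0 + np.exp(-logits)) > 0.5)[0]
print("structures kept:", keep)                # converges to 0..4 here
```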
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z)
- Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer active parameters, but they are still hard to deploy due to their immense total parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves, for the first time, high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
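For orientation only, here is the textbook 1-bit weight baseline that such schemes build on; BiLLM's full method additionally handles salient weights and residual approximation, which this sketch does not attempt. Each weight row is approximated as alpha * sign(W), where alpha = mean(|W|) minimizes the L2 reconstruction error for fixed signs.

```python
import numpy as np

# Generic 1-bit weight binarization baseline (NOT BiLLM's full scheme):
# W ~= alpha * sign(W) per output row, with alpha = mean(|W_row|), the
# L2-optimal scale for the fixed sign pattern.

def binarize(W: np.ndarray):
    signs = np.where(W >= 0, 1.0, -1.0)            # 1 bit per weight
    alpha = np.abs(W).mean(axis=1, keepdims=True)  # one fp scale per row
    return signs, alpha

def dequantize(signs, alpha):
    return alpha * signs

W = np.random.randn(4, 8)
s, a = binarize(W)
err = np.linalg.norm(W - dequantize(s, a)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```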
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- ACPO: AI-Enabled Compiler-Driven Program Optimization [1.879008610342411]
ACPO is a framework to provide LLVM with simple and comprehensive tools to benefit from employing ML models for different optimization passes.
We show that the ACPO model for Loop Unroll gains on average 4% over LLVM's O3 optimization when deployed on Polybench.
arXiv Detail & Related papers (2023-12-15T17:49:24Z)
- GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation [6.525197444717069]
GEVO-ML is a tool for discovering optimization opportunities and tuning the performance of Machine Learning kernels.
We demonstrate GEVO-ML on two different ML workloads for both model training and prediction.
GEVO-ML finds significant improvements for these models, achieving 90.43% performance improvement when model accuracy is relaxed by 2%.
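The flavor of such an evolutionary search loop is sketched below. GEVO-ML itself mutates the kernel's intermediate representation; in this toy version a genome is just a vector of hypothetical tuning knobs, and fitness() stands in for compiling, running, and validating a mutated kernel, with the 2% accuracy-relaxation from the summary as the constraint.

```python
import random

TOLERANCE = 0.02   # accept up to 2% accuracy loss, per the summary

def fitness(genome):
    runtime = sum((g - 3) ** 2 for g in genome) + 10    # lower is better
    accuracy_drop = abs(genome[0] - 3) * 0.005          # toy accuracy model
    return runtime, accuracy_drop

def mutate(genome):
    g = genome[:]
    g[random.randrange(len(g))] += random.choice([-1, 1])
    return g

pop = [[random.randint(0, 6) for _ in range(4)] for _ in range(20)]
for gen in range(50):
    scored = [(g, *fitness(g)) for g in pop]
    # Keep only variants within the accuracy budget, then select the fastest.
    valid = [s for s in scored if s[2] <= TOLERANCE] or scored
    valid.sort(key=lambda s: s[1])
    parents = [g for g, _, _ in valid[:5]]
    pop = parents + [mutate(random.choice(parents)) for _ in range(15)]

print("best genome:", sorted(pop, key=lambda g: fitness(g)[0])[0])
```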
arXiv Detail & Related papers (2023-10-16T09:24:20Z)
- Fine-Tuning Language Models with Just Forward Passes [92.04219196752007]
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a large amount of memory.
We propose a memory-efficient zeroth-order optimizer (MeZO) that operates in place, fine-tuning LMs with the same memory footprint as inference.
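A minimal sketch of the zeroth-order trick as described (simplified; see the paper for MeZO's actual recipe): estimate the gradient from two forward passes at theta +/- eps*z, and regenerate the perturbation z from a saved RNG seed instead of storing it, so parameters are perturbed in place and peak memory stays at the inference level. The loss function below is a toy stand-in for the LM forward pass.

```python
import numpy as np

def loss(theta):                               # toy objective, not an LM
    return float(((theta - 1.0) ** 2).sum())

def z_from(seed, shape):                       # re-create noise on demand
    return np.random.default_rng(seed).standard_normal(shape)

def mezo_step(theta, lr=0.01, eps=1e-3):
    seed = np.random.randint(2**31)            # remember only the seed
    theta += eps * z_from(seed, theta.shape)   # perturb in place
    l_plus = loss(theta)
    theta -= 2 * eps * z_from(seed, theta.shape)
    l_minus = loss(theta)
    theta += eps * z_from(seed, theta.shape)   # restore original theta
    g = (l_plus - l_minus) / (2 * eps)         # projected gradient estimate
    theta -= lr * g * z_from(seed, theta.shape)  # SGD step along z

theta = np.zeros(8)
for _ in range(3000):
    mezo_step(theta)
print(theta.round(2))                          # drifts toward the optimum at 1.0
```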
arXiv Detail & Related papers (2023-05-27T02:28:10Z)
- MLGO: a Machine Learning Guided Compiler Optimizations Framework [0.0]
This work is the first full integration of machine learning in a complex compiler pass in a real-world setting.
We use two different ML algorithms to train the inlining-for-size model, and achieve up to 7% size reduction.
The same model generalizes well to a diversity of real-world targets, as well as to the same set of targets after months of active development.
arXiv Detail & Related papers (2021-01-13T00:02:49Z)