A Deep Learning Based Cost Model for Automatic Code Optimization
- URL: http://arxiv.org/abs/2104.04955v1
- Date: Sun, 11 Apr 2021 08:32:42 GMT
- Title: A Deep Learning Based Cost Model for Automatic Code Optimization
- Authors: Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel
Abdous, Taha Arbaoui, Karima Benatchba, Saman Amarasinghe
- Abstract summary: We present a novel deep learning based cost model for automatic code optimization.
It was integrated into the Tiramisu compiler to select the best code transformations.
The proposed model achieves a mean absolute percentage error of only 16% when predicting speedups on full programs.
- Score: 0.24629531282150877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling compilers to automatically optimize code has been a longstanding
goal for the compiler community. Efficiently solving this problem requires
using precise cost models. These models predict whether applying a sequence of
code transformations reduces the execution time of the program. Building an
analytical cost model to do so is hard on modern x86 architectures due to the
complexity of the microarchitecture. In this paper, we present a novel deep
learning based cost model for automatic code optimization. This model was
integrated into a search method and implemented in the Tiramisu compiler to
select the best code transformations. The input of the proposed model is a set
of simple features representing the unoptimized code and a sequence of code
transformations. The model predicts the speedup expected when the code
transformations are applied. Unlike previous models, the proposed one works on
full programs and does not rely on any heavy feature engineering. The proposed
model achieves a mean absolute percentage error of only 16% when predicting speedups on
full programs. The proposed model enables Tiramisu to automatically find code
transformations that match or outperform those found by state-of-the-art
compilers, without requiring the heavy feature engineering those compilers
rely on.
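
To make the interface concrete, here is a minimal sketch, in PyTorch, of a cost model with the same inputs and output as described above: a vector of simple features for the unoptimized code plus a sequence of vectors encoding the candidate transformations, mapped to a predicted speedup. This is not the authors' implementation; the feature sizes, the LSTM over the transformation sequence, and the layer widths are illustrative assumptions, and the mape helper only shows how the 16% figure quoted above is computed.

import torch
import torch.nn as nn


class SpeedupCostModel(nn.Module):
    # Toy stand-in for a deep learning based cost model: program features plus
    # a sequence of code transformations in, predicted speedup out.
    def __init__(self, code_feat_dim=64, transform_dim=16, hidden_dim=128):
        super().__init__()
        # Encode the sequence of code transformations with an LSTM.
        self.transform_encoder = nn.LSTM(transform_dim, hidden_dim, batch_first=True)
        # Combine the code features with the transformation encoding and regress.
        self.regressor = nn.Sequential(
            nn.Linear(code_feat_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, code_features, transform_seq):
        # code_features: (batch, code_feat_dim)
        # transform_seq: (batch, seq_len, transform_dim)
        _, (h_n, _) = self.transform_encoder(transform_seq)
        combined = torch.cat([code_features, h_n[-1]], dim=-1)
        # Predict a log-speedup and exponentiate so the output is always positive.
        return torch.exp(self.regressor(combined)).squeeze(-1)


def mape(predicted, measured):
    # Mean absolute percentage error, the metric quoted in the abstract.
    return 100.0 * torch.mean(torch.abs(predicted - measured) / measured)


if __name__ == "__main__":
    model = SpeedupCostModel()
    feats = torch.randn(8, 64)          # toy program features
    transforms = torch.randn(8, 5, 16)  # toy sequence of 5 transformations
    measured = torch.rand(8) * 4 + 0.5  # toy measured speedups
    predicted = model(feats, transforms)
    print("MAPE: %.1f%%" % mape(predicted, measured).item())

Predicting a speedup ratio rather than an absolute execution time lets a search method compare alternative transformation sequences for the same program directly, which is how the abstract describes the model being used.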
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - CompilerDream: Learning a Compiler World Model for General Code Optimization [58.87557583347996]
We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization.
It comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies.
It excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.
arXiv Detail & Related papers (2024-04-24T09:20:33Z) - LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers [1.7529897611426233]
We introduce LOOPer, the first polyhedral autoscheduler that uses a deep-learning based cost model.
It supports the exploration of a large set of affine transformations, allowing the application of complex sequences of polyhedral transformations.
It also supports the optimization of programs with multiple loop nests and with rectangular and non-rectangular iteration domains.
arXiv Detail & Related papers (2024-03-18T07:22:31Z) - Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval.
We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z) - Large Language Models for Compiler Optimization [22.52765975286403]
We present a transformer model trained from scratch to optimize LLVM assembly for code size.
We ask the model to predict the instruction counts before and after optimization, and the optimized code itself.
Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler.
arXiv Detail & Related papers (2023-09-11T22:11:46Z) - ML-driven Hardware Cost Model for MLIR [1.2987894327817158]
We develop a machine learning-based cost model for high-level MLIR.
By treating the incoming MLIR as a text input, in the style of NLP models, we can apply well-known techniques from modern NLP research (a sketch of this idea appears after the list of related papers below).
We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest.
arXiv Detail & Related papers (2023-02-14T11:32:47Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z) - MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks [0.0]
This paper proposes MetaTune, a meta-learning based cost model that more quickly and accurately predicts the performance of optimized codes with pre-trained model parameters.
The framework provides 8 to 13% better inference time on average for four CNN models with comparable or lower optimization time while outperforming transfer learning by 10% in cross-platform cases.
arXiv Detail & Related papers (2021-02-08T13:59:08Z) - A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
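
As referenced in the ML-driven Hardware Cost Model for MLIR entry above, the following is a hedged sketch of what an "IR as text" cost model can look like: hash-tokenize the IR string, run a small Transformer encoder over the tokens, and regress a scalar hardware metric. None of this is code from the listed papers; the tokenizer, vocabulary size, and model dimensions are assumptions made to keep the example self-contained and runnable.

import torch
import torch.nn as nn


class TextIRCostModel(nn.Module):
    # NLP-style regressor over tokenized compiler IR text.
    def __init__(self, vocab_size=1000, embed_dim=64, num_heads=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, 1)  # one scalar hardware characteristic

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer token indices of the IR text
        hidden = self.encoder(self.embed(token_ids))
        return self.head(hidden.mean(dim=1)).squeeze(-1)  # mean-pool, then regress


def tokenize(ir_text, vocab_size=1000):
    # Toy whitespace tokenizer that hashes each token into a fixed vocabulary.
    return torch.tensor([[hash(tok) % vocab_size for tok in ir_text.split()]])


if __name__ == "__main__":
    ir = '%0 = "arith.addf"(%a, %b) : (f32, f32) -> f32'
    print("predicted cost:", TextIRCostModel()(tokenize(ir)).item())

Hash tokenization is only a stand-in for a learned, IR-aware vocabulary; it keeps the example dependency-free while preserving the core design choice of reusing standard NLP machinery on compiler IR.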