GRACE: Globally-Seeded Representation-Aware Cluster-Specific Evolution for Compiler Auto-Tuning
- URL: http://arxiv.org/abs/2510.13176v1
- Date: Wed, 15 Oct 2025 06:01:19 GMT
- Title: GRACE: Globally-Seeded Representation-Aware Cluster-Specific Evolution for Compiler Auto-Tuning
- Authors: Haolin Pan, Chao Zha, Jinyuan Dong, Mingjie Xing, Yanjun Wu
- Abstract summary: This paper introduces GRACE, a novel framework for compiler auto-tuning, demonstrated for LLVM IR instruction count optimization. GRACE effectively curtails the search space by leveraging pass synergies and a weighted scoring method to generate initial high-quality candidate sequences and a pass pool. It then employs contrastive learning, using pass sequence-based data augmentation, to create program embeddings that facilitate similarity-aware clustering.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compiler pass selection and phase ordering present a significant challenge in achieving optimal program performance, particularly for objectives like code size reduction. Standard compiler heuristics offer general applicability but often yield suboptimal, program-specific results due to their one-size-fits-all nature. While iterative compilation can find tailored solutions, its prohibitive search cost limits practical use. Machine learning approaches promise faster inference but frequently struggle with generalization to unseen programs. This paper introduces GRACE, a novel framework for compiler auto-tuning, demonstrated for LLVM IR instruction count optimization. GRACE effectively curtails the search space by leveraging pass synergies and a weighted scoring method to generate initial high-quality candidate sequences and a pass pool. It then employs contrastive learning, using pass sequence-based data augmentation, to create program embeddings that facilitate similarity-aware clustering. Evolutionary search within these clusters yields a coreset of $k$ specialized pass sequences designed for robust generalization to unseen programs. At test time, GRACE efficiently selects the best coreset sequence and refines it using lightweight techniques. Experimental results on seven diverse datasets show that GRACE reduces LLVM IR instruction count by an average of 10.09% on LLVM 10.0.0 and 10.19% on LLVM 18.1.6 compared to opt -Oz, while incurring an average tuning time of less than 1s per program, demonstrating its state-of-the-art performance and practical effectiveness.
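The abstract outlines the test-time stage compactly: evaluate the k offline coreset sequences, keep the best, then refine. The sketch below is a minimal illustration of that flow, not the paper's code: `instruction_count` is a stub standing in for an LLVM `opt` invocation, and greedy pass deletion is an assumed example of the unspecified "lightweight techniques".
```python
"""Minimal sketch of GRACE's test-time stage, under assumed interfaces.
`instruction_count` is a stub for running LLVM's opt and counting IR
instructions; greedy pass deletion is an illustrative stand-in for the
paper's unspecified "lightweight" refinement."""

def instruction_count(program_ir: str, passes: list[str]) -> int:
    """Stub: apply the pass sequence with opt, return the IR instruction count."""
    raise NotImplementedError("wire this to an LLVM toolchain")

def select_and_refine(program_ir: str, coreset: list[list[str]]) -> list[str]:
    # 1) Evaluate each of the k specialized coreset sequences; keep the best.
    best = min(coreset, key=lambda seq: instruction_count(program_ir, seq))
    best_cost = instruction_count(program_ir, best)
    # 2) Assumed refinement: greedily drop passes that no longer help.
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            trial = best[:i] + best[i + 1:]
            if trial:
                cost = instruction_count(program_ir, trial)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
                    break
    return best
```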
Related papers
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z)
- LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models
LOOPRAG is a retrieval-augmented generation framework designed to guide Large Language Models (LLMs) in performing effective loop optimization. We introduce a parameter-driven method that harnesses loop properties to trigger various loop transformations and generate diverse yet legal example code. To enhance correct and efficient code generation, we introduce a feedback-based iterative mechanism that incorporates compilation, testing, and performance results.
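The feedback-based mechanism described above can be pictured as a compile-test-measure loop. The sketch below is a generic reconstruction under assumed stubs (`query_llm`, `compile_and_test`, `measure`), not LOOPRAG's actual implementation.
```python
"""Generic compile-test-measure feedback loop in the spirit of the
description above; every helper here is an assumed stub."""

def query_llm(prompt: str) -> str:
    """Stub: ask an LLM for a transformed loop given code and feedback."""
    raise NotImplementedError

def compile_and_test(code: str) -> tuple[bool, str]:
    """Stub: return (passed, diagnostics) from compilation and unit tests."""
    raise NotImplementedError

def measure(code: str) -> float:
    """Stub: runtime of a candidate in seconds (lower is better)."""
    raise NotImplementedError

def optimize_loop(original: str, max_rounds: int = 5) -> str:
    best, best_time = original, measure(original)
    feedback = ""
    for _ in range(max_rounds):
        candidate = query_llm(f"Optimize this loop:\n{best}\nFeedback:\n{feedback}")
        ok, diagnostics = compile_and_test(candidate)
        if not ok:
            feedback = f"Rejected, compiler/tests said: {diagnostics}"
            continue
        t = measure(candidate)
        feedback = f"Legal, but runtime {t:.3f}s vs. best {best_time:.3f}s"
        if t < best_time:
            best, best_time = candidate, t
    return best
```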
arXiv Detail & Related papers (2025-12-12T11:09:48Z)
- Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
This paper proposes a novel quasi-dynamic framework for program representation. The core insight is to model a program's optimization sensitivity. To effectively encode this high-dimensional, continuous spectrum, we pioneer a compositional learning approach.
arXiv Detail & Related papers (2025-10-15T05:18:41Z)
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
We introduce Compiler-R1, the first reinforcement learning (RL)-driven framework for compiler auto-tuning. Our code and datasets are publicly available at https://github.com/Panhaolin2001/Compiler-R1.
arXiv Detail & Related papers (2025-05-30T00:26:10Z)
- LLM Program Optimization via Retrieval Augmented Search
We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations. We show that RAS performs 1.8$\times$ better than prior state-of-the-art blackbox adaptation strategies. We also propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits".
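The one-line description of RAS suggests a standard beam search driven by LLM-proposed rewrites. A generic sketch follows, with `propose_edits` and `score` as assumed stubs; this is not the paper's code.
```python
"""Generic beam search over LLM-proposed program rewrites, in the spirit
of the RAS description above. Both helpers are assumed stubs."""

def propose_edits(program: str, k: int) -> list[str]:
    """Stub: ask a retrieval-augmented LLM for k candidate rewrites."""
    raise NotImplementedError

def score(program: str) -> float:
    """Stub: measured performance of a candidate (higher is better)."""
    raise NotImplementedError

def beam_search(program: str, beam_width: int = 4, depth: int = 3) -> str:
    beam = [program]
    for _ in range(depth):
        candidates = [c for p in beam for c in propose_edits(p, beam_width)]
        # Keep the top beam_width programs seen so far, measured by score.
        beam = sorted(set(beam) | set(candidates), key=score, reverse=True)[:beam_width]
    return beam[0]
```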
arXiv Detail & Related papers (2025-01-31T06:34:47Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
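One way to read "transfer tuning" is nearest-neighbour reuse of configurations in the embedding space. The following is a minimal sketch under assumed placeholders (`embed`, a corpus of previously tuned kernels), an illustration rather than the paper's API.
```python
"""Sketch of similarity-based transfer tuning: start a new kernel from the
configuration of its nearest neighbour in embedding space. `embed` is an
assumed stub and the corpus layout is illustrative."""
import numpy as np

def embed(kernel_source: str) -> np.ndarray:
    """Stub: map a code region to its performance embedding."""
    raise NotImplementedError

def transfer_tune(new_kernel: str, corpus: list[tuple[str, dict]]) -> dict:
    """corpus holds (kernel_source, best_known_config) pairs from past tuning."""
    query = embed(new_kernel)

    def similarity(src: str) -> float:
        v = embed(src)
        return float(query @ v / (np.linalg.norm(query) * np.linalg.norm(v)))

    _, best_config = max(corpus, key=lambda pair: similarity(pair[0]))
    return best_config  # reuse the neighbour's tuning as a starting point
```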
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Learning Performance-Improving Code Edits
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 pairs of competitive C++ programming submissions, capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
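Retrieval-based few-shot prompting, as named above, is straightforward to sketch; `retrieve_similar` and `llm_complete` are assumed stubs, and the prompt format is invented for illustration.
```python
"""Sketch of retrieval-based few-shot prompting for program optimization.
Helper names and the prompt layout are assumptions, not the paper's API."""

def retrieve_similar(slow_code: str, k: int = 3) -> list[tuple[str, str]]:
    """Stub: return k (slow, fast) edit pairs resembling the query program."""
    raise NotImplementedError

def llm_complete(prompt: str) -> str:
    """Stub: call any code-capable LLM."""
    raise NotImplementedError

def optimize(slow_code: str) -> str:
    shots = retrieve_similar(slow_code)
    prompt = "".join(
        f"// slower version:\n{slow}\n// optimized version:\n{fast}\n\n"
        for slow, fast in shots
    )
    prompt += f"// slower version:\n{slow_code}\n// optimized version:\n"
    return llm_complete(prompt)
```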
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments
This paper proposes InvAASTCluster, a novel approach for program clustering. InvAASTCluster's program representation uses a combination of the program's semantics, through its invariants, and its structure. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based repair tools.
arXiv Detail & Related papers (2022-06-28T17:42:28Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
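The training scheme above hinges on filtering the model's own samples by test results. A minimal sketch of that filtering step follows; all helpers and the threshold are assumptions, not the paper's setup.
```python
"""Sketch of collecting self-sampled training targets: keep fully-correct
programs and, above a threshold, partially-correct ones. All helpers are
assumed stubs; the threshold is an invented illustration."""

def sample_programs(model, spec: str, n: int = 8) -> list[str]:
    """Stub: draw n candidate programs from the current model."""
    raise NotImplementedError

def fraction_tests_passed(program: str, tests: list) -> float:
    """Stub: execute the program and return the fraction of tests passed."""
    raise NotImplementedError

def collect_targets(model, spec: str, tests: list, threshold: float = 0.5):
    targets = []
    for program in sample_programs(model, spec):
        passed = fraction_tests_passed(program, tests)
        if passed == 1.0:
            targets.append((program, "correct"))
        elif passed >= threshold:  # partially correct: still a useful target
            targets.append((program, "partial"))
    return targets  # fold these back into the next training step
```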
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Learning to Superoptimize Real-world Programs
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Searching for More Efficient Dynamic Programs
We describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a search procedure to improve this metric.
We show that in practice, automated search can find substantial improvements to the initial program.
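The recipe above (transformations, an efficiency metric, and a search procedure) can be illustrated with the simplest possible search, greedy hill-climbing; `transformations` and `cost` are assumed stubs.
```python
"""Greedy hill-climbing over semantics-preserving program transformations,
illustrating the transformations + metric + search recipe above. The two
helpers are assumed stubs."""

def transformations(program) -> list:
    """Stub: all semantics-preserving rewrites applicable to `program`."""
    raise NotImplementedError

def cost(program) -> float:
    """Stub: efficiency metric for a program (lower is better)."""
    raise NotImplementedError

def hill_climb(program):
    while True:
        neighbours = transformations(program)
        if not neighbours:
            return program
        best = min(neighbours, key=cost)
        if cost(best) >= cost(program):
            return program  # local optimum: no rewrite improves the metric
        program = best
```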
arXiv Detail & Related papers (2021-09-14T20:52:55Z)
- ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average F1 score of 94.0, significantly outperforming state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
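A ProGraML-style representation joins instruction nodes with typed edges. The toy structure below is a simplification for illustration, not the paper's actual graph format.
```python
"""Toy ProGraML-style program graph: instruction nodes connected by typed
(control/data/call) edges. A simplification, not the paper's format."""
from dataclasses import dataclass, field

@dataclass
class ProgramGraph:
    nodes: list[str] = field(default_factory=list)  # instruction text
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_instruction(self, text: str) -> int:
        self.nodes.append(text)
        return len(self.nodes) - 1

    def add_edge(self, src: int, dst: int, kind: str) -> None:
        assert kind in {"control", "data", "call"}, f"unknown edge kind {kind}"
        self.edges.append((src, dst, kind))

# Tiny example: %b depends on %a, so we add a data edge alongside control flow.
g = ProgramGraph()
a = g.add_instruction("%a = add i32 %x, %y")
b = g.add_instruction("%b = mul i32 %a, 2")
g.add_edge(a, b, "control")
g.add_edge(a, b, "data")
```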
arXiv Detail & Related papers (2020-03-23T20:27:00Z)
- AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning
AutoPhase is a framework that takes a program and uses deep reinforcement learning to find a sequence of compilation passes that minimizes its execution time.
We show that AutoPhase improves circuit performance by 28% when compared to using the -O3 compiler flag.
Unlike existing state-of-the-art solutions, our deep reinforcement learning solution shows promising results in generalizing to real benchmarks.
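A generic deep-RL episode for phase ordering looks roughly as follows; the environment helpers are assumed stubs and the pass list is only illustrative.
```python
"""Generic RL episode for compiler phase ordering, in the spirit of the
description above. Environment helpers are assumed stubs; the pass list
is illustrative."""
import random

PASSES = ["-mem2reg", "-gvn", "-instcombine", "-simplifycfg", "-loop-unroll"]

def apply_pass(state, pass_name):
    """Stub: run one compiler pass on the current program state."""
    raise NotImplementedError

def runtime(state) -> float:
    """Stub: estimated execution time of the program (lower is better)."""
    raise NotImplementedError

def episode(initial_state, policy, horizon: int = 12):
    state, sequence = initial_state, []
    baseline = runtime(state)
    for _ in range(horizon):
        p = policy(state)               # e.g., an epsilon-greedy neural policy
        state = apply_pass(state, p)
        sequence.append(p)
    reward = baseline - runtime(state)  # improvement over the starting program
    return sequence, reward

def random_policy(state):
    """Baseline policy: ignore the state and pick a random pass."""
    return random.choice(PASSES)
```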
arXiv Detail & Related papers (2020-03-02T05:35:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.