Related papers: GroupTuner: Efficient Group-Aware Compiler Auto-Tuning

GroupTuner: Efficient Group-Aware Compiler Auto-Tuning

URL: http://arxiv.org/abs/2505.08598v2
Date: Tue, 24 Jun 2025 01:24:52 GMT
Title: GroupTuner: Efficient Group-Aware Compiler Auto-Tuning
Authors: Bingyu Gao, Mengyu Yao, Ziming Wang, Dong Liu, Ding Li, Xiangqun Chen, Yao Guo,
Abstract summary: GroupTuner is a group-aware auto-tuning technique that applies localized mutation to coherent option groups based on historically best-performing combinations.<n>Experiments demonstrate that GroupTuner can efficiently discover competitive option combinations, achieving an average performance improvement of 12.39% over -O3.
Score: 14.545919877837436
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern compilers typically provide hundreds of options to optimize program performance, but users often cannot fully leverage them due to the huge number of options. While standard optimization combinations (e.g., -O3) provide reasonable defaults, they often fail to deliver near-peak performance across diverse programs and architectures. To address this challenge, compiler auto-tuning techniques have emerged to automate the discovery of improved option combinations. Existing techniques typically focus on identifying critical options and prioritizing them during the search to improve efficiency. However, due to limited tuning iterations, the resulting data is often sparse and noisy, making it highly challenging to accurately identify critical options. As a result, these algorithms are prone to being trapped in local optima. To address this limitation, we propose GroupTuner, a group-aware auto-tuning technique that directly applies localized mutation to coherent option groups based on historically best-performing combinations, thus avoiding explicitly identifying critical options. By forgoing the need to know precisely which options are most important, GroupTuner maximizes the use of existing performance data, ensuring more targeted exploration. Extensive experiments demonstrate that GroupTuner can efficiently discover competitive option combinations, achieving an average performance improvement of 12.39% over -O3 while requiring only 77.21% of the time compared to the random search algorithm, significantly outperforming state-of-the-art methods.

Related papers

Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO) ZOPO incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
arXiv Detail & Related papers (2024-03-05T14:18:15Z)
Controllable Prompt Tuning For Balancing Group Distributional Robustness [53.336515056479705]
We introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. We propose Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data.
arXiv Detail & Related papers (2024-03-05T06:23:55Z)
Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
Non-Elitist Selection Can Improve the Performance of Irace [0.8258451067861933]
We study two alternative selection methods for tuning ant colony optimization algorithms for traveling salesperson problems and the quadratic assignment problem. The experimental results show improvement on the tested benchmarks compared to the default selection of irace. In addition, the obtained results indicate that non-elitist can obtain diverse algorithm configurations, which encourages us to explore a wider range of solutions to understand the behavior of algorithms.
arXiv Detail & Related papers (2022-03-17T10:34:30Z)
Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Amalgamation: how can we best combine a pool of "teacher" amalgamations into a single "student" that can have stronger problem-specific performance? First, we define three differentiable mechanisms to amalgamate a pool of analyticals by gradient descent. In order to reduce variance of the process, we also explore methods to stabilize the process by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
Bayesian Optimization for auto-tuning GPU kernels [0.0]
Finding optimal parameter configurations for GPU kernels is a non-trivial exercise for large search spaces, even when automated. We introduce a novel contextual exploration factor as well as new acquisition functions with improved scalability, combined with an informed function selection mechanism.
arXiv Detail & Related papers (2021-11-26T11:26:26Z)
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers [29.624308090226375]
In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed anecdotes. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learnings. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
arXiv Detail & Related papers (2020-07-03T08:19:36Z)
Analysis of the Performance of Algorithm Configurators for Search Heuristics with Global Mutation Operators [0.0]
ParamRLS can efficiently identify the optimal neighbourhood size to be used by local search. We show that the simple ParamRLS-F can identify the optimal mutation rates even when using cutoff times that are considerably smaller than the expected optimisation time of the best parameter value for both problem classes.
arXiv Detail & Related papers (2020-04-09T12:42:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.