Related papers: KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

URL: http://arxiv.org/abs/2511.18868v1
Date: Mon, 24 Nov 2025 08:11:50 GMT
Title: KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit
Authors: Dezhi Ran, Shuxiao Xie, Mingfang Ji, Ziyue Hua, Mengzhou Wu, Yuan Cao, Yuzhe Guo, Yu Hao, Linyi Li, Yitao Hu, Tao Xie,
Abstract summary: KernelBand is a novel framework that formulates kernel optimization as a hierarchical multi-armed bandit problem.<n>We show that KernelBand significantly outperforms state-of-the-art methods, achieving superior performance with fewer tokens while exhibiting consistent improvement without saturation as computational resources increase.
Score: 15.810081332925584
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High quality kernels are critical for reducing training and inference costs of Large Language Models (LLMs), yet they traditionally require significant expertise in hardware architecture and software optimization. While recent advances in LLM-based code generation show promise for complex optimization, existing methods struggle with the vast optimization space due to insufficient hardware domain knowledge, failing to effectively balance exploration and exploitation. We present KernelBand, a novel framework that formulates kernel optimization as a hierarchical multi-armed bandit problem, enabling LLM agents to strategically navigate the optimization space by treating kernel selection and optimization strategy application as sequential decision-making processes. Our approach leverages hardware profiling information to identify promising optimization strategies and employs runtime behavior clustering to reduce exploration overhead across kernel candidates. Extensive experiments on TritonBench demonstrate that KernelBand significantly outperforms state-of-the-art methods, achieving superior performance with fewer tokens while exhibiting consistent improvement without saturation as computational resources increase.

Related papers

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model [57.440609834690385]
Existing approaches treat Large Language Models (LLMs) as rapid code generators within evolutionary loops.<n>We propose Search via Co-Evolving World Model and build K-Search based on this method.<n>We evaluate K-Search on diverse, complex kernels FlashInfer, including GQA, MLA, and MoE kernels.
arXiv Detail & Related papers (2026-02-22T11:06:22Z)
Towards Automated Kernel Generation in the Era of LLMs [17.69471168609145]
Kernel engineering is a time-consuming and non-scalable process.<n>Recent advances in large language models (LLMs) and agentic systems have opened new possibilities for automating kernel generation and optimization.<n>The field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation.
arXiv Detail & Related papers (2026-01-22T07:53:52Z)
QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation [41.53673797546332]
Micro Coding is a hierarchical framework inspired by the staged optimization strategy of human experts.<n>It decouples optimization strategy from implementation details, ensuring efficiency through high-level strategy and correctness through low-level implementation.<n>It achieves near 100% and 70% accuracy at Levels 1-2 and 3, over 50% than SOTA general-purpose and domain-finetuned LLMs, with up to 7.3x speedup over LLMs, and 2.2x over expert-optimized PyTorch Eager kernels.
arXiv Detail & Related papers (2025-11-25T09:17:47Z)
STARK: Strategic Team of Agents for Refining Kernels [23.717055490630596]
We introduce an agentic framework for GPU kernel optimization that explores the design space through multi-agent collaboration.<n>This framework mimics the workflow of expert engineers, enabling LLMs to reason about hardware trade-offs, incorporate profiling feedback, and refine kernels iteratively.<n>We evaluate our approach on KernelBench, a benchmark for LLM-based kernel optimization, and demonstrate substantial improvements over baseline agents.
arXiv Detail & Related papers (2025-10-19T20:41:46Z)
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization [0.0]
This paper introduces an automated methodology for iteratively refining accelerator kernels.<n>Our methodology employs LLMs in a multi-stage, evolutionary process.<n>We detail how this approach navigates the challenges of the AMD MI300 target architecture.
arXiv Detail & Related papers (2025-06-25T19:59:34Z)
A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.<n> deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.<n>This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation [46.5310645609264]
We propose a Meta-learning and Markov Chain Monte Carlo based SISR approach to learn kernel priors from organized randomness. A lightweight network is adopted as kernel generator, and is optimized via learning from the MCMC simulation on random Gaussian distributions. A meta-learning-based alternating optimization procedure is proposed to optimize the kernel generator and image restorer.
arXiv Detail & Related papers (2024-06-13T07:50:15Z)
LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
We show that gradient-based and high-level LLMs can effectively collaborate a combined optimization framework.<n>In this paper, we show that these complementary to each other and can effectively collaborate a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z)
CompilerDream: Learning a Compiler World Model for General Code Optimization [58.87557583347996]
We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization.<n>It comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies.<n>It excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.
arXiv Detail & Related papers (2024-04-24T09:20:33Z)
Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel [20.98449975854329]
This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper- parameters.<n>The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data.<n>To exploit the inherent sparsity of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework.
arXiv Detail & Related papers (2023-09-15T07:05:33Z)
Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization. First, we curate a dataset of performance-improving edits made by human programmers of over 77,000 competitive C++ programming submission pairs. For prompting, we propose retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives. We show the advantages of ZO sign-based gradient descent (ZO-signGD) We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.