Related papers: Towards Automated Kernel Generation in the Era of LLMs

Towards Automated Kernel Generation in the Era of LLMs

URL: http://arxiv.org/abs/2601.15727v2
Date: Mon, 26 Jan 2026 08:47:52 GMT
Title: Towards Automated Kernel Generation in the Era of LLMs
Authors: Yang Yu, Peiyu Zang, Chi Hsu Tsai, Haiming Wu, Yixin Shen, Jialing Zhang, Haoyu Wang, Zhiyou Xiao, Jingze Shi, Yuyu Luo, Wentao Zhang, Chunlei Men, Guang Liu, Yonghua Lin,
Abstract summary: Kernel engineering is a time-consuming and non-scalable process.<n>Recent advances in large language models (LLMs) and agentic systems have opened new possibilities for automating kernel generation and optimization.<n>The field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation.
Score: 17.69471168609145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, making kernel engineering a critical but notoriously time-consuming and non-scalable process. Recent advances in large language models (LLMs) and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well-suited to compress expert-level kernel knowledge that is difficult to formalize, while agentic systems further enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Rapid progress has been made in this area. However, the field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation. This survey addresses this gap by providing a structured overview of existing approaches, spanning LLM-based approaches and agentic optimization workflows, and systematically compiling the datasets and benchmarks that underpin learning and evaluation in this domain. Moreover, key open challenges and future research directions are further outlined, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.

Related papers

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model [57.440609834690385]
Existing approaches treat Large Language Models (LLMs) as rapid code generators within evolutionary loops.<n>We propose Search via Co-Evolving World Model and build K-Search based on this method.<n>We evaluate K-Search on diverse, complex kernels FlashInfer, including GQA, MLA, and MoE kernels.
arXiv Detail & Related papers (2026-02-22T11:06:22Z)
AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units [39.846358001824996]
We propose Ascend KernelGen, a generation-evaluation integrated framework for NPU kernel development.<n>We introduce Ascend-CoT, a high-quality dataset incorporating chain-of-thought reasoning derived from real-world kernel implementations.<n>We also design NPU KernelBench, a comprehensive benchmark for assessing compilation, correctness, and performance across varying complexity levels.
arXiv Detail & Related papers (2026-01-12T03:12:58Z)
KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit [15.810081332925584]
KernelBand is a novel framework that formulates kernel optimization as a hierarchical multi-armed bandit problem.<n>We show that KernelBand significantly outperforms state-of-the-art methods, achieving superior performance with fewer tokens while exhibiting consistent improvement without saturation as computational resources increase.
arXiv Detail & Related papers (2025-11-24T08:11:50Z)
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation.<n>This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation.<n>We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
STARK: Strategic Team of Agents for Refining Kernels [23.717055490630596]
We introduce an agentic framework for GPU kernel optimization that explores the design space through multi-agent collaboration.<n>This framework mimics the workflow of expert engineers, enabling LLMs to reason about hardware trade-offs, incorporate profiling feedback, and refine kernels iteratively.<n>We evaluate our approach on KernelBench, a benchmark for LLM-based kernel optimization, and demonstrate substantial improvements over baseline agents.
arXiv Detail & Related papers (2025-10-19T20:41:46Z)
Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models [26.985412258634256]
Large language models (LLMs) have shown promise for automated kernel optimization, demonstrating success in domains with comprehensive technical documents and mature scarcitys.<n>We present Evolution of Kernels (EoK), a novel LLM-based evolutionary program search framework that automates kernel design for domains with limited reference material.<n>EoK achieves a median 1.27x speedup, surpassing human experts on all 80 evaluated kernel design tasks.
arXiv Detail & Related papers (2025-09-14T08:11:06Z)
Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey [69.45421620616486]
This work presents the first structured taxonomy and analysis of discrete tokenization methods designed for large language models (LLMs)<n>We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines.<n>We identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints.
arXiv Detail & Related papers (2025-07-21T10:52:14Z)
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization [0.0]
This paper introduces an automated methodology for iteratively refining accelerator kernels.<n>Our methodology employs LLMs in a multi-stage, evolutionary process.<n>We detail how this approach navigates the challenges of the AMD MI300 target architecture.
arXiv Detail & Related papers (2025-06-25T19:59:34Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels. We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.