K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
- URL: http://arxiv.org/abs/2602.19128v2
- Date: Thu, 26 Feb 2026 10:37:48 GMT
- Title: K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
- Authors: Shiyi Cao, Ziming Mao, Joseph E. Gonzalez, Ion Stoica,
- Abstract summary: Existing approaches treat Large Language Models (LLMs) as rapid code generators within evolutionary loops.<n>We propose Search via Co-Evolving World Model and build K-Search based on this method.<n>We evaluate K-Search on diverse, complex kernels FlashInfer, including GQA, MLA, and MoE kernels.
- Score: 57.440609834690385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimizing GPU kernels is critical for efficient modern machine learning systems yet remains challenging due to the complex interplay of design factors and rapid hardware evolution. Existing automated approaches typically treat Large Language Models (LLMs) merely as stochastic code generators within heuristic-guided evolutionary loops. These methods often struggle with complex kernels requiring coordinated, multi-step structural transformations, as they lack explicit planning capabilities and frequently discard promising strategies due to inefficient or incorrect intermediate implementations. To address this, we propose Search via Co-Evolving World Model and build K-Search based on this method. By replacing static search heuristics with a co-evolving world model, our framework leverages LLMs' prior domain knowledge to guide the search, actively exploring the optimization space. This approach explicitly decouples high-level algorithmic planning from low-level program instantiation, enabling the system to navigate non-monotonic optimization paths while remaining resilient to temporary implementation defects. We evaluate K-Search on diverse, complex kernels from FlashInfer, including GQA, MLA, and MoE kernels. Our results show that K-Search significantly outperforms state-of-the-art evolutionary search methods, achieving an average 2.10x improvement and up to a 14.3x gain on complex MoE kernels. On the GPUMode TriMul task, K-Search achieves state-of-the-art performance on H100, reaching 1030us and surpassing both prior evolution and human-designed solutions.
Related papers
- Towards Automated Kernel Generation in the Era of LLMs [17.69471168609145]
Kernel engineering is a time-consuming and non-scalable process.<n>Recent advances in large language models (LLMs) and agentic systems have opened new possibilities for automating kernel generation and optimization.<n>The field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation.
arXiv Detail & Related papers (2026-01-22T07:53:52Z) - Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree [17.08113692977552]
PhyloEvolve is a system that reframes GPU-oriented algorithm optimization as an In-Context Reinforcement Learning problem.<n>We introduce a phylogenetic tree representation that captures inheritance, divergence, and recombination among algorithm variants.<n>We evaluate PhyloEvolve on scientific computing workloads including PDE solvers, manifold learning, and spectral graph algorithms.
arXiv Detail & Related papers (2026-01-20T22:32:52Z) - PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution [64.15555230987222]
PACEvolve is a framework designed to robustly govern the agent's context and search dynamics.<n>We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement.
arXiv Detail & Related papers (2026-01-15T18:25:23Z) - Learning to Evolve with Convergence Guarantee via Neural Unrolling [37.99564850768798]
We introduce Learning to Evolve (L2E), a unified bilevel meta-optimization framework.<n>L2E reformulates evolutionary search as a Neural Unrolling process grounded in Krasnosel'skii-Mann (KM) fixed-point theory.<n>Experiments demonstrate the scalability of L2E in high-dimensional spaces and its robust zero-shot generalization across synthetic and real-world control tasks.
arXiv Detail & Related papers (2025-12-12T10:46:25Z) - Experience-Guided Reflective Co-Evolution of Prompts and Heuristics for Automatic Algorithm Design [124.54166764570972]
Combinatorial optimization problems are traditionally tackled with handcrafted algorithms.<n>Recent progress has highlighted the potential of automatics design powered by large language models.<n>We propose the Experience-Evolution Reflective Co-Guided of Prompt and Heuristics (EvoPH) for automatic algorithm design.
arXiv Detail & Related papers (2025-09-29T09:24:09Z) - CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design [11.639825726501659]
Large language models (LLMs) can autonomously discover high-performings at a fraction of the traditional cost.<n>We propose a hybrid framework that combines verbal and numerical guidance.<n>Our method outperforms state-of-the-art (SOTA) baselines across various optimization tasks.
arXiv Detail & Related papers (2025-05-18T07:48:47Z) - HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel
Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels.
We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions.
A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
arXiv Detail & Related papers (2023-04-23T17:27:40Z) - Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner
with Evolution-Based Optimization [49.207538634692916]
We propose a new algorithm for circuit routing, named Ranking Cost, to form an efficient and trainable router.
In our method, we introduce a new set of variables called cost maps, which can help the A* router to find out proper paths.
Our algorithm is trained in an end-to-end manner and does not use any artificial data or human demonstration.
arXiv Detail & Related papers (2021-10-08T07:22:45Z) - AlphaGAN: Fully Differentiable Architecture Search for Generative
Adversarial Networks [15.740179244963116]
Generative Adversarial Networks (GANs) are formulated as minimax game problems, whereby generators attempt to approach real data distributions by virtue of adversarial learning against discriminators.
In this work, we aim to boost model learning from the perspective of network architectures, by incorporating recent progress on automated architecture search into GANs.
We propose a fully differentiable search framework for generative adversarial networks, dubbed alphaGAN.
arXiv Detail & Related papers (2020-06-16T13:27:30Z) - PolyScientist: Automatic Loop Transformations Combined with Microkernels
for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.