ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
- URL: http://arxiv.org/abs/2510.07356v1
- Date: Wed, 08 Oct 2025 15:41:15 GMT
- Title: ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
- Authors: Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang
- Abstract summary: A key challenge for kernel generation is the scarcity of high-quality data. We develop a pipeline that generates and curates high-quality kernels with reasoning traces. We show that the average reasoning length can serve as a metric to assess the difficulty of kernel generation tasks.
- Score: 5.010229074860956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GPU kernel generation by LLMs has recently experienced rapid development, leveraging test-time scaling and reinforcement learning techniques. However, a key challenge for kernel generation is the scarcity of high-quality data, as most high-quality kernels are proprietary and not open-source. This challenge prevents us from leveraging supervised fine-tuning to align LLMs to the kernel generation task. To address this challenge, we develop a pipeline that generates and curates high-quality CUDA kernels with reasoning traces, motivated by a critical observation that concise yet informative reasoning traces result in robust generation of high-performance kernels. Using this pipeline, we construct our dataset ConCuR and introduce our model KernelCoder, which is the first model trained on a curated dataset consisting of PyTorch, reasoning, and CUDA kernel pairs, to our knowledge. In the KernelBench setup, our model achieves significant improvements over the existing top-performing model, QwQ-32B, and outperforms all open-source models fine-tuned for kernel generation, as well as frontier models such as DeepSeek-V3.1-Think and Claude-4-sonnet. Finally, we show that the average reasoning length can serve as a metric to assess the difficulty of kernel generation tasks. The observations, metrics, and our data collection and curation pipeline can help obtain better data in the kernel generation task in the future.
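The abstract's claim that average reasoning length can indicate task difficulty can be illustrated with a minimal sketch. The names `average_reasoning_length` and `rank_by_difficulty`, the toy traces, and the choice to count whitespace-separated words are all assumptions made here for illustration; they are not taken from the paper, which may measure length differently (for example in tokenizer tokens).

```python
from collections import defaultdict
from statistics import mean


def average_reasoning_length(samples):
    """Map each task id to the mean length of its reasoning traces.

    `samples` is an iterable of (task_id, reasoning_trace) pairs.
    Length is counted in whitespace-separated words (an assumption;
    a tokenizer-based count would be a natural alternative).
    """
    lengths = defaultdict(list)
    for task_id, trace in samples:
        lengths[task_id].append(len(trace.split()))
    return {task_id: mean(vals) for task_id, vals in lengths.items()}


def rank_by_difficulty(samples):
    """Return task ids ordered from shortest to longest average reasoning."""
    avg = average_reasoning_length(samples)
    return sorted(avg, key=avg.get)


# Hypothetical toy data: two traces for a simple task, one for a harder one.
samples = [
    ("relu", "apply max with zero elementwise"),
    ("relu", "one thread per element"),
    ("conv2d", "tile the input into shared memory then accumulate "
               "partial sums per output pixel"),
]

avg = average_reasoning_length(samples)
ranking = rank_by_difficulty(samples)
```

Under this sketch, tasks that need longer reasoning on average sort toward the end of `ranking`, which is one way such a metric could be used to bucket tasks by difficulty.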
Related papers
- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation [51.72529978689561]
CUDA Agent is a large-scale agentic reinforcement learning system that develops kernel expertise through three components. CUDA Agent delivers 100%, 100%, and 92% faster rate over torch.compile on KernelBench.
arXiv Detail & Related papers (2026-02-27T18:58:05Z)
- DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels [17.979042914049842]
Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs. CuKe is an augmented dataset optimized for high-performance kernels. DICE is a series of diffusion large language models designed for kernel generation.
arXiv Detail & Related papers (2026-02-12T08:45:13Z)
- Towards Automated Kernel Generation in the Era of LLMs [17.69471168609145]
Kernel engineering is a time-consuming and non-scalable process. Recent advances in large language models (LLMs) and agentic systems have opened new possibilities for automating kernel generation and optimization. The field remains fragmented, lacking a systematic perspective for LLM-driven kernel generation.
arXiv Detail & Related papers (2026-01-22T07:53:52Z)
- AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units [39.846358001824996]
We propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development. We introduce Ascend-CoT, a high-quality dataset incorporating chain-of-thought reasoning derived from real-world kernel implementations. We also design NPU KernelBench, a comprehensive benchmark for assessing compilation, correctness, and performance across varying complexity levels.
arXiv Detail & Related papers (2026-01-12T03:12:58Z)
- Generative Latent Kernel Modeling for Blind Motion Deblurring [43.79789971884913]
We present a novel framework for kernel blur estimation based on a deep generative network generator. We achieve state-of-the-art performance on challenging benchmark datasets.
arXiv Detail & Related papers (2025-07-12T13:48:10Z)
- Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition [7.532273334759435]
Kernels are key to encoding prior beliefs and data structures in Gaussian process (GP) models. Deep kernel learning enhances kernel flexibility by feeding inputs through a neural network before applying a standard parametric form. We introduce a fully data-driven, scalable deep kernel representation where a neural network directly represents a low-rank kernel.
arXiv Detail & Related papers (2025-05-24T05:42:11Z)
- KernelBench: Can LLMs Write Efficient GPU Kernels? [36.4117525096377]
KernelBench is an open-source framework for evaluating language models' ability to write fast and correct kernels. We introduce a new evaluation metric fast_p, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold p over a baseline. Our experiments show that frontier reasoning models perform the best out of the box but still fall short overall.
arXiv Detail & Related papers (2025-02-14T19:30:53Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
- Generative Kernel Continual learning [117.79080100313722]
We introduce generative kernel continual learning, which exploits the synergies between generative models and kernels for continual learning.
The generative model is able to produce representative samples for kernel learning, which removes the dependence on memory in kernel continual learning.
We conduct extensive experiments on three widely-used continual learning benchmarks that demonstrate the abilities and benefits of our contributions.
arXiv Detail & Related papers (2021-12-26T16:02:10Z)
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning to tackle catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
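
The fast_p metric named in the KernelBench entry above counts a generated kernel only if it is both functionally correct and faster than the baseline by more than a threshold p. A minimal sketch follows; the function signature, the tuple layout, and the timings are hypothetical and chosen only for illustration, not taken from the KernelBench code.

```python
def fast_p(results, p):
    """Fraction of tasks whose kernel is correct and has speedup > p.

    `results` is a list of (is_correct, baseline_time, kernel_time)
    tuples (a layout assumed here); speedup is baseline_time / kernel_time.
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for correct, baseline_time, kernel_time in results
        if correct and baseline_time / kernel_time > p
    )
    return hits / len(results)


# Hypothetical timings in milliseconds.
results = [
    (True, 2.0, 1.0),   # correct, 2.0x speedup
    (True, 1.0, 2.0),   # correct, 0.5x speedup (slower than baseline)
    (False, 2.0, 0.5),  # fast but incorrect, never counted
    (True, 3.0, 2.0),   # correct, 1.5x speedup
]

correct_rate = fast_p(results, 0)    # any positive speedup counts
faster_rate = fast_p(results, 1)     # must beat the baseline
twice_as_fast = fast_p(results, 2)   # must be more than 2x faster
```

With p = 0 the metric reduces to the plain correctness rate, and raising p tightens the performance requirement.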