Related papers: TritonRL: Training LLMs to Think and Code Triton Without Cheating

URL: http://arxiv.org/abs/2510.17891v1
Date: Sat, 18 Oct 2025 21:36:10 GMT
Title: TritonRL: Training LLMs to Think and Code Triton Without Cheating
Authors: Jiin Woo, Shaowei Zhu, Allen Nie, Zhen Jia, Yida Wang, Youngsuk Park,
Abstract summary: TritonRL is trained with a novel training framework that enables robust and automated kernel synthesis. Our experiments on KernelBench demonstrate that TritonRL achieves state-of-the-art correctness and speedup, surpassing all other Triton-specific models.
Score: 14.859089778654933
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid evolution of large language models (LLMs), the demand for automated, high-performance system kernels has emerged as a key enabler for accelerating development and deployment. We introduce TritonRL, a domain-specialized LLM for Triton kernel generation, trained with a novel training framework that enables robust and automated kernel synthesis. Unlike general-purpose programming languages, Triton kernel generation faces unique challenges due to data scarcity and incomplete evaluation criteria, vulnerable to reward hacking. Our approach addresses these challenges end-to-end by distilling Triton-specific knowledge through supervised fine-tuning on curated datasets, and further improving code quality via reinforcement learning (RL) with robust, verifiable rewards and hierarchical reward assignment. Our RL framework robustly detects reward hacking and guides both reasoning traces and code tokens through fine-grained verification and hierarchical reward decomposition, enabling the model to generate high-quality Triton kernels that can truly replace existing modules. With robust and fine-grained evaluation, our experiments on KernelBench demonstrate that TritonRL achieves state-of-the-art correctness and speedup, surpassing all other Triton-specific models and underscoring the effectiveness of our RL-based training paradigm.

Related papers

medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential Functions [27.420707144690102]
Reinforcement Learning (RL) offers a powerful framework for optimizing dynamic treatment regimes (DTRs). Existing approaches often rely on manual signals that fail to generalize across diverse pathologies. We propose an automated pipeline leveraging Large Language Models (LLMs) for offline reward design and verification.
arXiv Detail & Related papers (2026-02-03T09:30:32Z)
CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation. Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z)
Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning. We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT. Exploration-friendly techniques, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy, are crucial for agentic RL and could improve training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z)
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL [19.659532349434418]
Reinforcement learning (RL) has recently become the dominant paradigm for strengthening the reasoning abilities of large language models. Yet the rule-based reward functions commonly used on mathematical or programming benchmarks assess only answer format and correctness. We propose Dynamic Reasoning Efficiency Reward (DRER), a plug-and-play RL reward framework that reshapes both reward and advantage signals.
arXiv Detail & Related papers (2025-09-07T11:52:18Z)
VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning [32.974199255760944]
We introduce a reinforcement learning framework tailored for Verilog code generation. To tackle the problem of sparse and noisy reward signals, we propose a Trace-back based Rescore mechanism. To mitigate catastrophic forgetting and overfitting during RL fine-tuning, we introduce a sample-balanced weighting strategy.
arXiv Detail & Related papers (2025-08-25T20:20:44Z)
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs [87.8306870967343]
We introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton performs supervised fine-tuning (SFT) to be equipped with essential Triton programming expertise using a high-quality data gathering pipeline. Experiments across five evaluation channels of TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves performance comparable to mainstream large models.
arXiv Detail & Related papers (2025-07-08T05:38:24Z)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators [59.625889531331815]
Triton is a high-level Python-like language designed for building efficient GPU kernels. Despite advances in large language models (LLMs) for conventional code generation, these models struggle to generate accurate, performance-optimized Triton code. In this work, we introduce TritonBench, the first comprehensive benchmark for Triton operator generation.
arXiv Detail & Related papers (2025-02-20T17:21:27Z)
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning [50.55776190278426]
Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. We introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward methods.
arXiv Detail & Related papers (2024-05-29T22:23:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.