Related papers: QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

URL: http://arxiv.org/abs/2510.11696v1
Date: Mon, 13 Oct 2025 17:55:09 GMT
Title: QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Authors: Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen,
Abstract summary: We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs)<n>QeRL addresses issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA)<n>Experiments demonstrate that QeRL delivers over 1.5 times speedup in the rollout phase.
Score: 80.76334908639745
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration, and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over 1.5 times speedup in the rollout phase. Moreover, this is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) in the 7B model. These results establish QeRL as an efficient and effective framework for RL training in LLMs.

Related papers

QuRL: Efficient Reinforcement Learning with Quantized Rollout [23.326106976928898]
Reinforcement learning with verifiable rewards (RLVR) has become a trending paradigm for training reasoning large language models (LLMs)<n>Due to the autoregressive decoding nature of LLMs, the rollout process becomes the efficiency bottleneck of RL training, consisting of up to 70% of the total training time.<n>We propose Quantized Reinforcement Learning (QuRL) that uses a quantized actor for accelerating the rollout.
arXiv Detail & Related papers (2026-02-15T01:48:10Z)
From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning? [76.288870982181]
Protein language models (PLMs) have advanced computational protein science through large-scale pretraining and scalable architectures.<n> reinforcement learning (RL) has broadened exploration and enabled precise multi-objective optimization in protein design.<n>We ask if RL improves sampling efficiency and, more importantly, if it reveals capabilities not captured by supervised learning.
arXiv Detail & Related papers (2025-10-02T01:31:10Z)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation [40.120847511378365]
Reinforcement learning (RL) has become a cornerstone in advancing large-scale pre-trained language models (LLMs)<n>We propose Active Partial Rollouts in Reinforcement Learning (APRIL), which mitigates long-tail inefficiency.<n>APRIL over-provisions rollout requests, terminates once the target number of responses is reached, and recycles incomplete responses for continuation in future steps.
arXiv Detail & Related papers (2025-09-23T01:32:36Z)
A Survey of Reinforcement Learning for Large Reasoning Models [98.58081012669369]
Review of recent advances in Reinforcement Learning for reasoning with Large Language Models.<n>Further scaling of RL for LRMs now faces challenges not only in computational resources but also in algorithm design, training data, and infrastructure.
arXiv Detail & Related papers (2025-09-10T17:59:43Z)
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [93.00629872970364]
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks.<n>We introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions.<n>We study whether difficult problems -- those yielding no RL signals and mixed-quality reasoning traces -- can still be effectively used for training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z)
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training [32.575669924032276]
Reinforcement Learning (RL) has become the most effective post-training approach for improving the capabilities of Large Language Models (LLMs)<n>We present LlamaRL, a fully distributed, asynchronous RL framework optimized for efficient training of large-scale LLMs.
arXiv Detail & Related papers (2025-05-29T22:14:15Z)
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning [50.02117478165099]
We show that large-scale reinforcement learning can significantly enhance the reasoning capabilities of strong, small- and mid-sized models.<n>We propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts.
arXiv Detail & Related papers (2025-05-22T08:50:47Z)
Parameter Efficient Reinforcement Learning from Human Feedback [27.687265760622918]
Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models with human preferences. To alleviate some of the computational burden of fine-tuning, efficient methods, like LoRA were introduced. We benchmark the PE-RLHF setup on six diverse datasets spanning summarization, harmless/helpful response generation, UI automation, and visual question answering.
arXiv Detail & Related papers (2024-03-15T21:43:46Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. We develop a scalable, efficient, and distributed RL system called ReaLly scalableRL, which allows efficient and massively parallelized training. SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.