EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
- URL: http://arxiv.org/abs/2510.05943v1
- Date: Tue, 07 Oct 2025 13:52:51 GMT
- Title: EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
- Authors: Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao,
- Abstract summary: Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training.<n>We present EARL, a scalable system for efficient agentic RL.
- Score: 10.372430331898608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm to operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency, and triggering out-of-memory (OOM) failures; and (2) intermediate tensors accumulate with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL designs a parallelism selector that dynamically adapts model and training parallelism across RL stages based on sequence length and system load, and a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard limits or penalties of context length.
Related papers
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models [54.405206032785706]
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models.<n>Existing methods suffer from computational inefficiency and objective mismatches between training and inference.<n>We introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference.
arXiv Detail & Related papers (2025-12-23T08:33:19Z) - Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter [52.111923076688505]
Training Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving.<n>We propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding.
arXiv Detail & Related papers (2025-11-20T18:59:25Z) - AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs [24.96730768606278]
We present AReaL-Hex, a heterogeneous-aware asynchronous RL training system.<n>It effectively schedules how to execute rollout generation and policy model training over heterogeneous GPUs.<n>It delivers up to 1.50x higher training throughput and 1.46x reduction in training cost.
arXiv Detail & Related papers (2025-11-02T04:17:30Z) - Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels [96.35283762778137]
We introduce the Webscale-RL pipeline, a scalable data engine for reinforcement learning.<n>We construct the Webscale-RL dataset, containing 1.2 million examples across more than 9 domains.<n>Our work presents a viable path toward scaling RL to pre-training levels, enabling more capable and efficient language models.
arXiv Detail & Related papers (2025-10-07T22:30:59Z) - AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework [76.96794548655292]
Large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions.<n>Applying reinforcement learning (RL) to train LLM agents in multi-turn, multi-task settings remains challenging due to lack of scalable infrastructure and stable training algorithms.<n>We present the AgentRL framework for scalable multi-turn, multi-task agentic RL training.
arXiv Detail & Related papers (2025-10-05T13:40:01Z) - PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation [47.510888611491]
Reinforcement Learning (RL) is increasingly utilized to enhance the reasoning capabilities of Large Language Models (LLMs)<n>This paper introduces PipelineRL, an approach designed to achieve a superior trade-off between hardware efficiency and data on-policyness.
arXiv Detail & Related papers (2025-09-23T15:15:21Z) - ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation [57.399685080574756]
Existing MLLM-based VLN methods rely on imitation learning (IL) and often use DAgger for post-training.<n>We propose ActiveVLN, a VLN framework that explicitly enables active exploration through multi-turn RL.<n>Experiments show that ActiveVLN achieves the largest performance gains over IL baselines compared to both DAgger-based and prior RL-based post-training methods.
arXiv Detail & Related papers (2025-09-16T03:31:46Z) - Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models [74.15250326312179]
Diffusion Large Language Models offer efficient parallel generation and capable global modeling.<n>The dominant application ofDLLMs is hindered by the need for a statically predefined generation length.<n>We introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion.
arXiv Detail & Related papers (2025-08-01T17:56:07Z) - AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning [23.24949857136035]
Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs)<n>We present AReaL, a fully asynchronous RL system that completely decouples generation from training.
arXiv Detail & Related papers (2025-05-30T07:18:25Z) - StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs)<n>StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks.<n> Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning [20.683081355473664]
Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy.
ComDML balances workload among agents through a decentralized approach.
ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-05-01T20:03:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.