Related papers: EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

URL: http://arxiv.org/abs/2510.05943v1
Date: Tue, 07 Oct 2025 13:52:51 GMT
Title: EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
Authors: Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao,
Abstract summary: Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training.<n>We present EARL, a scalable system for efficient agentic RL.
Score: 10.372430331898608
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm to operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency, and triggering out-of-memory (OOM) failures; and (2) intermediate tensors accumulate with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL designs a parallelism selector that dynamically adapts model and training parallelism across RL stages based on sequence length and system load, and a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard limits or penalties of context length.

Related papers

DiRL: An Efficient Post-Training Framework for Diffusion Language Models [54.405206032785706]
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models.<n>Existing methods suffer from computational inefficiency and objective mismatches between training and inference.<n>We introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference.
arXiv Detail & Related papers (2025-12-23T08:33:19Z)
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter [52.111923076688505]
Training Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving.<n>We propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding.
arXiv Detail & Related papers (2025-11-20T18:59:25Z)
AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs [24.96730768606278]
We present AReaL-Hex, a heterogeneous-aware asynchronous RL training system.<n>It effectively schedules how to execute rollout generation and policy model training over heterogeneous GPUs.<n>It delivers up to 1.50x higher training throughput and 1.46x reduction in training cost.
arXiv Detail & Related papers (2025-11-02T04:17:30Z)
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels [96.35283762778137]
We introduce the Webscale-RL pipeline, a scalable data engine for reinforcement learning.<n>We construct the Webscale-RL dataset, containing 1.2 million examples across more than 9 domains.<n>Our work presents a viable path toward scaling RL to pre-training levels, enabling more capable and efficient language models.
arXiv Detail & Related papers (2025-10-07T22:30:59Z)
AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework [76.96794548655292]
Large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions.<n>Applying reinforcement learning (RL) to train LLM agents in multi-turn, multi-task settings remains challenging due to lack of scalable infrastructure and stable training algorithms.<n>We present the AgentRL framework for scalable multi-turn, multi-task agentic RL training.
arXiv Detail & Related papers (2025-10-05T13:40:01Z)
PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation [47.510888611491]
Reinforcement Learning (RL) is increasingly utilized to enhance the reasoning capabilities of Large Language Models (LLMs)<n>This paper introduces PipelineRL, an approach designed to achieve a superior trade-off between hardware efficiency and data on-policyness.
arXiv Detail & Related papers (2025-09-23T15:15:21Z)
ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation [57.399685080574756]
Existing MLLM-based VLN methods rely on imitation learning (IL) and often use DAgger for post-training.<n>We propose ActiveVLN, a VLN framework that explicitly enables active exploration through multi-turn RL.<n>Experiments show that ActiveVLN achieves the largest performance gains over IL baselines compared to both DAgger-based and prior RL-based post-training methods.
arXiv Detail & Related papers (2025-09-16T03:31:46Z)
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models [74.15250326312179]
Diffusion Large Language Models offer efficient parallel generation and capable global modeling.<n>The dominant application ofDLLMs is hindered by the need for a statically predefined generation length.<n>We introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion.
arXiv Detail & Related papers (2025-08-01T17:56:07Z)
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning [23.24949857136035]
Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs)<n>We present AReaL, a fully asynchronous RL system that completely decouples generation from training.
arXiv Detail & Related papers (2025-05-30T07:18:25Z)
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs)<n>StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks.<n> Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning [20.683081355473664]
Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. ComDML balances workload among agents through a decentralized approach. ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-05-01T20:03:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.