Related papers: AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

URL: http://arxiv.org/abs/2507.01663v1
Date: Wed, 02 Jul 2025 12:45:34 GMT
Title: AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu,
Abstract summary: Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks. Task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. We propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training.
Score: 24.60677187852425
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides unified data management and fine-grained scheduling capability in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring the parameter update process within staleness thresholds. Finally, the core capability of AsyncFlow is architecturally decoupled from underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average 1.59x throughput improvement compared with the state-of-the-art baseline. The architecture presented in this work provides actionable insights for next-generation RL training system designs.
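The idea of deferring parameter updates within a staleness threshold can be illustrated with a toy, single-threaded simulation. This is only a sketch under stated assumptions, not AsyncFlow's implementation: the names `Sample`, `simulate`, and `max_staleness` are hypothetical, the rollout worker and the trainer take strict turns, and a "policy version" is just an integer counter. The rollout worker keeps generating with its current weights and only pulls fresh weights once they would otherwise lag the trainer by more than `max_staleness` versions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    data: int
    policy_version: int  # version of the weights that generated this sample


def simulate(num_steps: int, max_staleness: int) -> list[int]:
    """Return the staleness (in policy versions) of each sample the trainer consumes.

    Hypothetical model: one producer (rollout) and one consumer (trainer)
    alternate, exchanging samples through a FIFO queue.
    """
    queue: deque[Sample] = deque()
    trainer_version = 0   # incremented after every training step
    rollout_version = 0   # weights currently loaded by the rollout worker
    observed = []

    for step in range(num_steps):
        # Producer: defer the weight sync until the staleness bound would be exceeded.
        if trainer_version - rollout_version > max_staleness:
            rollout_version = trainer_version
        queue.append(Sample(data=step, policy_version=rollout_version))

        # Consumer: train on the oldest sample, then publish a new policy version.
        sample = queue.popleft()
        observed.append(trainer_version - sample.policy_version)
        trainer_version += 1

    return observed


if __name__ == "__main__":
    print(simulate(num_steps=6, max_staleness=2))  # [0, 1, 2, 0, 1, 2]
```

With `max_staleness=0` the sketch degenerates to a fully synchronous, on-policy loop in which weights are synced before every generation step; larger thresholds trade sample freshness for fewer synchronization points.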

Related papers

High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization [0.0]
ClusterEnv is a learner-agnostic interface for distributed environment execution that mirrors the Gymnasium API. ClusterEnv introduces the DETACH pattern, which decouples simulation from training by offloading reset() and step() operations to remote workers while keeping learning centralized. We propose Adaptive Actor Policy Synchronization (AAPS), a divergence-triggered update mechanism that reduces synchronization overhead without sacrificing performance.
arXiv Detail & Related papers (2025-07-15T05:07:12Z)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks. Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models [11.624678008637623]
We propose separating generation and learning in RLHF.<n>Online DPO is found to be most robust to off-policy data.<n>Asynchronous training relies on an underexplored regime, online but off-policy RLHF.
arXiv Detail & Related papers (2024-10-23T19:59:50Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
Spreeze: High-Throughput Parallel Reinforcement Learning Framework [19.3019166138232]
Spreeze is a lightweight parallel framework for reinforcement learning. It efficiently utilizes a single desktop hardware resource to approach the throughput limit. It can achieve up to 15,000Hz experience sampling and 370,000Hz network update frame rate.
arXiv Detail & Related papers (2023-12-11T05:25:01Z)
Efficient Parallel Reinforcement Learning Framework using the Reactor Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources. Existing frameworks, such as Ray, are not managing this orchestration efficiently. We have proposed a solution implementing the reactor model, which enforces a set of actors to have a fixed communication pattern.
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization [20.741776617129208]
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML). We first formulate rectangular scheduling steps and functions to capture the impact of system parameters on learning performance. Our analysis sheds light on the joint impact of device training variables and asynchronous scheduling decisions.
arXiv Detail & Related papers (2023-05-22T21:39:38Z)
Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks [56.91063444859008]
Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation. We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems. We show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.
arXiv Detail & Related papers (2022-12-14T17:33:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.