Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
- URL: http://arxiv.org/abs/2506.06122v1
- Date: Fri, 06 Jun 2025 14:33:56 GMT
- Title: Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
- Authors: Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, Tianyuan Wu, Yanan Wu, Yuheng Zhao, Shuaibing Zhao, Jin Yang, Siran Yang, Yingshui Tan, Huimin Yi, Yuchi Xu, Yujin Yuan, Xingyao Zhang, Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng
- Abstract summary: ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training, and researchers seeking agile experimentation.
- Score: 37.78896862093736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several key modules to serve these user groups effectively. First, a single-controller architecture combined with an abstraction of the parallel worker simplifies the development of the training pipeline. Second, the parallel strategy and data transfer modules enable efficient and scalable training. Third, the rollout scheduler offers fine-grained management of each sample's lifecycle during the rollout stage. Fourth, the environment worker and reward worker support rapid and flexible experimentation with agentic RL algorithms and reward designs. Finally, AutoDeviceMapping allows users to assign resources to different models flexibly across various stages.
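The architecture the abstract describes lends itself to a compact illustration. The sketch below is a hypothetical, single-process rendering of the single-controller pattern with parallel-worker and rollout-scheduler abstractions; every class and method name here is illustrative and is not taken from ROLL's actual API.

```python
# Hypothetical sketch of a single-controller RL pipeline: one driver process
# coordinates pools of parallel workers behind a uniform interface, and a
# scheduler tracks each sample's lifecycle individually. Not ROLL's real API.
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One prompt tracked through its rollout lifecycle."""
    prompt: str
    response: str = ""
    reward: float = 0.0
    done: bool = False


class ActorWorker:
    """Stands in for a parallel worker that generates rollouts."""
    def generate(self, sample: Sample) -> Sample:
        sample.response = f"<completion for {sample.prompt!r}>"
        return sample


class RewardWorker:
    """Stands in for a worker that scores finished rollouts."""
    def score(self, sample: Sample) -> Sample:
        sample.reward = float(len(sample.response))  # toy reward
        sample.done = True
        return sample


class RolloutScheduler:
    """Mirrors the fine-grained per-sample lifecycle management the
    abstract mentions: each sample is dispatched and scored individually."""
    def __init__(self, actors: List[ActorWorker], rewarders: List[RewardWorker]):
        self.actors, self.rewarders = actors, rewarders

    def run(self, prompts: List[str]) -> List[Sample]:
        samples = [Sample(p) for p in prompts]
        for i, s in enumerate(samples):
            s = self.actors[i % len(self.actors)].generate(s)
            samples[i] = self.rewarders[i % len(self.rewarders)].score(s)
        return samples


if __name__ == "__main__":
    # Single controller: one driver owns the whole training pipeline.
    scheduler = RolloutScheduler([ActorWorker() for _ in range(2)], [RewardWorker()])
    for s in scheduler.run(["2+2=?", "capital of France?"]):
        print(s.prompt, "->", s.reward)
```

In a real distributed setting the workers would live in separate processes and resource assignment would be handled by something like the AutoDeviceMapping module the abstract names; the single-controller benefit is that the whole pipeline still reads as one sequential program.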
Related papers
- Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning [59.001091197106085]
Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously.
Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning.
We propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner.
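Since the snippet above leans on Low-Rank Adaptation, a minimal NumPy rendering of the generic LoRA forward pass may help. It follows the original LoRA paper's W + (alpha/r)·BA convention and is not EMTAL's implementation; all shapes and values are assumptions for illustration.

```python
# Generic LoRA sketch: a frozen weight W plus a trainable low-rank update.
# Only A and B are trained per task, which is why many tasks can share
# one frozen backbone. Illustrative only; not EMTAL's code.
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """x: (batch, d_in), W: (d_out, d_in) frozen,
    A: (r, d_in), B: (d_out, r) trainable low-rank factors."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
x = rng.normal(size=(8, d_in))          # batch of 8 token features
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # zero-init so training starts at W
print(lora_forward(x, W, A, B).shape)   # (8, 32)
```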
arXiv Detail & Related papers (2025-01-12T17:41:23Z)
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents [38.0441002097771]
DistRL is a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents.
On average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods.
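A toy queue-based sketch can make the asynchronous design concrete: actors push trajectories as soon as they finish, so one straggler cannot stall the learner the way it would in lock-step synchronous collection. This illustrates asynchronous collection in general, not DistRL's code.

```python
# Toy async actor-learner loop: actors with uneven rollout latency feed a
# shared queue; the learner consumes whatever is ready. Illustrative only.
import queue
import random
import threading
import time

def actor(actor_id: int, out: queue.Queue, episodes: int) -> None:
    """Each actor pushes trajectories as soon as they finish."""
    for ep in range(episodes):
        time.sleep(random.uniform(0.01, 0.05))  # uneven rollout latency
        out.put((actor_id, ep))

traj_queue: queue.Queue = queue.Queue()
threads = [threading.Thread(target=actor, args=(i, traj_queue, 3))
           for i in range(4)]
for t in threads:
    t.start()

# The learner never waits for all actors, so a slow one blocks nothing.
for _ in range(12):
    actor_id, ep = traj_queue.get()
    print(f"learner consumed episode {ep} from actor {actor_id}")
for t in threads:
    t.join()
```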
arXiv Detail & Related papers (2024-10-18T18:19:56Z)
- Flextron: Many-in-One Flexible Large Language Model [85.93260172698398]
We introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment.
We present a sample-efficient training method and associated routing algorithms for transforming an existing trained LLM into a Flextron model.
We demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% tokens compared to original pretraining.
arXiv Detail & Related papers (2024-06-11T01:16:10Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
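The clustering step can be pictured with a short sketch: flatten each task's adapter parameters and group tasks by the direction of those vectors. KMeans on L2-normalized vectors stands in for whatever clustering MBC actually uses, so treat every detail below, including the synthetic adapters, as an assumption.

```python
# Rough sketch of the MBC idea: cluster tasks by the similarity of their
# flattened adapter parameters. Synthetic data and KMeans are stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_tasks, adapter_dim = 12, 256
# Pretend adapters: 3 latent skills, 4 tasks each, plus task-specific noise.
skills = rng.normal(size=(3, adapter_dim))
adapters = np.vstack([skills[i // 4] + 0.1 * rng.normal(size=adapter_dim)
                      for i in range(n_tasks)])

# Cosine similarity equals Euclidean distance on L2-normalized vectors,
# so normalizing first lets plain KMeans cluster by direction.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    normalize(adapters))
print("task -> cluster:", dict(enumerate(labels.tolist())))
```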
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- Fast Context Adaptation in Cost-Aware Continual Learning [10.515324071327903]
5G and Beyond networks require more complex learning agents and the learning process itself might end up competing with users for communication and computational resources.
This creates friction: on the one hand, the learning process needs resources to converge quickly to an effective strategy; on the other hand, the learning process needs to be efficient, i.e. take as few resources as possible from the user's data plane, so as not to throttle users' resources.
In this paper, we propose a dynamic strategy to balance the resources assigned to the data plane and those reserved for learning.
arXiv Detail & Related papers (2023-06-06T17:46:48Z)
- ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format [88.33443450434521]
Task-oriented dialogue (TOD) systems function as digital assistants, guiding users through various tasks such as booking flights or finding restaurants.
Existing toolkits for building TOD systems often fall short in delivering comprehensive arrays of data, models, and experimental environments.
We introduce ConvLab-3: a multifaceted dialogue system toolkit crafted to bridge this gap.
arXiv Detail & Related papers (2022-11-30T16:37:42Z)
- SHiFT: An Efficient, Flexible Search Engine for Transfer Learning [16.289623977712086]
Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch.
We propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning.
arXiv Detail & Related papers (2022-04-04T13:16:46Z)
- Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher). A toy formulation of the underlying scheduling objective is sketched after this entry.
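The sketch below makes the objective concrete as a cost-minimizing assignment of devices to jobs. The greedy rule is only an illustrative baseline; the paper's actual solvers are RL- and Bayesian-optimization-based, and the cost table is entirely hypothetical.

```python
# Toy multi-job device scheduling: assign each device to a job so that the
# total (hypothetical) cost stays low. Greedy baseline for illustration only.
from itertools import product

jobs = ["job_a", "job_b"]
devices = ["dev_0", "dev_1", "dev_2", "dev_3"]
# Hypothetical per-(device, job) cost, e.g. expected training time.
cost = {
    ("dev_0", "job_a"): 1.0, ("dev_0", "job_b"): 2.5,
    ("dev_1", "job_a"): 2.0, ("dev_1", "job_b"): 1.2,
    ("dev_2", "job_a"): 1.5, ("dev_2", "job_b"): 1.4,
    ("dev_3", "job_a"): 3.0, ("dev_3", "job_b"): 1.1,
}

# Greedy: repeatedly hand the cheapest remaining (device, job) pair out,
# so both jobs end up training in parallel on disjoint device sets.
assignment: dict[str, list[str]] = {j: [] for j in jobs}
free = set(devices)
while free:
    d, j = min(product(free, jobs), key=lambda dj: cost[dj])
    assignment[j].append(d)
    free.remove(d)
print(assignment)  # e.g. {'job_a': ['dev_0'], 'job_b': ['dev_3', 'dev_1', 'dev_2']}
```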
arXiv Detail & Related papers (2021-12-11T08:05:11Z)
- High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models [18.63017668881868]
Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook.
In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs.
We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40X speedup in terms of time to solution over previous systems.
arXiv Detail & Related papers (2021-04-12T02:15:55Z)
- rl_reach: Reproducible Reinforcement Learning Experiments for Robotic Reaching Tasks [0.0]
We present rl_reach, a self-contained, open-source and easy-to-use software package.
It is designed to run reproducible reinforcement learning experiments for customisable robotic reaching tasks.
arXiv Detail & Related papers (2021-02-09T16:14:10Z)