An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
- URL: http://arxiv.org/abs/2312.11819v3
- Date: Mon, 14 Oct 2024 11:57:00 GMT
- Title: An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
- Authors: Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou
- Abstract summary: We propose a flexible model placement framework that offers two general and agile model placement strategies.
Our framework provides a simple user interface and guidelines to easily and flexibly configure these strategies in various training scenarios.
- Score: 11.749347656959822
- Abstract: Recently, ChatGPT- and InstructGPT-like large language models (LLMs) have made a significant impact in the AI world. Many works have attempted to reproduce InstructGPT's complex training pipeline, namely Reinforcement Learning from Human Feedback (RLHF). However, mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Co-located strategy. This strategy treats all four interdependent models involved in RLHF as a single entity, distributes them across all devices, and applies parallelism techniques designed for a single model, regardless of the workload heterogeneity inherent to each model. As a result, it exacerbates the generation bottleneck in RLHF training and degrades overall training efficiency. To address these issues, we propose a flexible model placement framework that offers two general and agile model placement strategies. The Interleaving strategy reduces the memory redundancy and communication costs of RLHF training by placing models without data dependencies on mutually exclusive devices with careful orchestration. The Disaggregated strategy improves training throughput by separating the training and inference runtimes of the RLHF pipeline via additional shadow models. Furthermore, our framework provides a simple user interface and guidelines to easily and flexibly configure these strategies in various training scenarios. Our experiments show that these strategies achieve speedups of up to 11x over current state-of-the-art (SOTA) approaches. The results highlight the effectiveness and adaptability of our methods in accelerating distributed RLHF training.
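The paper's actual configuration interface is not reproduced in this digest; the following is a minimal sketch of what placing the four RLHF models under the Interleaving and Disaggregated strategies could look like. All class, field, and device-group names here are illustrative assumptions, not the framework's real API.

```python
# Hypothetical placement configuration for the four RLHF models
# (actor, critic, reward model, reference model). Names are assumptions.
from dataclasses import dataclass, field


@dataclass
class PlacementConfig:
    strategy: str                       # "colocated", "interleaving", or "disaggregated"
    device_groups: dict = field(default_factory=dict)


# Interleaving: models without data dependencies share exclusive device
# groups, cutting memory redundancy and inter-model communication.
interleaving = PlacementConfig(
    strategy="interleaving",
    device_groups={
        "actor+reward": list(range(0, 8)),       # GPUs 0-7
        "critic+reference": list(range(8, 16)),  # GPUs 8-15
    },
)

# Disaggregated: training and inference runtimes are separated; shadow
# copies of the actor serve generation while the training copy updates.
disaggregated = PlacementConfig(
    strategy="disaggregated",
    device_groups={
        "train_actor": list(range(0, 8)),
        "shadow_actor_inference": list(range(8, 12)),
        "critic+reward+reference": list(range(12, 16)),
    },
)
```

The design intent, as the abstract describes it, is that switching strategies should be a configuration change rather than a pipeline rewrite.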
Related papers
- Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? [69.38149239733994]
We investigate whether complex robust training strategies remain necessary as model capacity grows.
We find that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically.
Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful.
arXiv Detail & Related papers (2025-02-17T03:34:31Z)
- Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits [59.30310692855397]
We propose a unified framework for the RLHF pipeline from the perspective of contextual bandits.
We decompose the RLHF process into two distinct stages: (post-)training and deployment.
We then develop novel algorithms for each stage, demonstrating significant improvements in both statistical and computational efficiency.
arXiv Detail & Related papers (2025-02-11T02:36:01Z)
- Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning [15.61141633436468]
Federated Learning (FL) empowers multiple clients to collaboratively train machine learning models without sharing local data.
This paper proposes a novel Heterogeneity-aware Personalized Federated Learning method, named HAPFL, via multi-level Reinforcement Learning (RL) mechanisms.
Experimental results across multiple benchmark datasets demonstrate that HAPFL not only achieves high accuracy but also substantially reduces the overall training time by 20.9%-40.4%.
arXiv Detail & Related papers (2025-01-28T14:08:57Z)
- ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment [7.916080032572087]
ATOM is a resilient distributed training framework designed for asynchronous training of vast models in a decentralized setting.
ATOM aims to accommodate a complete LLM on one host (peer) through seamless model swapping and concurrently trains multiple copies across various peers to optimize training throughput; a hedged sketch of the swapping idea follows this summary.
Experiments using different GPT-3 model configurations reveal that, under suboptimal network connections, ATOM can improve training efficiency by up to 20x compared with state-of-the-art decentralized pipeline parallelism approaches.
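ATOM's code is not shown in this digest; below is a hedged, minimal sketch of the general model-swapping idea the summary describes, streaming one block at a time through accelerator memory so a model larger than the device can still run on a single host. Module sizes and names are illustrative, not ATOM's implementation.

```python
# Minimal sketch of layer swapping: keep all blocks in host memory and
# move each onto the accelerator only for its own forward pass.
import torch
import torch.nn as nn

# Stand-ins for transformer blocks; real GPT-3 blocks are far larger.
blocks = nn.ModuleList(nn.Linear(1024, 1024) for _ in range(12))
device = "cuda" if torch.cuda.is_available() else "cpu"


def swapped_forward(x: torch.Tensor) -> torch.Tensor:
    x = x.to(device)
    for block in blocks:
        block.to(device)   # swap the block in
        x = block(x)
        block.to("cpu")    # swap it out so the next block fits
    return x


y = swapped_forward(torch.randn(2, 1024))
```

The swap traffic is the obvious cost, which is presumably why ATOM overlaps it with concurrent training of multiple copies across peers.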
arXiv Detail & Related papers (2024-03-15T17:43:43Z)
- Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed learning framework based on collaborative model training across distributed devices in edge networks.
Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance.
In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z)
- Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning [15.311309249848739]
Hierarchical independent submodel training (HIST) is a new FL methodology that aims to reduce training latency in hierarchical cloud-edge-client networks.
We demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of the model aggregation over the edge cells.
arXiv Detail & Related papers (2023-10-27T04:42:59Z)
- Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation.
We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions.
We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z)
- Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck; a minimal aggregation sketch follows this summary.
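As a rough illustration of the over-the-air idea (not this paper's system model), the sketch below assumes clients transmit their updates simultaneously over a shared analog channel, so the server receives their superposition, i.e., a noisy sum, in a single round.

```python
# Minimal numpy sketch of analog over-the-air (AirComp) aggregation.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 10, 1000
updates = rng.normal(size=(num_clients, dim))   # per-client model updates

# The channel superposes simultaneous transmissions: received = sum + noise.
received = updates.sum(axis=0) + rng.normal(scale=0.1, size=dim)
global_update = received / num_clients          # noisy estimate of the average
```

Communication cost thus stays constant in the number of clients, at the price of channel noise entering the aggregate.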
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
- Tensor Decomposition based Personalized Federated Learning [12.420951968273574]
Federated learning (FL) is a new distributed machine learning framework that enables reliable collaborative training without collecting users' private data.
Due to FL's frequent communication and averaging-based aggregation strategy, it struggles to scale to statistically diverse data and large-scale models.
We propose a personalized FL framework, named Tensor Decomposition based Personalized Federated learning (TDPFed), in which we design a novel tensorized local model with tensorized linear and convolutional layers to reduce the communication cost; a hedged factorized-layer sketch follows this summary.
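TDPFed's exact tensorization is not given in this digest; as a hedged stand-in, the sketch below uses a low-rank factorized linear layer to show how factorizing a weight matrix shrinks the parameters a client must communicate each round. The class name and rank are illustrative.

```python
# Hypothetical low-rank stand-in for a tensorized linear layer.
import torch.nn as nn


class FactorizedLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # W ~= U @ V: (out x rank)(rank x in) instead of a dense (out x in).
        self.V = nn.Linear(in_features, rank, bias=False)
        self.U = nn.Linear(rank, out_features)

    def forward(self, x):
        return self.U(self.V(x))


# Dense 1024x1024 layer: ~1.05M weights; rank-32 factorization: ~66K.
layer = FactorizedLinear(1024, 1024, rank=32)
```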
arXiv Detail & Related papers (2022-08-27T08:09:14Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)