WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
- URL: http://arxiv.org/abs/2508.07970v3
- Date: Mon, 18 Aug 2025 03:48:53 GMT
- Title: WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
- Authors: Junyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Tingfeng Xian, Haoqiang Hong, Boqi Chen, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting Yao, Jiatao Xu,
- Abstract summary: We introduce WeChat-YATT Yet Another Transformer Trainer (YATT) in WeChat, a simple, scalable, and balanced RLHF training framework.<n>YATT features a parallel controller programming model that enables flexible and efficient orchestration of complex RLHF.<n>We evaluate WeChat-YATT across diverse experimental scenarios, demonstrating its substantial throughput improvements over state-of-the-art RLHF training frameworks.
- Score: 34.5103280294468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent paradigm for training large language models and multimodal systems. Despite the notable advances enabled by existing RLHF training frameworks, significant challenges remain to scale to complex multimodal workflows and adapt to dynamic workloads. In particular, current systems often encounter limitations related to controller scalability when managing large models, as well as inefficiencies in orchestrating intricate RLHF pipelines, especially in scenarios that require dynamic sampling and resource allocation. In this paper, we introduce WeChat-YATT Yet Another Transformer Trainer in WeChat, a simple, scalable, and balanced RLHF training framework specifically designed to address these challenges. WeChat-YATT features a parallel controller programming model that enables flexible and efficient orchestration of complex RLHF workflows, effectively mitigating bottlenecks associated with centralized controller architectures and facilitating scalability in large-scale data scenarios. In addition, we propose a dynamic placement schema that adaptively partitions computational resources and schedules workloads, thereby significantly reducing hardware idle time and improving GPU utilization under variable training conditions. We evaluate WeChat-YATT across diverse experimental scenarios, demonstrating its substantial throughput improvements over state-of-the-art RLHF training frameworks. Furthermore, WeChat-YATT has been successfully deployed to train models that support WeChat product features for a large-scale user base, underscoring its effectiveness and robustness in real-world applications. We have made WeChat-YATT publicly available at https://www.github.com/tencent/WeChat-YATT.
Related papers
- RLHFless: Serverless Computing for Efficient RLHF [13.743738615300662]
Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training.<n>We present RLHFless, the first scalable training framework for synchronous RLHF, built on serverless computing environments.
arXiv Detail & Related papers (2026-02-26T07:45:37Z) - Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter [52.111923076688505]
Training Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving.<n>We propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding.
arXiv Detail & Related papers (2025-11-20T18:59:25Z) - Watch and Learn: Learning to Use Computers from Online Videos [50.10702690339142]
Watch & Learn (W&L) is a framework that converts human demonstration videos readily available on the Internet into executable UI trajectories at scale.<n>We develop an inverse dynamics labeling pipeline with task-aware video retrieval, generate over 53k high-quality trajectories from raw web videos.<n>These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing CUAs towards real-world deployment.
arXiv Detail & Related papers (2025-10-06T10:29:00Z) - CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks [57.95170323315603]
We introduce CollaPipe, a distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving networks.<n>In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks.<n>To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power.
arXiv Detail & Related papers (2025-09-24T07:54:01Z) - Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle [53.239242017802056]
Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language model (MLLM)<n>However, current RL pipelines often suffer from training inefficiencies caused by two underexplored issues: Advantage Collapsing and Rollout Silencing.<n>We propose Shuffle-R1, a simple yet principled framework that improves RL fine-tuning efficiency by dynamically restructuring trajectory sampling and batch composition.
arXiv Detail & Related papers (2025-08-07T17:53:47Z) - G-Core: A Simple, Scalable and Balanced RLHF Trainer [35.65011046623611]
Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models.<n>We present textbfG-Core, a simple, scalable, and balanced RLHF training framework designed to address these challenges.
arXiv Detail & Related papers (2025-07-30T15:55:08Z) - AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training [24.60677187852425]
Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs)<n>Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks.<n>Task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance.<n>We propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training.
arXiv Detail & Related papers (2025-07-02T12:45:34Z) - Multi-fidelity Reinforcement Learning Control for Complex Dynamical Systems [42.2790464348673]
We propose a multi-fidelity reinforcement learning framework for controlling instabilities in complex systems.<n>The effect of the proposed framework is demonstrated on two complex dynamics in physics.
arXiv Detail & Related papers (2025-04-08T00:50:15Z) - Communication-Control Codesign for Large-Scale Wireless Networked Control Systems [80.30532872347668]
Wireless Networked Control Systems (WNCSs) are essential to Industry 4.0, enabling flexible control in applications, such as drone swarms and autonomous robots.
We propose a practical WNCS model that captures correlated dynamics among multiple control loops with spatially distributed sensors and actuators sharing limited wireless resources over multi-state Markov block-fading channels.
We develop a Deep Reinforcement Learning (DRL) algorithm that efficiently handles the hybrid action space, captures communication-control correlations, and ensures robust training despite sparse cross-domain variables and floating control inputs.
arXiv Detail & Related papers (2024-10-15T06:28:21Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR)
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training [11.749347656959822]
We propose a flexible model placement framework that offers two general and agile model placement strategies.
Our framework provides a simple user interface and guidelines to easily and flexibly configure these strategies in various training scenarios.
arXiv Detail & Related papers (2023-12-19T03:24:55Z) - In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis [18.084628500554462]
We introduce SINGA-Easy, a new deep learning framework that provides distributed hyper- parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation.
Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads.
arXiv Detail & Related papers (2021-08-03T08:39:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.