INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
- URL: http://arxiv.org/abs/2505.07291v1
- Date: Mon, 12 May 2025 07:24:33 GMT
- Title: INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
- Authors: Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann
- Abstract summary: INTELLECT-2 is the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. To enable a training run with this unique infrastructure, we built various components from scratch. We open-source INTELLECT-2 along with all of our code and data, hoping to encourage and enable more open research in the field of decentralized training.
- Score: 2.2063836130393817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors. To enable a training run with this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers. Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial for achieving training stability and ensuring that our model successfully learned its training objective, thus improving upon QwQ-32B, the state-of-the-art reasoning model in the 32B parameter range. We open-source INTELLECT-2 along with all of our code and data, hoping to encourage and enable more open research in the field of decentralized training.
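To make the described recipe concrete, below is a minimal sketch of one asynchronous GRPO-style update as the abstract outlines it: rollouts arriving from untrusted inference workers are verified before use, rewards are normalized within each prompt's group of completions to form advantages, and updated policy weights are broadcast back to the workers. All names here (Rollout, verify, update_policy, broadcast) are hypothetical placeholders for illustration, not the actual PRIME-RL, TOPLOC, or SHARDCAST APIs, and the paper's specific modifications to the GRPO recipe are not reproduced.

```python
# Hypothetical sketch of one asynchronous GRPO step, reconstructed from the
# abstract's description; names are illustrative, not real PRIME-RL/TOPLOC/
# SHARDCAST interfaces.
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    prompt_id: str
    completion: str
    reward: float  # task reward, e.g. verified answer correctness
    proof: bytes   # attestation from the inference worker, checked TOPLOC-style


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each reward against the other
    completions sampled for the same prompt (the core idea of GRPO)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def training_step(groups, verify, update_policy, broadcast):
    """One asynchronous update: drop unverifiable rollouts, compute
    group-normalized advantages, update the policy, push new weights."""
    batch = []
    for prompt_id, rollouts in groups.items():
        trusted = [r for r in rollouts if verify(r.proof)]
        if len(trusted) < 2:
            continue  # a group needs >1 sample to form a relative baseline
        advantages = grpo_advantages([r.reward for r in trusted])
        batch.extend(zip(trusted, advantages))
    new_weights = update_policy(batch)  # clipped policy-gradient step
    broadcast(new_weights)              # SHARDCAST-style fan-out to workers
```

Normalizing within each prompt's group yields zero-mean advantages without a learned value function, which is what makes GRPO comparatively cheap; the asynchrony lies in rollout generation and this update loop running concurrently on different machines.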
Related papers
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage Approach [68.198383438396]
We present LAMeTA, a Large AI Model (LAM)-empowered Two-stage Approach for intent-aware agentic network optimization. First, we propose Intent-oriented Knowledge Distillation (IoKD), which efficiently distills intent-understanding capabilities. Second, we develop Symbiotic Reinforcement Learning (SRL), integrating E-LAMs with a policy-based DRL framework.
arXiv Detail & Related papers (2025-05-18T05:59:16Z)
- DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training [15.74527731339671]
We present a principled curriculum learning framework grounded in the notion of distribution-level learnability. Our framework prioritizes distributions with either high average advantage (exploitation) or low sample count (exploration). Our experiments show that our framework significantly improves convergence speed and final performance.
arXiv Detail & Related papers (2025-04-13T20:10:27Z)
- Meta-Computing Enhanced Federated Learning in IIoT: Satisfaction-Aware Incentive Scheme via DRL-Based Stackelberg Game [50.6166553799783]
Efficient IIoT operations require a trade-off between model quality and training latency. This paper designs a satisfaction function that accounts for data size, Age of Information (AoI), and training latency for meta-computing. We employ a deep reinforcement learning approach to learn the Stackelberg equilibrium.
arXiv Detail & Related papers (2025-02-10T03:33:36Z)
- Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning [15.61141633436468]
Federated Learning (FL) empowers multiple clients to collaboratively train machine learning models without sharing local data. This paper proposes a novel Heterogeneity-aware Personalized Federated Learning method, named HAPFL, via multi-level Reinforcement Learning (RL) mechanisms. Experimental results across multiple benchmark datasets demonstrate that HAPFL not only achieves high accuracy but also substantially reduces the overall training time by 20.9%-40.4%.
arXiv Detail & Related papers (2025-01-28T14:08:57Z)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies. We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
- FedMS: Federated Learning with Mixture of Sparsely Activated Foundation Models [11.362085734837217]
We propose a novel two-stage federated learning algorithm called FedMS.
A global expert is trained in the first stage and a local expert is trained in the second stage to provide better personalization.
We conduct extensive experiments to verify the effectiveness of FedMS; results show that FedMS outperforms other SOTA baselines by up to 55.25% in default settings.
arXiv Detail & Related papers (2023-12-26T07:40:26Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training [11.749347656959822]
We propose a flexible model placement framework that offers two general and agile model placement strategies.
Our framework provides a simple user interface and guidelines to easily and flexibly configure these strategies in various training scenarios.
arXiv Detail & Related papers (2023-12-19T03:24:55Z)
- Exploring the Robustness of Decentralized Training for Large Language Models [51.41850749014054]
Decentralized training of large language models has emerged as an effective way to democratize this technology.
This paper explores the robustness of decentralized training from three main perspectives.
arXiv Detail & Related papers (2023-12-01T04:04:03Z)
- Bayesian Generational Population-Based Training [35.70338636901159]
Population-Based Training (PBT) has led to impressive performance in several large-scale settings.
We introduce two new innovations in PBT-style methods.
We show that these innovations lead to large performance gains.
arXiv Detail & Related papers (2022-07-19T16:57:38Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach that allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model trained on the teacher's output reaches F1-scores comparable to the teacher's.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)