DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
- URL: http://arxiv.org/abs/2511.20468v1
- Date: Tue, 25 Nov 2025 16:33:42 GMT
- Title: DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
- Authors: Yuanhao Li, Mingshan Liu, Hongbo Wang, Yiding Zhang, Yifei Ma, Wei Tan,
- Abstract summary: DRAFT-RL is a novel framework that integrates Chain-of-Draft (CoD) reasoning into multi-agent RL training. We evaluate our method on complex reasoning tasks including code synthesis, symbolic math, and knowledge-intensive QA.
- Score: 8.532777609640268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown impressive capabilities in multi-step reasoning and problem-solving. Recent works introduce multi-agent reflection frameworks where multiple LLM agents critique and refine each other's outputs using reinforcement learning (RL). However, these approaches often rely on single-shot responses and lack structural diversity in reasoning exploration. In this paper, we propose DRAFT-RL, a novel framework that integrates Chain-of-Draft (CoD) reasoning into multi-agent RL training. Instead of generating single responses, each agent produces multiple drafts per query, which are then evaluated by peer agents and a learned reward model to identify the most promising trajectory. These selected drafts are used to refine future reasoning strategies through actor-critic learning. DRAFT-RL enables explicit multi-path exploration, peer-guided reflection, and reward-aligned selection, resulting in more robust and interpretable LLM agent behavior. We evaluate our method on complex reasoning tasks including code synthesis, symbolic math, and knowledge-intensive QA, demonstrating that DRAFT-RL outperforms existing reflective and RL-based agents by significant margins in both accuracy and convergence speed.
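The abstract describes a concrete loop: each agent drafts several candidate answers, peer agents and a learned reward model score them, and the highest-scoring draft anchors an actor-critic update. The sketch below illustrates only the multi-draft generation and reward-aligned selection step under those assumptions; all names (DraftAgent, reward_model, select_best_draft) are hypothetical placeholders rather than the authors' released code, and the actor-critic update itself is omitted.

```python
# Minimal, illustrative sketch of the draft-selection loop described in the
# abstract. Class and function names are hypothetical placeholders, not the
# authors' implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Draft:
    agent_id: int
    text: str
    score: float = 0.0


class DraftAgent:
    """Stand-in for an LLM agent that can draft answers and critique peers."""

    def __init__(self, agent_id: int):
        self.agent_id = agent_id

    def generate_drafts(self, query: str, k: int) -> List[str]:
        # Placeholder: a real agent would sample k Chain-of-Draft responses.
        return [f"draft {i} for: {query}" for i in range(k)]

    def critique(self, query: str, draft: str) -> float:
        # Placeholder: a real agent would return a critique-based score.
        return 0.0


def reward_model(query: str, draft: str) -> float:
    # Placeholder for the learned reward model r(query, draft).
    return 0.0


def select_best_draft(query: str, agents: List[DraftAgent], k: int = 4) -> Draft:
    """One step: multi-draft generation, peer review, reward-aligned selection."""
    drafts = [
        Draft(agent.agent_id, text)
        for agent in agents
        for text in agent.generate_drafts(query, k)
    ]
    for d in drafts:
        # Peer-guided reflection: every other agent critiques the draft.
        peers = [a for a in agents if a.agent_id != d.agent_id]
        peer_score = sum(p.critique(query, d.text) for p in peers) / max(len(peers), 1)
        d.score = peer_score + reward_model(query, d.text)
    # The selected draft would then serve as the target trajectory for the
    # actor-critic update that refines each agent's future reasoning strategy.
    return max(drafts, key=lambda d: d.score)
```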
Related papers
- Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning [112.16686518063456]
We introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines.
arXiv Detail & Related papers (2026-01-14T17:57:43Z) - Multimodal Reinforcement Learning with Agentic Verifier for AI Agents [131.46008226323423]
Argos is a principled multimodal reward agent to train reasoning models for agentic tasks. By leveraging our agentic verifier across both SFT data and RL training, our model achieves state-of-the-art results.
arXiv Detail & Related papers (2025-12-03T04:42:47Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design [35.544075583073685]
We present the first systematic study of turn-level reward design for multi-turn RL algorithms and agent applications. We conduct case studies on multi-turn reasoning-augmented search agents, where we carefully design two types of turn-level rewards: verifiable and LLM-as-judge. Our experiments on multi-turn search tasks demonstrate that incorporating well-designed turn-level rewards enables RL algorithms to significantly outperform baseline methods with trajectory-level rewards.
arXiv Detail & Related papers (2025-05-17T04:09:46Z) - Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models [22.796496516709514]
This paper provides a systematic review of recent advances in reinforcement learning (RL)-based reasoning for Multimodal Large Language Models (MLLMs). We highlight two main RL paradigms, value-model-free and value-model-based methods, and analyze how RL enhances reasoning abilities by optimizing reasoning trajectories and aligning multimodal information. We provide an extensive overview of benchmark datasets, evaluation protocols, and current limitations, and propose future research directions to address challenges such as sparse rewards, inefficient cross-modal reasoning, and real-world deployment constraints.
arXiv Detail & Related papers (2025-04-30T03:14:28Z) - RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.96848846966087]
Training large language models (LLMs) as interactive agents presents unique challenges. While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents [26.645038049346255]
We propose the Reactive and Reflection agents with Multi-Path Reasoning (RR-MP) Framework. Our approach improves scientific reasoning accuracy by employing a multi-path reasoning mechanism. We conducted zero-shot and few-shot evaluations on tasks involving moral scenarios, college-level physics, and mathematics.
arXiv Detail & Related papers (2024-12-31T13:11:20Z) - Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logical reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)