Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction
- URL: http://arxiv.org/abs/2602.01202v1
- Date: Sun, 01 Feb 2026 12:44:59 GMT
- Title: Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction
- Authors: Mingze Kong, Zikun Qu, Zhongquan Zhou, Pengyu Liang, Xiang Li, Zhiwei Shang, Zhi Hong, Kaiyu Huang, Zhiyong Wang, Zhongxiang Dai,
- Abstract summary: We present Workflow-R1, a framework that reformulates workflow construction as a multi-turn, natural language-based sequential decision-making process. GSsPO functions as a structure-aware RL algorithm generalizable to a broad class of multi-turn agentic sequential decision-making tasks.
- Score: 25.928675237308074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid evolution of agentic workflows has demonstrated strong performance of LLM-based agents in addressing complex reasoning tasks. However, existing workflow optimization methods typically formulate workflow synthesis as a static, one-shot code-centric generation problem. This paradigm imposes excessive constraints on the model's coding capabilities and restricts the flexibility required for dynamic problem-solving. In this paper, we present Workflow-R1, a framework that reformulates workflow construction as a multi-turn, natural language-based sequential decision-making process. To resolve the optimization granularity mismatch inherent in such multi-turn interactions, we introduce Group Sub-sequence Policy Optimization (GSsPO). While explicitly tailored to align with the interleaved Think-Action dynamics of agentic reasoning, GSsPO fundamentally functions as a structure-aware RL algorithm generalizable to a broad class of multi-turn agentic sequential decision-making tasks. By recalibrating the optimization unit to the composite sub-sequence, specifically the atomic Think-Action cycle, it aligns gradient updates with the semantic boundaries of these interactions, ensuring robust learning in complex multi-turn reasoning tasks. Through extensive experiments on multiple QA benchmarks, Workflow-R1 outperforms competitive baselines, validating GSsPO as a generalized solution for sequential reasoning and establishing Workflow-R1 as a promising new paradigm for automated workflow optimization.
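The abstract's core idea, recalibrating the optimization unit from the token to the atomic Think-Action cycle with group-relative baselines, can be illustrated with a small sketch. The paper does not give the formula here, so the following is a hypothetical GRPO-style construction under simplifying assumptions: each rollout in a group is segmented into Think-Action cycles, spans are assumed aligned across rollouts, and each cycle's advantage is normalized against the same cycle in the other rollouts of the group, then broadcast over every token of that cycle.

```python
import numpy as np

def gsspo_advantages(cycle_rewards, cycle_spans):
    """Hypothetical sub-sequence-level credit assignment (not the paper's exact method).

    cycle_rewards: [n_rollouts, n_cycles] reward attributed to each
                   Think-Action cycle of each rollout in the group.
    cycle_spans:   [n_cycles] list of (start, end) token spans, assumed
                   aligned across rollouts for simplicity.
    Returns a [n_rollouts, n_tokens] array of per-token advantages in which
    all tokens of one Think-Action cycle share one group-normalized value.
    """
    r = np.asarray(cycle_rewards, dtype=float)
    # Group-wise baseline per cycle: each sub-sequence is compared against
    # the corresponding cycle in the other rollouts of the group.
    adv = (r - r.mean(axis=0)) / (r.std(axis=0) + 1e-8)
    n_tokens = max(end for _, end in cycle_spans)
    per_token = np.zeros((r.shape[0], n_tokens))
    for j, (start, end) in enumerate(cycle_spans):
        # Broadcast one advantage over every token of the cycle, so gradient
        # updates align with the sub-sequence's semantic boundary rather than
        # with individual tokens.
        per_token[:, start:end] = adv[:, [j]]
    return per_token
```

In a policy-gradient update these per-token advantages would weight the token log-probabilities, so an entire Think-Action cycle is pushed up or down as one unit, which is the granularity mismatch the abstract says GSsPO resolves.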
Related papers
- FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning [49.369614288007334]
FlowSteer is an end-to-end reinforcement learning framework that pairs a lightweight policy model, acting as the agent, with an executable canvas environment. We show that FlowSteer significantly outperforms baselines across various tasks.
arXiv Detail & Related papers (2026-02-02T05:30:42Z) - SPOGW: a Score-based Preference Optimization method via Group-Wise comparison for workflows [23.667139832926225]
Large language models (LLMs) have exhibited significant capabilities in addressing challenges throughout various fields, often through the use of agentic workflows. Recent studies have aimed to minimize the human intervention needed for their construction, leading to advances in automated techniques for optimizing agentic workflows. We introduce a new score-based preference approach, referred to as SPOGW, which operates directly on cardinal reward signals through group-wise comparison.
arXiv Detail & Related papers (2025-10-05T08:26:29Z) - Towards Agentic AI for Multimodal-Guided Video Object Segmentation [14.877182670778284]
Referring-based Video Object Segmentation is a multimodal problem that requires producing fine-grained segmentation results guided by external cues. Recent advances in vision-language foundation models open a promising direction toward training-free approaches. We propose Multi-Modal Agent, a novel agentic system designed to solve this task in a more flexible and adaptive manner.
arXiv Detail & Related papers (2025-08-14T12:11:15Z) - Polymath: A Self-Optimizing Agent with Dynamic Hierarchical Workflow [6.636150750052998]
Large language models (LLMs) excel at solving complex tasks by executing agentic workflows composed of detailed instructions and structured operations. Many researchers have sought to automate the generation and optimization of these workflows through code-based representations. Existing methods often rely on labeled datasets to train and optimize workflows, making them ineffective and inflexible for solving real-world, dynamic problems.
arXiv Detail & Related papers (2025-08-04T23:50:02Z) - EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [64.70546873396624]
We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs). EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently. We also propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflows.
arXiv Detail & Related papers (2025-06-10T02:39:55Z) - Multi-Agent Collaboration via Evolving Orchestration [55.574417128944226]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications. However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains. To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z) - Flow: Modularized Agentic Workflow Automation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution. However, the effective adjustment of agentic workflows during execution has not been well studied. In this paper, we define an activity-on-vertex (AOV) graph, which allows continuous workflow refinement by agents. Our proposed multi-agent framework achieves efficient concurrent execution of subtasks, effective goal achievement, and enhanced error tolerance.
arXiv Detail & Related papers (2025-01-14T04:35:37Z) - Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.