Related papers: Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling

URL: http://arxiv.org/abs/2511.05951v1
Date: Sat, 08 Nov 2025 09:47:27 GMT
Title: Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling
Authors: Qi Wang, Hongzhi Zhang, Jia Fu, Kai Fu, Yahui Liu, Tinghai Zhang, Chenxi Sun, Gangwei Jiang, Jingyi Tang, Xingguang Ji, Yang Yue, Jingyuan Zhang, Fuzheng Zhang, Kun Gai, Guorui Zhou
Abstract summary: We present a comprehensive and fully open-source pipeline for training a high-performance agentic model, named Klear-Qwen3-AgentForge. We design effective supervised fine-tuning (SFT) with synthetic data followed by multi-turn reinforcement learning (RL) to unlock the potential for multiple diverse agentic tasks.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the proliferation of powerful agentic models, the lack of critical post-training details hinders the development of strong counterparts in the open-source community. In this study, we present a comprehensive and fully open-source pipeline for training a high-performance agentic model for interacting with external tools and environments, named Klear-Qwen3-AgentForge, starting from the Qwen3-8B base model. We design effective supervised fine-tuning (SFT) with synthetic data followed by multi-turn reinforcement learning (RL) to unlock the potential for multiple diverse agentic tasks. We perform extensive experiments on various agentic benchmarks in both the tool-use and coding domains. Klear-Qwen3-AgentForge-8B achieves state-of-the-art performance among LLMs of similar size and remains competitive with significantly larger models.

Related papers

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
Date: 2026-02-10T18:55:41Z
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting the strong reasoning and self-correction performance of multiple agents.
Date: 2026-02-03T19:18:28Z
O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL
We introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning. We develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method.
Date: 2026-01-07T09:31:10Z
Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
Agent2World is a tool-augmented multi-agent framework that achieves strong inference-time world-model generation. It also serves as a data engine for supervised fine-tuning by grounding generation in multi-agent feedback.
Date: 2025-12-26T18:54:14Z
Demystifying Reinforcement Learning in Agentic Reasoning
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning. We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT; (ii) exploration-friendly techniques, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy, are crucial for agentic RL and can improve training efficiency.
Date: 2025-10-13T17:57:15Z
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
Toucan is the largest publicly available tool-agentic dataset to date. It generates diverse, realistic, and challenging tasks with trajectories involving real tool execution.
Date: 2025-10-01T17:58:03Z
Scaling Agents via Continual Pre-training
We propose incorporating Agentic Continual Pre-training (Agentic CPT) into the training pipeline of deep research agents to build powerful agentic foundation models. We evaluate our AgentFounder-30B on 10 benchmarks, where it achieves state-of-the-art performance while retaining strong tool-use ability.
Date: 2025-09-16T17:57:19Z
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Chain-of-Agents (CoA) is a novel paradigm of large language model (LLM) reasoning that enables native end-to-end complex problem-solving. We introduce a multi-agent distillation framework to distill state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then use agentic reinforcement learning on verifiable agentic tasks to further improve the models' capabilities on chain-of-agents problem solving.
Date: 2025-08-06T17:01:02Z
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
We show how reinforcement learning can be used to train agents for multi-turn interactive tasks. Our methodology offers a practical approach for training capable agents for multi-turn interactive tasks using open-weight models.
Date: 2025-08-05T14:30:47Z
xLAM: A Family of Large Action Models to Empower AI Agent Systems
We release xLAM, a series of large action models designed for AI agent tasks. xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
Date: 2024-09-05T03:22:22Z
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Open-source large language models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents. This paper delivers three key observations: (1) the current agent training corpus entangles format following with agent reasoning, which shifts significantly from the distribution of the pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches introduce hallucinations as a side effect when improving agent abilities. We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
Date: 2024-03-19T16:26:10Z
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks. Unlike previous RNN-based models, we use a transformer-based model to generate a flexible policy. The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the decision process of multi-agent tasks more explainable.
Date: 2021-01-20T07:24:24Z

This list is automatically generated from the titles and abstracts of the papers on this site.