Fugu-MT Paper Translation (Abstract): AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

Paper overview: AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

arxiv url: http://arxiv.org/abs/2604.21590v1
Date: Thu, 23 Apr 2026 12:14:52 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.481563
Title: AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use
Authors: Yuanjie Lyu, Chengyu Wang, Haonan Zheng, Yuanhao Yue, Junbing Yan, Ming Wang, Jun Huang
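Abstract summary: This paper introduces the AgenticQwen family of models, trained via multi-round reinforcement learning (RL) on synthetic data and a limited amount of open-source data. The training framework combines reasoning RL and agentic RL with dual data flywheels that automatically generate increasingly challenging tasks. The models achieve strong performance on multiple agentic benchmarks and, in the authors' industrial agent system, close the gap with much larger models on search and data analysis tasks.
Reference score (attention metric computed by this site): 13.583197273673974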
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable. In this paper, we introduce the AgenticQwen family of models, trained via multi-round reinforcement learning (RL) on synthetic data and a limited amount of open-source data. Our training framework combines reasoning RL and agentic RL with dual data flywheels that automatically generate increasingly challenging tasks. The reasoning flywheel increases task difficulty by learning from errors, while the agentic flywheel expands linear workflows into multi-branch behavior trees that better reflect the decision complexity of real-world applications. We validate AgenticQwen on public benchmarks and in an industrial agent system. The models achieve strong performance on multiple agentic benchmarks, and in our industrial agent system, close the gap with much larger models on search and data analysis tasks. Model checkpoints and part of the synthetic data: https://huggingface.co/collections/alibaba-pai/agenticqwen. Data synthesis and RL training code: https://github.com/haruhi-sudo/data_synth_and_rl. The data synthesis pipeline is also integrated into EasyDistill: https://github.com/modelscope/easydistill.
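The abstract does not say how the agentic flywheel turns a linear workflow into a multi-branch behavior tree. The following is only a toy sketch of that general idea, not the authors' pipeline: the `Node` class, the tool names, and the `alternatives` mapping are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One workflow step: a tool call plus branches keyed by outcome (hypothetical)."""
    tool: str
    branches: dict[str, "Node"] = field(default_factory=dict)


def linear_workflow(tools):
    """Chain tools into a single path; each step continues on its 'ok' outcome."""
    root = None
    for tool in reversed(tools):
        node = Node(tool)
        if root is not None:
            node.branches["ok"] = root
        root = node
    return root


def expand(node, alternatives):
    """Return a copy of the tree in which steps gain extra outcome branches.

    `alternatives` maps a tool name to {outcome: fallback tool}. It is an
    assumed, hand-written input here; the original workflow is not modified.
    """
    new = Node(node.tool)
    for outcome, child in node.branches.items():
        new.branches[outcome] = expand(child, alternatives)
    for outcome, fallback in alternatives.get(node.tool, {}).items():
        new.branches.setdefault(outcome, Node(fallback))
    return new


def count_paths(node):
    """Count root-to-leaf paths, a rough proxy for decision complexity."""
    if not node.branches:
        return 1
    return sum(count_paths(child) for child in node.branches.values())


# Illustrative tool names only; none of them come from the paper.
flow = linear_workflow(["search", "fetch_page", "summarize"])
tree = expand(flow, {
    "search": {"no_results": "rewrite_query"},
    "fetch_page": {"timeout": "use_cache"},
})
```

In this sketch the linear workflow has a single path, while the expanded tree has one path per added outcome in addition to the original one, which is the sense in which branching raises the number of decisions an agent has to handle.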
