Fugu-MT Paper Translation (Abstract): Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs

Paper abstract: Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs

arxiv url: http://arxiv.org/abs/2604.00824v1
Date: Wed, 01 Apr 2026 12:33:25 GMT
Status: Translation complete
System update date: 2026-04-02 16:44:31.986358
Title: Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs
Title (reference translation): However, agentic, reasoning, and coding LLMs improve even further
Authors: Yang Ye, Jingyuan Tan, Tianyue Jiang, Ruizhe Ye, Qiankun He, Jiarui Yang, Jian Dong, Sicong Liang, Chongjian Yue, Peibai Xu, Lufan Lu, Taotao Qian, Junbao Hu, Yuechan Hao, Ensheng Shi, Qi Zhang, Yi Hao, Na Fan, Xin Tan, Shuai Yao, Zhiwei Shen, Zongchen Li, Yanlin Wang, Chong Chen, Yuchi Ma
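Abstract summary: Training effective software engineering agents requires large volumes of task-specific trajectories. We propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training trajectories.
Reference score (independently computed attention score): 29.11318811466135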
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training effective software engineering agents requires large volumes of task-specific trajectories, incurring substantial data construction costs. Inspired by the "Less-Is-More" hypothesis in mathematical reasoning, we investigate its extension to agentic scenarios and propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training trajectories. This is achieved via STITCH (Sliding-memory Trajectory Inference and Task Chunking Heuristic), a coarse-to-fine mechanism that filters low-value noise and retains decision-critical tokens to maximize training signal quality. We conduct experiments across multiple agent frameworks (e.g., mini-SWE-agent, MSWE-agent), model scales (30B to 355B), and multilingual settings (Python, Java, and ArkTS). On SWE-bench Verified, models trained with STITCH achieve up to 63.16% relative improvement over base models. On Multi-SWE-bench (Java), MiniMax-M2.5-STITCH achieves 43.75% with our CodeArts Agent scaffold (+16.67%). On HarmonyOS (ArkTS), GLM-4.7-STITCH improves the compilation pass rate to 61.31% (+43.34%) with less than 1K training trajectories. Our results confirm that the "Less-Is-More" paradigm generalizes effectively to complex agentic tasks across diverse languages and model scales.

Related papers