Fugu-MT paper translation (abstract): Wink: Recovering from Misbehaviors in Coding Agents

Paper overview: Wink: Recovering from Misbehaviors in Coding Agents

arxiv url: http://arxiv.org/abs/2602.17037v1
Date: Thu, 19 Feb 2026 03:15:00 GMT
Status: Translation complete
Last updated in system: 2026-02-20 15:21:28.614528
Title: Wink: Recovering from Misbehaviors in Coding Agents
Title (reference translation): Wink: Recovering from Misbehaviors in Coding Agents
Authors: Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra
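Abstract summary: Autonomous coding agents are increasingly being adopted in the software industry to automate complex engineering tasks. These agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. This paper describes a system that automatically recovers from agent misbehaviors.
Reference score (independently computed attention measure): 6.794419834325995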
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. In this paper, we present a system for automatically recovering from agentic misbehaviors at scale. We first introduce a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories: Specification Drift, Reasoning Problems, and Tool Call Failures, which we find occur in about 30% of all agent trajectories. To address these issues, we developed a lightweight, asynchronous self-intervention system named Wink. Wink observes agent trajectories and provides targeted course-correction guidance to nudge the agent back to a productive path. We evaluated our system on over 10,000 real world agent trajectories and found that it successfully resolves 90% of the misbehaviors that require a single intervention. Furthermore, a live A/B test in our production environment demonstrated that our system leads to a statistically significant reduction in Tool Call Failures, Tokens per Session and Engineer Interventions per Session. We present our experience designing and deploying this system, offering insights into the challenges of building resilient agentic systems at scale.
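Abstract (reference translation): Autonomous coding agents powered by large language models (LLMs) are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. This paper presents a system for automatically recovering from agentic misbehaviors at scale. It first introduces a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories (Specification Drift, Reasoning Problems, and Tool Call Failures) that occur in about 30% of all agent trajectories. To address these issues, the authors developed a lightweight, asynchronous self-intervention system named Wink. Wink observes agent trajectories and provides targeted course-correction guidance to nudge the agent back to a productive path. The system was evaluated on over 10,000 real-world agent trajectories and resolved 90% of the misbehaviors that require a single intervention. In addition, a live A/B test in the production environment showed that the system leads to a statistically significant reduction in Tool Call Failures, Tokens per Session, and Engineer Interventions per Session. The authors report their experience designing and deploying this system and offer insights into the challenges of building resilient agentic systems at scale.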


Related papers