Fugu-MT paper translation (abstract): Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents

Paper abstract: Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents

arxiv url: http://arxiv.org/abs/2510.04491v1
Date: Mon, 06 Oct 2025 05:03:57 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.687291
Title: Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents
Title (reference translation): Impatient Users Confuse AI Agents: High-Fidelity Simulation of Human Traits for Testing Agents
Authors: Muyu He, Anand Kumar, Tsach Mackey, Meghana Rajeev, James Zou, Nazneen Rajani
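Abstract summary: TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space that correspond to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
Reference score (independently computed attention score): 58.00130492861884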
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite rapid progress in building conversational AI agents, robustness is still largely untested. Small shifts in user behavior, such as being more impatient, incoherent, or skeptical, can cause sharp drops in agent performance, revealing how brittle current AI agents are. Today's benchmarks fail to capture this fragility: agents may perform well under standard evaluations but degrade spectacularly in more realistic and varied settings. We address this robustness testing gap by introducing TraitBasis, a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits (e.g., impatience or incoherence), which can be controlled, scaled, composed, and applied at inference time without any fine-tuning or extra data. Using TraitBasis, we extend $\tau$-Bench to $\tau$-Trait, where user behaviors are altered via controlled trait vectors. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models, highlighting the lack of robustness of current AI agents to variations in user behavior. Together, these results highlight both the critical role of robustness testing and the promise of TraitBasis as a simple, data-efficient, and compositional tool. By powering simulation-driven stress tests and training loops, TraitBasis opens the door to building AI agents that remain reliable in the unpredictable dynamics of real-world human interactions. We have open-sourced $\tau$-Trait across four domains: airline, retail, telecom, and telehealth, so the community can systematically QA their agents under realistic, behaviorally diverse intents and trait scenarios: https://github.com/collinear-ai/tau-trait.
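Abstract (reference translation): Despite rapid progress in building conversational AI agents, robustness remains largely untested. Small changes in user behavior, such as being more impatient, incoherent, or skeptical, cause sharp drops in agent performance and reveal how brittle current AI agents are. Agents perform well under standard evaluations but degrade markedly in more realistic and varied settings. We address this robustness testing gap by introducing TraitBasis, a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits (e.g., impatience or incoherence), which can be controlled, scaled, composed, and applied at inference time without fine-tuning or extra data. Using TraitBasis, we extend $\tau$-Bench to $\tau$-Trait. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models, highlighting the lack of robustness of current AI agents to variations in user behavior. These results highlight both the critical role of robustness testing and the promise of TraitBasis as a simple, data-efficient, and compositional tool. By powering simulation-driven stress tests and training loops, TraitBasis opens the door to building AI agents that remain reliable in the unpredictable dynamics of real-world human interactions. We have open-sourced $\tau$-Trait across four domains: airline, retail, telecom, and telehealth.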
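
The abstract describes trait directions in activation space that can be scaled, composed, and applied at inference time, but it does not say how those directions are learned. As a rough illustration only, the sketch below swaps in a common difference-of-means steering vector, which is an assumption and not necessarily the TraitBasis procedure. The function names (`mean_vector`, `trait_direction`, `steer`), the trait labels, and the three-dimensional toy activations are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def trait_direction(with_trait: Sequence[Sequence[float]],
                    without_trait: Sequence[Sequence[float]]) -> Vector:
    """Unit-length difference of means between two sets of activations.

    ASSUMPTION: difference-of-means is a generic steering-vector recipe
    used here as a stand-in; the abstract does not specify the method.
    """
    diff = [p - q for p, q in zip(mean_vector(with_trait),
                                  mean_vector(without_trait))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def steer(hidden: Sequence[float],
          directions: Dict[str, Vector],
          strengths: Dict[str, float]) -> Vector:
    """Add each trait direction, scaled by its strength, to one hidden state.

    Scaling is the per-trait coefficient; composition is the sum over traits.
    """
    steered = list(hidden)
    for name, alpha in strengths.items():
        steered = [h + alpha * d for h, d in zip(steered, directions[name])]
    return steered


# Toy activations (hypothetical data, 3 dimensions for readability).
neutral = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]
directions = {
    "impatience": trait_direction([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]], neutral),
    "skepticism": trait_direction([[0.0, 4.0, 2.0], [0.0, 2.0, 2.0]], neutral),
}

# Compose two traits at different intensities on a single hidden state.
print(steer([0.5, 0.5, 0.5], directions, {"impatience": 2.0, "skepticism": 0.5}))
```

In a real setup the steered vector would be written back into a chosen layer of the user-simulator model during generation; which layer and which strengths work well are empirical questions that this sketch does not address.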

Related papers