Fugu-MT Paper Translation (Abstract): The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

Paper overview: The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

arxiv url: http://arxiv.org/abs/2605.07186v1
Date: Fri, 08 May 2026 03:26:42 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.774307
Title: The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
Title (reference translation): The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
Authors: Zekai Tong, Ruiyao Xu, Aryan Shrivastava, Chenhao Tan, Ari Holtzman
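Abstract summary: We study how word-boundary corruption affects how Large Language Models (LLMs) detect targeted information. When whitespace characters are inserted within words to break them into fragments, LLMs' detection accuracy follows a U-shaped curve as the insertion rate increases.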
Reference score (independently computed attention score): 21.243421703047037
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing Large Language Model (LLM) benchmarks primarily focus on syntactically correct inputs, leaving a significant gap in evaluation on imperfect text. In this work, we study how word-boundary corruption affects how LLMs detect targeted information. By inserting whitespace characters within words to break them into fragments, LLMs' detection accuracy follows a U-shaped curve with the increase in insertion rate. We refer to this curve as the Text Uncanny Valley. To explain such observation, we propose a mode transition hypothesis: LLMs operate in a word-level mode for near-normal text and a character-level mode for heavily fragmented text, with the valley marking the disordered transition where neither mode is effective. Four experiments and one analysis are consistent with this account: in-context learning fails to rescue valley-bottom performance; regularizing the perturbation substantially reduces the U-shape; a math reasoning task replicates the U-shape for Gemini 3.0 Flash but not for stronger models, suggesting the effect is attenuated when tasks rely less on exact lexical alignment; and tokenization entropy peaks before the F1 minimum, consistent with a regime-conflict interpretation. These findings reveal a failure mode invisible to clean-text benchmarks yet directly relevant to any deployment scenario involving noisy or uncurated text inputs.
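Abstract (reference translation): Existing Large Language Model (LLM) benchmarks focus mainly on syntactically correct inputs, leaving a large gap in the evaluation of imperfect text. In this work, we study how word-boundary corruption affects the ability of LLMs to detect targeted information. When whitespace characters are inserted within words to break them into fragments, LLM detection accuracy follows a U-shaped curve as the insertion rate increases. We call this curve the Text Uncanny Valley. To explain the observation, we propose a mode transition hypothesis: LLMs operate in a word-level mode for near-normal text and in a character-level mode for heavily fragmented text, and the valley marks the disordered transition where neither mode is effective. Four experiments and one analysis are consistent with this account: in-context learning fails to rescue valley-bottom performance; regularizing the perturbation substantially reduces the U-shape; a math reasoning task replicates the U-shape for Gemini 3.0 Flash but not for stronger models, suggesting the effect is attenuated when tasks rely less on exact lexical alignment; and tokenization entropy peaks before the F1 minimum, consistent with a regime-conflict interpretation. These findings reveal a failure mode that is invisible to clean-text benchmarks yet directly relevant to any deployment scenario involving noisy or uncurated text input.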
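
The word-boundary corruption described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact perturbation procedure, so everything here is an assumption made for illustration: the function name `insert_whitespace`, the `seed` parameter, the use of a single space as the inserted character, and the choice to sample each within-word boundary independently with probability `rate`.

```python
import random


def insert_whitespace(text: str, rate: float, seed: int = 0) -> str:
    """Insert a space at each within-word boundary with probability `rate`.

    Hypothetical helper: the sampling scheme is an assumption, not the
    procedure used in the paper.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # local generator keeps runs reproducible
    out = []
    # Pair every character with its predecessor; the first character is
    # paired with a dummy space so it never counts as a within-word boundary.
    for prev, cur in zip(" " + text, text):
        # A within-word boundary lies between two non-whitespace characters.
        if not prev.isspace() and not cur.isspace() and rng.random() < rate:
            out.append(" ")
        out.append(cur)
    return "".join(out)


if __name__ == "__main__":
    sample = "find the hidden passphrase"
    for rate in (0.0, 0.25, 0.5, 1.0):
        print(rate, repr(insert_whitespace(sample, rate)))
```

Under these assumptions, a rate of 0 leaves the text unchanged and a rate of 1 separates every character, which correspond to the two ends of the insertion-rate axis; intermediate rates produce the partially fragmented text in between.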

Related papers