Fugu-MT Paper Translation (Abstract): HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Paper abstract: HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

arxiv url: http://arxiv.org/abs/2605.19341v1
Date: Tue, 19 May 2026 04:29:03 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.118352
Title: HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models
Authors: Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng,
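Abstract summary: Hallucination remains a central failure mode of large language models. Existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. We introduce HalluWorld, a benchmark grounded in an explicit reference-world formulation.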
Reference score (independently computed attention score): 24.61808957290675
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting reduces hallucinations across contexts. Current benchmarks either require human annotation and fixed references that may be memorized, or rely on observations in settings that are difficult to reproduce. To study root causes, we introduce HalluWorld, an extensible benchmark grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this world. Building on this view, we construct synthetic and semi-synthetic environments in which the reference world is fully specified, the model's view is controlled, and hallucination labels are generated automatically. HalluWorld spans gridworlds, chess, and realistic terminal tasks, enabling controlled variation of world complexity, observability, temporal change, and source-conflict policy, and disentangling hallucinations into fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation remain difficult and are not generally solved by extended thinking. In the terminal setting, models also struggle with when to abstain. The uneven profile of failures across probe types and domains suggests that hallucinations arise from distinct failure modes rather than a single capability. Our results suggest that controlled reference worlds offer a scalable and reproducible path toward measuring and reducing hallucinations in modern language models.
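As a rough illustration of the reference-world definition in the abstract (a claim counts as a hallucination when it is false with respect to a fully specified world), the following minimal sketch labels claims automatically against a toy gridworld. It is not the authors' implementation: the `Claim` type, the dict-based world, and the functions `label_claims` and `hallucination_rate` are all hypothetical names chosen for this example, and the claim format (an object at a cell) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical atomic claim: object `name` is located at cell `pos`."""
    name: str
    pos: tuple[int, int]


def label_claims(world: dict[str, tuple[int, int]], claims: list[Claim]) -> list[bool]:
    """Return True for each claim that is false with respect to the reference world.

    A claim about an object that does not exist in the world is also
    labeled as a hallucination, because the world does not support it.
    """
    return [world.get(c.name) != c.pos for c in claims]


def hallucination_rate(world: dict[str, tuple[int, int]], claims: list[Claim]) -> float:
    """Fraction of claims labeled as hallucinations (0.0 when there are no claims)."""
    labels = label_claims(world, claims)
    return sum(labels) / len(labels) if labels else 0.0


# Toy reference world (assumed layout): object name -> grid cell.
world = {"key": (0, 1), "door": (2, 2), "agent": (1, 1)}

# Claims extracted from a model's answer (assumed already parsed).
claims = [
    Claim("key", (0, 1)),   # true in the world
    Claim("door", (0, 0)),  # wrong position
    Claim("gem", (1, 2)),   # object not in the world
]

print(label_claims(world, claims))
print(hallucination_rate(world, claims))
```

Because the world is specified in full, labels follow from a direct lookup and need no human annotation. A fuller version would also model the controlled view (what the model was allowed to observe) and abstention, both of which this sketch omits.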


Related papers