Fugu-MT Paper Translation (Abstract): DECOR: Auditing LLM Deception via Information Manipulation Theory

Paper summary: DECOR: Auditing LLM Deception via Information Manipulation Theory

arxiv url: http://arxiv.org/abs/2605.19270v1
Date: Tue, 19 May 2026 02:33:21 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.079456
Title: DECOR: Auditing LLM Deception via Information Manipulation Theory
Authors: Linyue Cai, Samuel Yeh, Jwala Dhamala, Rahul Gupta, Sharon Li
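Abstract summary: Large language models can subtly manipulate truthful information. Existing black-box methods rely on coarse-grained judgments. The paper introduces DECOR, a multi-agent framework for fine-grained auditing of strategic deception.
Reference score (attention score computed by this site): 13.836634897436419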
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models can deceive by subtly manipulating truthful information -- omitting key facts, shifting focus, or obscuring meaning -- making such behavior difficult to detect. Existing black-box methods rely on coarse-grained judgments, offering limited interpretability and failing to pinpoint which facts were distorted and how. We introduce DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses. DECOR decomposes input contexts into atomic informational units and scores each unit against the response across four dimensions of manipulation, producing interpretable manipulation profiles that are aggregated into a global deception index. We comprehensively evaluate DECOR on both single-turn and multi-turn deception detection benchmarks spanning real-world domains, and show that DECOR achieves state-of-the-art performance on both, outperforming competitive baselines. The framework generalizes across 15 frontier models, and ablation studies confirm the contribution of each key design component. Our findings demonstrate that fine-grained, theory-grounded auditing of information manipulation offers an effective and interpretable path for LLM deception detection.
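Illustrative sketch (not from the paper): the abstract describes per-unit scoring across four manipulation dimensions, aggregated into a global deception index, but gives no formulas. The Python below is a minimal sketch under stated assumptions. The dimension names are the four usually associated with Information Manipulation Theory (quantity, quality, relation, manner); the abstract does not name them. The class `UnitProfile`, the function `deception_index`, the [0, 1] score range, and the max-then-mean aggregation are all hypothetical choices made for illustration. The scores are filled in by hand here; in a framework like the one described they would presumably come from the auditing agents.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed dimension names, taken from Information Manipulation Theory;
# the abstract only says "four dimensions of manipulation".
DIMENSIONS = ("quantity", "quality", "relation", "manner")


@dataclass
class UnitProfile:
    """Manipulation profile for one atomic informational unit (hypothetical type)."""
    unit: str     # the atomic fact extracted from the input context
    scores: dict  # dimension name -> manipulation score in [0, 1] (assumed range)

    def severity(self) -> float:
        # Assumption: a unit is as manipulated as its worst dimension.
        return max(self.scores[d] for d in DIMENSIONS)


def deception_index(profiles) -> float:
    # Assumption: the global index is the mean per-unit severity.
    # This is NOT the paper's aggregation rule, which the abstract does not give.
    if not profiles:
        return 0.0
    return mean(p.severity() for p in profiles)


# Hand-written example: the response omits the first fact entirely
# and states the second one vaguely.
profiles = [
    UnitProfile(
        "Drug X caused nausea in 30% of patients.",
        {"quantity": 1.0, "quality": 0.0, "relation": 0.0, "manner": 0.0},
    ),
    UnitProfile(
        "Drug X reduced symptoms in 60% of patients.",
        {"quantity": 0.0, "quality": 0.0, "relation": 0.0, "manner": 0.5},
    ),
]

index = deception_index(profiles)
print(index)
```

Keeping the per-unit profiles alongside the single index is what would let an auditor see which fact was distorted and on which dimension, which is the interpretability benefit the abstract claims.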

Related papers