Fugu-MT Paper Translation (Summary): READER: Robust Evidence-based Authorship Decoding via Extracted Representations

Paper overview: READER: Robust Evidence-based Authorship Decoding via Extracted Representations

arxiv url: http://arxiv.org/abs/2606.10794v2
Date: Wed, 10 Jun 2026 08:17:06 GMT
Status: Translation complete
Last updated in system: 2026-06-11 14:23:44.395387
Title: READER: Robust Evidence-based Authorship Decoding via Extracted Representations
Authors: Jiaxu Liu, Sunnan Mu, Dong Huang, Liuyin Wang, Jing Shao, Jie Zhang
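Abstract summary: We study Dynamic Black-Box LLM Provenance: identifying the source LLM from generations elicited by query-varying, non-predefined prompts. We introduce READER, a lightweight provenance framework that treats a frozen proxy LLM as a reader of hidden authorship evidence. On Agent500, a 50-target dataset built from agent-style prompts, READER reaches $31.0$-$42.4\%$ top-1 accuracy from a single response and $70.0$-$84.0\%$ from 50 responses.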
Reference score (in-house attention score): 28.346447904547556
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As agentic applications increasingly route user tasks through official and third-party LLM APIs, provenance becomes an operational question: which model generated a given black-box response? We study Dynamic Black-Box LLM Provenance: identifying the source LLM from generations elicited by query-varying, non-predefined prompts rather than a fixed input set or benchmark suite. This setting is difficult because prompt semantics dominate the text, while model-specific authorship traces are weak and inconsistent at the surface level. We introduce READER (Robust Evidence-based Authorship Decoding via Extracted Representations), a lightweight provenance framework that treats a frozen proxy LLM as a reader of hidden authorship evidence. READER maps black-box outputs into proxy activation space, temporally filters token states within each response, and performs Bayesian Evidence Accumulation by summing single-response log-posterior evidence across independently sampled prompts. This avoids fragile mean-pooling of prompt-specific representations while preserving the query-wise evidence needed for calibrated confidence. On Agent500, a 50-target dataset built from agent-style prompts, READER reaches $31.0$-$42.4\%$ top-1 accuracy from a single response and $70.0$-$84.0\%$ from 50 responses, substantially outperforming sentence-encoder fingerprints. Scaling across nine proxy readers further shows that stronger LLMs expose more linearly decodable authorship structure, suggesting that authorship perception is already present in frozen LLM representations and can be converted into reliable multi-query attribution.
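The evidence-accumulation step described above can be illustrated with a small sketch. It assumes each response has already been mapped to per-token proxy activations and that a linear probe (in the spirit of the abstract's "linearly decodable authorship structure") turns one response into a log-posterior over candidate source models. The probe weights, the token filter `skip_first`, within-response mean pooling, and all function names are hypothetical illustrations, not READER's published choices. Under a uniform prior over candidates and conditionally independent responses, summing per-response log-posteriors equals summing log-likelihoods up to a constant, so renormalizing the sum gives the multi-query posterior.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: the filter rule, probe and pooling are assumptions,
# not the paper's exact method.
Vector = List[float]


def filter_tokens(token_states: Sequence[Vector], skip_first: int = 2) -> List[Vector]:
    """Hypothetical temporal filter: drop the first `skip_first` token states."""
    kept = list(token_states[skip_first:])
    return kept if kept else list(token_states)  # fall back for very short responses


def response_log_posterior(token_states, weights, bias, skip_first=2) -> List[float]:
    """Single-response log-posterior over candidate source LLMs via a linear probe."""
    kept = filter_tokens(token_states, skip_first)
    dim = len(kept[0])
    # Pool token states *within* this one response only (never across prompts).
    pooled = [sum(t[d] for t in kept) / len(kept) for d in range(dim)]
    logits = [sum(w_d * x_d for w_d, x_d in zip(w, pooled)) + b
              for w, b in zip(weights, bias)]
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def accumulate_evidence(per_response_log_posteriors) -> List[float]:
    """Sum log-posteriors across independently sampled prompts, then renormalize."""
    n_classes = len(per_response_log_posteriors[0])
    totals = [sum(lp[k] for lp in per_response_log_posteriors) for k in range(n_classes)]
    m = max(totals)
    log_z = m + math.log(sum(math.exp(t - m) for t in totals))
    return [math.exp(t - log_z) for t in totals]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical probe: 2 candidate models, 2-dim states
    b = [0.0, 0.0]
    responses = [
        [[5.0, 5.0], [0.0, 9.0], [1.0, 0.0], [1.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0], [0.6, 0.4]],
    ]
    lps = [response_log_posterior(r, W, b) for r in responses]
    print(accumulate_evidence(lps[:1]))  # one response: about [0.731, 0.269]
    print(accumulate_evidence(lps))      # two responses: about [0.769, 0.231]
```

Keeping one log-posterior per response and combining them by summation, rather than averaging representations across prompts, is the design choice the abstract contrasts with "fragile mean-pooling": each prompt contributes its own calibrated vote, and confidence in the leading candidate grows as consistent evidence accumulates.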

Related papers