Fugu-MT paper translation (abstract): Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

Paper overview: Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

arxiv url: http://arxiv.org/abs/2604.20904v1
Date: Tue, 21 Apr 2026 19:16:22 GMT
Status: translation complete
Last updated in system: 2026-04-24 14:40:06.081084
Title: Reinforcing privacy reasoning in LLMs via normative simulacra from fiction
Title (reference translation): Strengthening privacy reasoning in LLMs with normative simulacra from fiction
Authors: Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, Helen Nissenbaum,
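Abstract summary: Contextual Integrity provides a principled framework that defines privacy as the appropriate flow of information within context-relative norms. The paper proposes extracting normative simulacra from fiction novels and using them to fine-tune LLMs. The method is evaluated on five CI-aligned benchmarks spanning distinct societal contexts.
Reference score (independently computed attention score): 1.143869785127334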
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that evaluates whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and strongest correlation with crowdsourced human privacy expectations, demonstrating that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
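Abstract (reference translation): The information-handling practices of LLM agents are broadly out of step with their users' contextual privacy expectations. Contextual Integrity (CI) offers a principled framework that defines privacy as the appropriate flow of information within context-relative norms. Existing approaches, however, either double the inference cost through supervisor-assistant architectures or fine-tune on narrow, task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs, first with supervised learning and then with GRPO reinforcement learning. Our composite reward function combines programmatic signals, namely task clarity (which subsumes schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that assesses whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, which teaches the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow; this improves recognition of privacy-relevant situations but does not improve the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and the strongest correlation with crowdsourced human privacy expectations, showing that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.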
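
The per-completion contrastive scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the two judge evaluations are combined, so the simple difference used here is an assumption, and every name (`contrastive_score`, `overlap_judge`, the `universes` mapping) is hypothetical. The token-overlap judge is only a stand-in for the LLM judge so that the example runs on its own.

```python
import random
from typing import Callable, Dict


def overlap_judge(completion: str, universe: str) -> float:
    """Toy stand-in for the LLM judge (assumption): fraction of the
    universe's tokens that also appear in the completion."""
    norm_tokens = set(universe.lower().split())
    if not norm_tokens:
        return 0.0
    return len(norm_tokens & set(completion.lower().split())) / len(norm_tokens)


def contrastive_score(
    completion: str,
    source_id: str,
    universes: Dict[str, str],
    judge: Callable[[str, str], float],
    rng: random.Random,
) -> float:
    """Judge one completion against the correct normative universe and
    against one randomly selected wrong universe.

    Assumption: the two judge scores are combined as a plain difference;
    the source does not specify the actual combination rule.
    """
    wrong_ids = [uid for uid in universes if uid != source_id]
    if not wrong_ids:
        raise ValueError("need at least one other universe to contrast against")
    wrong_id = rng.choice(wrong_ids)
    correct = judge(completion, universes[source_id])
    wrong = judge(completion, universes[wrong_id])
    return correct - wrong


# Hypothetical normative universes keyed by source text.
universes = {
    "novel_a": "letters stay private between friends",
    "novel_b": "doctors share records with nurses",
}
rng = random.Random(0)
grounded = contrastive_score(
    "letters between friends stay private", "novel_a", universes, overlap_judge, rng
)
generic = contrastive_score(
    "privacy matters everywhere", "novel_a", universes, overlap_judge, rng
)
```

Under this assumed difference rule, a completion that fits the correct universe and not the wrong one scores high, while a generic completion that fits neither scores zero, which is one way to reward conditioning on context over memorized norms.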

Related papers