Fugu-MT Paper Translation (Abstract): CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

Paper overview: CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

arxiv url: http://arxiv.org/abs/2603.19297v1
Date: Wed, 11 Mar 2026 04:11:49 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:12.863359
Title: CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing
Authors: Manit Baser, Alperen Yildiz, Dinil Mon Divakaran, Mohan Gurusamy
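Abstract summary: The authors introduce CLaRE, a representation-level technique for identifying where ripple effects may occur. CLaRE quantifies entanglement between facts using forward activations from a single intermediate layer, avoiding costly backward passes. Large-scale entanglement graphs of a fact corpus are computed for multiple models, capturing how local edits propagate through representational space.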
Reference score (independently computed attention score): 4.180400747723904
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The static knowledge representations of large language models (LLMs) inevitably become outdated or incorrect over time. While model-editing techniques offer a promising solution by modifying a model's factual associations, they often produce unpredictable ripple effects, which are unintended behavioral changes that propagate even to the hidden space. In this work, we introduce CLaRE, a lightweight representation-level technique to identify where these ripple effects may occur. Unlike prior gradient-based methods, CLaRE quantifies entanglement between facts using forward activations from a single intermediate layer, avoiding costly backward passes. To enable systematic study, we prepare and analyse a corpus of 11,427 facts drawn from three existing datasets. Using CLaRE, we compute large-scale entanglement graphs of this corpus for multiple models, capturing how local edits propagate through representational space. These graphs enable stronger preservation sets for model editing, audit trails, efficient red-teaming, and scalable post-edit evaluation. In comparison to baselines, CLaRE achieves an average of 62.2% improvement in Spearman correlation with ripple effects while being $2.74\times$ faster, and using $2.85\times$ less peak GPU memory. Besides, CLaRE requires only a fraction of the storage needed by the baselines to compute and preserve fact representations. Our entanglement graphs and corpus are available at https://anonymous.4open.science/r/CLaRE-488E.
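The abstract describes scoring entanglement between facts from forward activations and then checking how well those scores track observed ripple effects via Spearman correlation. The sketch below illustrates that evaluation loop only. It is not the authors' method: the abstract does not state how entanglement is computed, so cosine similarity between activation vectors is used here purely as an assumed stand-in, and the activation vectors and ripple magnitudes are made-up toy numbers. The function names (`cosine`, `ranks`, `spearman`, `entanglement_scores`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def entanglement_scores(edited, others):
    """Assumed stand-in score: similarity of the edited fact's activation
    vector to each other fact's activation vector."""
    return [cosine(edited, other) for other in others]


if __name__ == "__main__":
    # Toy activation vectors (invented for illustration, not real model data).
    edited_fact = [1.0, 0.0, 0.0]
    other_facts = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    # Hypothetical post-edit ripple magnitudes for the same three facts.
    observed_ripple = [0.8, 0.05, 0.3]

    scores = entanglement_scores(edited_fact, other_facts)
    print("scores:", scores)
    print("spearman:", spearman(scores, observed_ripple))
```

In this toy setup the score ordering matches the ripple ordering, so the rank correlation comes out at essentially 1; with real activations the interesting question is how far below 1 a given scoring rule falls.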


Related papers