Fugu-MT Paper Translation (Abstract): First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation

Paper summary: First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation

arxiv url: http://arxiv.org/abs/2511.04715v1
Date: Thu, 06 Nov 2025 00:47:07 GMT
Status: Translation complete
Last updated in system: 2025-11-10 21:00:44.54731
Title: First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
Authors: Dmytro Vitel, Anshuman Chhabra
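Abstract summary: Identifying the training samples that influence Large Language Model (LLM) decisions is essential for effectively interpreting model decisions. Current training sample influence estimation methods (also known as influence functions) pursue this goal by exploiting information flow through the model. However, because today's models are large, with billions of parameters, these influence computations are often restricted to a subset of model layers.
Reference score (attention level computed independently by this site): 8.788531432978802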
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying how training samples influence/impact Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) undertake this goal by utilizing information flow through the model via its first-order and higher-order gradient terms. However, owing to the large model sizes of today consisting of billions of parameters, these influence computations are often restricted to some subset of model layers to ensure computational feasibility. Prior seminal work by Yeh et al. (2022) in assessing which layers are best suited for computing language data influence concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (i.e., the cancellation effect). In this work, we propose theoretical and empirical evidence demonstrating how the cancellation effect is unreliable, and that middle attention layers are better estimators for influence. Furthermore, we address the broader challenge of aggregating influence scores across layers, and showcase how alternatives to standard averaging (such as ranking and vote-based methods) can lead to significantly improved performance. Finally, we propose better methods for evaluating influence score efficacy in LLMs without undertaking model retraining, and propose a new metric known as the Noise Detection Rate (NDR) that exhibits strong predictive capability compared to the cancellation effect. Through extensive experiments across LLMs of varying types and scales, we concretely determine that the first (layers) are not necessarily better than the last (layers) for LLM influence estimation, contrasting with prior knowledge in the field.
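Abstract (reference translation): Identifying how training samples influence Large Language Model (LLM) decisions is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) pursue this goal by exploiting information flow through the model via its first-order and higher-order gradient terms. However, because today's models are large, with billions of parameters, these influence computations are often restricted to a subset of model layers to keep them computationally feasible. Seminal prior work by Yeh et al. (2022), which assessed which layers are best suited for computing language data influence, concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (the cancellation effect). This work presents theoretical and empirical evidence that the cancellation effect is unreliable and that middle attention layers are better estimators of influence. It also addresses the challenge of aggregating influence scores across layers and shows that alternatives to standard averaging (such as ranking and vote-based methods) can significantly improve performance. Finally, it proposes better methods for evaluating the efficacy of influence scores in LLMs without model retraining, along with a new metric, the Noise Detection Rate (NDR), which shows strong predictive capability compared with the cancellation effect. Through extensive experiments across LLMs of varying types and scales, the authors determine that the first layers are not necessarily better than the last layers for LLM influence estimation.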
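Illustrative note: the abstract contrasts standard averaging of per-layer influence scores with ranking and vote-based aggregation, but does not spell out the procedures. The sketch below is a minimal toy illustration of that general idea, not the authors' implementation. The function names, the top-k voting rule, and the score matrix are all hypothetical. It assumes influence scores have already been computed and arranged as a matrix with one row per layer and one column per training sample.

```python
import numpy as np

# Hypothetical toy scores: rows are layers, columns are training samples.
# Layer 0 has a much larger scale than the others, so it dominates a raw mean.
scores = np.array([
    [100.0, 1.0, 2.0, 3.0],
    [0.1, 0.4, 0.3, 0.2],
    [0.2, 0.5, 0.4, 0.3],
])

def aggregate_mean(scores):
    """Standard averaging of raw scores across layers."""
    return scores.mean(axis=0)

def aggregate_rank(scores):
    """Average each sample's within-layer rank (0 = lowest score in that layer)."""
    # Double argsort converts scores to ranks along each row.
    ranks = scores.argsort(axis=1).argsort(axis=1)
    return ranks.mean(axis=0)

def aggregate_vote(scores, k):
    """Count how many layers place each sample among their k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    votes = np.zeros(scores.shape[1], dtype=int)
    for layer_top in top_k:
        # Indices within one layer are unique, so this adds one vote each.
        votes[layer_top] += 1
    return votes

print("mean :", aggregate_mean(scores))
print("rank :", aggregate_rank(scores))
print("votes:", aggregate_vote(scores, k=1))
```

In this toy matrix, the raw mean selects sample 0 because of a single large-scale layer, while the rank and vote aggregates both select sample 1, which two of the three layers score highest. This shows only why scale-free aggregation can disagree with averaging; it says nothing about which choice performs better in the paper's experiments.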
