First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
- URL: http://arxiv.org/abs/2511.04715v1
- Date: Thu, 06 Nov 2025 00:47:07 GMT
- Title: First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
- Authors: Dmytro Vitel, Anshuman Chhabra
- Abstract summary: Identifying how training samples influence/impact Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions. Current training sample influence estimation methods (also known as influence functions) undertake this goal by utilizing information flow through the model. However, owing to the large model sizes of today, consisting of billions of parameters, these influence computations are often restricted to some subset of model layers.
- Score: 8.788531432978802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying how training samples influence/impact Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) undertake this goal by utilizing information flow through the model via its first-order and higher-order gradient terms. However, owing to the large model sizes of today consisting of billions of parameters, these influence computations are often restricted to some subset of model layers to ensure computational feasibility. Prior seminal work by Yeh et al. (2022) in assessing which layers are best suited for computing language data influence concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (i.e., the cancellation effect). In this work, we propose theoretical and empirical evidence demonstrating how the cancellation effect is unreliable, and that middle attention layers are better estimators for influence. Furthermore, we address the broader challenge of aggregating influence scores across layers, and showcase how alternatives to standard averaging (such as ranking and vote-based methods) can lead to significantly improved performance. Finally, we propose better methods for evaluating influence score efficacy in LLMs without undertaking model retraining, and propose a new metric known as the Noise Detection Rate (NDR) that exhibits strong predictive capability compared to the cancellation effect. Through extensive experiments across LLMs of varying types and scales, we concretely determine that the first (layers) are not necessarily better than the last (layers) for LLM influence estimation, contrasting with prior knowledge in the field.
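The layer-aggregation question the abstract raises is easy to make concrete. Below is a minimal sketch, not the paper's implementation, of the three aggregation families it mentions (plain averaging, rank-based, and vote-based), assuming per-layer influence scores have already been computed by some estimator such as TracIn. The function name `aggregate_influence` and the top-10% vote threshold are illustrative assumptions.

```python
import numpy as np

def aggregate_influence(layer_scores: np.ndarray, method: str = "mean") -> np.ndarray:
    """Combine per-layer influence scores into a single score per sample.

    layer_scores has shape (num_layers, num_samples): entry [l, i] is the
    influence of training sample i estimated from layer l alone
    (assumed precomputed by some influence method, e.g. TracIn).
    """
    num_layers, num_samples = layer_scores.shape
    if method == "mean":
        # Standard averaging across layers (the baseline the paper critiques).
        return layer_scores.mean(axis=0)
    if method == "rank":
        # Rank aggregation: replace raw scores with within-layer ranks,
        # then average; robust to layers whose score scales differ wildly.
        ranks = layer_scores.argsort(axis=1).argsort(axis=1)
        return ranks.mean(axis=0)
    if method == "vote":
        # Vote aggregation: each layer votes for its top-k most influential
        # samples; a sample's final score is its total vote count.
        k = max(1, num_samples // 10)  # illustrative threshold: top 10%
        votes = np.zeros(num_samples)
        for layer in range(num_layers):
            votes[np.argsort(layer_scores[layer])[-k:]] += 1
        return votes
    raise ValueError(f"unknown aggregation method: {method!r}")

# Usage: synthetic scores for 24 layers and 1000 training samples.
scores = np.random.default_rng(0).normal(size=(24, 1000))
top_by_vote = np.argsort(aggregate_influence(scores, "vote"))[::-1][:20]
```

Rank and vote aggregation discard each layer's raw score scale, which is one plausible reason they can beat plain averaging when a few layers dominate the magnitude of the summed scores.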
Related papers
- Influence-Preserving Proxies for Gradient-Based Data Selection in LLM Fine-tuning [51.87858735871145]
We introduce Iprox, a framework that derives influence-preserving proxies directly from the target model. Iprox consistently outperforms off-the-shelf proxies and baseline methods.
arXiv Detail & Related papers (2026-02-19T20:57:30Z) - CausalPFN: Amortized Causal Effect Estimation via In-Context Learning [19.54034651361769]
CausalPFN infers causal effects for new observational datasets out of the box. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles.
arXiv Detail & Related papers (2025-06-09T16:31:06Z) - Efficient Data Selection at Scale via Influence Distillation [53.03573620682107]
This paper introduces Influence Distillation, a mathematically justified framework for data selection. By distilling each sample's influence on a target distribution, our method assigns model-specific weights that are used to select training data. Experiments show that Influence Distillation matches or outperforms state-of-the-art performance while achieving up to $3.5\times$ faster selection.
arXiv Detail & Related papers (2025-05-25T09:08:00Z) - Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples. We refine existing gradient-based methods to work effectively at scale. We release our prompt set and model outputs, along with a web-based visualization tool to explore influential examples.
arXiv Detail & Related papers (2024-10-22T20:39:21Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Estimating Causal Effects with Double Machine Learning -- A Method Evaluation [5.904095466127043]
We review one of the most prominent methods: "double/debiased machine learning" (DML).
Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships.
When estimating the effects of air pollution on housing prices, we find that DML estimates are consistently larger than estimates of less flexible methods.
arXiv Detail & Related papers (2024-03-21T13:21:33Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z) - Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees [12.167833575680833]
Gradient-boosted decision trees (GBDTs) are a powerful and widely-used class of models.
We adapt influence-estimation methods designed for deep learning models to GBDTs.
We find BoostIn is an efficient influence-estimation method for GBDTs that performs as well as or better than existing work.
arXiv Detail & Related papers (2022-04-30T22:39:17Z) - First is Better Than Last for Language Data Influence [44.907420330002815]
We show that TracIn-WE significantly outperforms other data influence methods applied on the last layer.
We also show that TracIn-WE can produce scores not just at the level of the overall training input, but also at the level of words within the training input.
arXiv Detail & Related papers (2022-02-24T00:48:29Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions (the classical formula is sketched after this list).
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to get high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
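For orientation, since the main paper and several entries above build on gradient-based influence: the classical first-order influence function of Koh and Liang (2017) scores a training point $z$ against a test point $z_{\text{test}}$ as below. The damping term $\lambda I$ is the standard form of the Hessian regularization the last entry highlights ($\lambda = 0$ recovers the undamped definition); the notation is a common rendering, not any single paper's exact statement.

```latex
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
    \bigl(H_{\hat\theta} + \lambda I\bigr)^{-1}
    \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```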