Better Training Data Attribution via Better Inverse Hessian-Vector Products
- URL: http://arxiv.org/abs/2507.14740v1
- Date: Sat, 19 Jul 2025 20:18:51 GMT
- Title: Better Training Data Attribution via Better Inverse Hessian-Vector Products
- Authors: Andrew Wang, Elisa Nguyen, Runshi Yang, Juhan Bae, Sheila A. McIlraith, Roger Grosse
- Abstract summary: Training data attribution (TDA) provides insights into which training data is responsible for a learned model behavior. We introduce an algorithm which uses the EKFAC-preconditioner on Neumann series iterations to arrive at an accurate iHVP approximation for TDA. ASTRA is easy to tune, requires fewer iterations than Neumann series iterations, and is more accurate than EKFAC-based approximations.
- Score: 14.661332908535078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training data attribution (TDA) provides insights into which training data is responsible for a learned model behavior. Gradient-based TDA methods such as influence functions and unrolled differentiation both involve a computation that resembles an inverse Hessian-vector product (iHVP), which is difficult to approximate efficiently. We introduce an algorithm (ASTRA) which uses the EKFAC-preconditioner on Neumann series iterations to arrive at an accurate iHVP approximation for TDA. ASTRA is easy to tune, requires fewer iterations than Neumann series iterations, and is more accurate than EKFAC-based approximations. Using ASTRA, we show that improving the accuracy of the iHVP approximation can significantly improve TDA performance.
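The abstract describes ASTRA only at a high level, so the following is a minimal sketch, under my own assumptions rather than the authors' released code, of how an EKFAC-preconditioned Neumann/Richardson-style iteration for a damped iHVP can be organized. The names `hvp_fn`, `precond_solve`, `damping`, and `lr` are illustrative; ASTRA's actual interface, preconditioner application, and stochastic HVP estimation may differ.

```python
# Illustrative sketch (not the authors' implementation) of a preconditioned
# Neumann/Richardson-style iteration for the damped inverse-Hessian-vector
# product x = (H + damping*I)^{-1} v, with a preconditioner solve playing
# the role that EKFAC plays in ASTRA.
import numpy as np

def preconditioned_ihvp(hvp_fn, precond_solve, v, damping=1e-3,
                        lr=0.1, num_iters=500, tol=1e-6):
    """Iterate x_{k+1} = x_k + lr * P^{-1} (v - (H + damping*I) x_k).

    hvp_fn(x)        -> H @ x, a Hessian-vector product oracle
    precond_solve(r) -> P^{-1} @ r, e.g. an (E)KFAC solve used as preconditioner
    """
    x = np.zeros_like(v)
    v_norm = np.linalg.norm(v)
    for _ in range(num_iters):
        residual = v - (hvp_fn(x) + damping * x)
        if np.linalg.norm(residual) <= tol * v_norm:
            break
        x = x + lr * precond_solve(residual)
    return x

# Toy usage with a synthetic PSD "Hessian" and a diagonal stand-in for the
# EKFAC solve; in practice hvp_fn would be a minibatch HVP through the model.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = A.T @ A / 50.0
v = rng.standard_normal(50)
diag_precond = np.diag(H) + 0.1
x_hat = preconditioned_ihvp(lambda u: H @ u, lambda r: r / diag_precond, v,
                            damping=0.1, lr=0.2)
x_exact = np.linalg.solve(H + 0.1 * np.eye(50), v)
print(np.linalg.norm(x_hat - x_exact) / np.linalg.norm(x_exact))  # small relative error
```

With `precond_solve` set to the identity, this reduces to a plain damped Neumann/LiSSA-style iteration, which is one way to read the abstract's claim that preconditioning cuts the number of iterations needed.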
Related papers
- Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [56.805574957824135]
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data. Existing algorithms for TPAUC optimization remain under-explored. We introduce two innovative double block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z) - Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z) - Inverse Evolution Data Augmentation for Neural PDE Solvers [10.111901389604423]
Training neural operators requires a large amount of training data to ensure accuracy and generalization. We propose a novel data augmentation method specifically designed for training neural operators on evolution equations.
arXiv Detail & Related papers (2025-01-24T16:20:11Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection [7.887782360541216]
We introduce Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS). We use the orientation-dependent tensor-tensor product from sparse Singular Value Decomposition to solve the problem. The proposed tensor PCA model can constrain sparsity at the specified mode and yield sparse tensor principal components.
arXiv Detail & Related papers (2024-07-24T04:04:56Z) - Training Data Attribution via Approximate Unrolled Differentiation [8.87519936904341]
Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification.
We introduce Source, an approximate unrolling-based TDA method that is computed using an influence-function-like formula.
arXiv Detail & Related papers (2024-05-20T17:17:44Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (FAFED) based on a momentum-based variance reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method [49.93717224277131]
We propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period.
We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to be polynomial.
arXiv Detail & Related papers (2021-10-13T17:40:12Z) - Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z) - Batch Sequential Adaptive Designs for Global Optimization [5.825138898746968]
Efficient global optimization (EGO) is one of the most popular sequential adaptive design (SAD) methods for expensive black-box optimization problems.
For multi-point EGO methods, heavy computation and point clustering are the main obstacles.
In this work, a novel batch SAD method, named "accelerated EGO", is proposed using a refined sampling/importance resampling (SIR) method.
The efficiency of the proposed SAD method is validated on nine classic test functions with dimensions ranging from 2 to 12.
arXiv Detail & Related papers (2020-10-21T01:11:35Z) - Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences (a minimal illustrative sketch follows this list).
Our approximation involves only function evaluations, which can be executed in parallel, and requires no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
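For the last entry above, here is a minimal generic sketch (my own illustration using standard central-difference formulas, not that paper's method or code) of approximating first- and second-order directional derivatives from function evaluations only.

```python
# Central finite differences for directional derivatives using only function
# evaluations (no gradient computations); the two evaluations at x +/- eps*d
# are independent and can be run in parallel. eps is a hand-tuned step size.
import numpy as np

def directional_derivative(f, x, d, eps=1e-4):
    """First order: approximates d . grad f(x)."""
    return (f(x + eps * d) - f(x - eps * d)) / (2.0 * eps)

def second_directional_derivative(f, x, d, eps=1e-4):
    """Second order: approximates d^T (Hessian of f at x) d."""
    return (f(x + eps * d) - 2.0 * f(x) + f(x - eps * d)) / (eps ** 2)

# Toy check against exact values for f(x) = 0.5 * ||x||^2 (grad = x, Hessian = I).
x = np.array([1.0, -2.0, 3.0])
d = np.array([0.5, 0.5, 0.0])
f = lambda z: 0.5 * float(z @ z)
print(directional_derivative(f, x, d))          # exact value: x . d = -0.5
print(second_directional_derivative(f, x, d))   # exact value: d . d = 0.5
```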
This list is automatically generated from the titles and abstracts of the papers on this site.