Related papers: HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

URL: http://arxiv.org/abs/2410.05090v1
Date: Mon, 7 Oct 2024 14:42:45 GMT
Title: HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
Authors: Xinyu Zhou, Simin Fan, Martin Jaggi,
Abstract summary: We propose HyperINF, an efficient and accurate influence function approximation method. We incorporate the generalized fisher information (GFIM) as a low-rank approximation of the Hessian matrix. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead.
Score: 37.62285675595782
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overheads. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods are well-known for their rigorous convergence guarantees on matrix inverse approximation, while the matrix multiplication operation can involve intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz's iterative algorithm. To deal with the computation-intensive matrix multiplication, we incorporate the generalized fisher information (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overheads to constant costs independent of ranks on LoRA-tuned models. We first demonstrate the superior accuracy and stability of \method compared to other baselines through a synthetic convergence simulation for matrix inversion. We further validate the efficacy of \method through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. Our codebase is available at https://github.com/Blackzxy/HyperINF.

Related papers

Preconditioned Additive Gaussian Processes with Fourier Acceleration [2.292881746604941]
We introduce a matrix-free method to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives. To address high-dimensional problems, we propose an additive kernel approach. Each sub- Kernel captures lower-order feature interactions, allowing for the efficient application of the NFFT method.
arXiv Detail & Related papers (2025-04-01T07:14:06Z)
Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data [9.913418444556486]
We show how iterative methods can be used to reduce the computational costs for calculating likelihoods, gradients, and predictive distributions with FSAs. We also present a novel, accurate, and fast way to calculate predictive variances relying on estimations and iterative methods. All methods are implemented in a free C++ software library with high-level Python and R packages.
arXiv Detail & Related papers (2024-05-23T12:25:22Z)
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS) We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models [31.65198592956842]
We propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. Our theoretical analysis shows that DataInf is particularly well-suited for parameter-efficient fine-tuning techniques such as LoRA. In applications to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5 models, DataInf effectively identifies the most influential fine-tuning examples better than other approximate influence scores.
arXiv Detail & Related papers (2023-10-02T04:59:19Z)
Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensionals of a family of first order gradient-based methods. This includes widely used algorithms such as gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem [7.132368785057316]
I show how to adapt an efficient MSC solver based on the LASSO to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition. I show how to adapt an efficient MSC solver based on the LASSO to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition in the context of hyperspectral image processing and chemometrics.
arXiv Detail & Related papers (2021-11-24T10:32:48Z)
Genetic CFL: Optimization of Hyper-Parameters in Clustered Federated Learning [4.710427287359642]
Federated learning (FL) is a distributed model for deep learning that integrates client-server architecture, edge computing, and real-time intelligence. FL has the capability of revolutionizing machine learning (ML) but lacks in the practicality of implementation due to technological limitations, communication overhead, non-IID (independent and identically distributed) data, and privacy concerns. We propose a novel hybrid algorithm, namely genetic clustered FL (Genetic CFL), that clusters edge devices based on the training hyper- parameters and genetically modifies the parameters cluster-wise.
arXiv Detail & Related papers (2021-07-15T10:16:05Z)
Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication. We prove that preconditioning has an additional benefit that has been previously unexplored. It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem. Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources. In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference. Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with trained Gaussian processes trained on few data points. The approach also leads to significantly smaller and computationally cheaper sub solver for lower bounding. In total, we reduce time convergence by orders of orders of the proposed method.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
Hypergraph Optimization for Multi-structural Geometric Model Fitting [42.217640009137554]
We propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations.
arXiv Detail & Related papers (2020-02-13T05:07:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.