Faithful and Fast Influence Function via Advanced Sampling
- URL: http://arxiv.org/abs/2510.26776v2
- Date: Fri, 31 Oct 2025 01:18:32 GMT
- Title: Faithful and Fast Influence Function via Advanced Sampling
- Authors: Jungyeon Koh, Hyeonsu Lyu, Jonggyu Jang, Hyun Jong Yang
- Abstract summary: We propose two advanced sampling techniques based on features and logits. These samplers select a small yet representative subset of the entire dataset by considering the distribution of features or logits. We validate our approach through class removal experiments, using the F1-score to measure how effectively the model forgets a class.
- Score: 10.773767694482858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can we explain the influence of training data on black-box models? Influence functions (IFs) offer a post-hoc solution by utilizing gradients and Hessians. However, computing the Hessian for an entire dataset is resource-intensive, necessitating a feasible alternative. A common approach involves randomly sampling a small subset of the training data, but this method often results in highly inconsistent IF estimates due to the high variance in sample configurations. To address this, we propose two advanced sampling techniques based on features and logits. These samplers select a small yet representative subset of the entire dataset by considering the stochastic distribution of features or logits, thereby enhancing the accuracy of IF estimations. We validate our approach through class removal experiments, a typical application of IFs, using the F1-score to measure how effectively the model forgets the removed class while maintaining inference consistency on the remaining classes. Our method reduces computation time by 30.1% and memory usage by 42.2%, or improves the F1-score by 2.5% compared to the baseline.
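The abstract's core computation can be illustrated with a toy sketch: the influence of a training point z on a test point z_test is I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), and the Hessian H is approximated on a small sampled subset rather than the full dataset. The code below uses a linear model with squared loss (closed-form gradients and Hessian) and a plain random subset; it is a minimal sketch of the general setup, not the paper's feature- or logit-based samplers, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: linear model with squared loss, so gradients and the
# Hessian have closed forms.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
theta = np.linalg.lstsq(X, y, rcond=None)[0]  # fitted parameters

def grad(x, t):
    """Gradient of 0.5 * (x @ theta - t)**2 with respect to theta."""
    return (x @ theta - t) * x

# The Hessian of the empirical risk under squared loss is X.T @ X / n.
# Instead of using all n points, estimate it on a small subset
# (uniform here; the paper's contribution is choosing this subset
# by the distribution of features or logits).
idx = rng.choice(n, size=100, replace=False)
Xs = X[idx]
H_hat = Xs.T @ Xs / len(idx) + 1e-3 * np.eye(d)  # damped for invertibility

# Influence of training point (X[1], y[1]) on the loss at a test point:
# I(z, z_test) = -grad(z_test).T @ H^{-1} @ grad(z)
g_test = grad(X[0], y[0])
influence = -g_test @ np.linalg.solve(H_hat, grad(X[1], y[1]))
print(influence)
```

The quality of `H_hat` depends entirely on how the subset is drawn, which is exactly the variance problem the paper's samplers target.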
Related papers
- Understanding Data Influence with Differential Approximation [63.817689230826595]
We introduce a new formulation to approximate a sample's influence by accumulating the differences in influence between consecutive learning steps, which we term Diff-In. By employing second-order approximations, we approximate these difference terms with high accuracy while eliminating the need for model convexity required by existing methods. Our theoretical analysis demonstrates that Diff-In achieves significantly lower approximation error compared to existing influence estimators.
arXiv Detail & Related papers (2025-08-20T11:59:32Z) - Rescaled Influence Functions: Accurate Data Attribution in High Dimension [8.392894051706055]
We present rescaled influence functions (RIF), a new tool for data attribution which can be used as a drop-in replacement for influence functions. We compare IF and RIF on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice.
arXiv Detail & Related papers (2025-06-07T04:19:21Z) - Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining [17.402771370806384]
Filter Like You Test (FLYT) is an algorithm for curating datasets that learns the usefulness of each data point as a pretraining example. FLYT trains a scoring model that learns to weigh each example's features using gradient signals from downstream tasks' training sets. We implement Mixing-FLYT (M-FLYT), which takes the per-example scores generated by different scoring methods as features, and learns to unify them into a single score.
arXiv Detail & Related papers (2025-03-11T18:34:12Z) - Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [19.62665684173391]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. We introduce a martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that leverages previously learned scores to improve accuracy at higher noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z) - Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo)
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z) - Bayes Classification using an approximation to the Joint Probability Distribution of the Attributes [1.0660480034605242]
We propose an approach that estimates conditional probabilities using information in the neighbourhood of the test sample.
We illustrate the performance of the proposed approach on a wide range of datasets taken from the University of California at Irvine (UCI) Machine Learning Repository.
arXiv Detail & Related papers (2022-05-29T22:24:02Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
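The ATC summary above is concrete enough to sketch: pick a threshold on the model's confidence so that the fraction of labeled source examples below it matches the source error rate, then predict target accuracy as the fraction of unlabeled target examples above it. The snippet below is a simplified illustration of that idea on synthetic confidences, not the paper's implementation; the beta-distributed scores and all function names are assumptions.

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Choose t so that the fraction of source examples with
    confidence < t equals the source error rate (the core ATC idea,
    simplified here)."""
    err = 1.0 - source_correct.mean()
    return np.quantile(source_conf, err)

def atc_predict_accuracy(target_conf, t):
    """Predicted target accuracy: fraction of unlabeled target
    examples whose confidence clears the learned threshold."""
    return (target_conf >= t).mean()

rng = np.random.default_rng(1)
# Synthetic confidences: correct predictions tend to be more confident.
source_correct = rng.random(5000) < 0.9
source_conf = np.where(source_correct,
                       rng.beta(8, 2, 5000),
                       rng.beta(2, 4, 5000))
t = atc_threshold(source_conf, source_correct)

target_conf = rng.beta(6, 3, 5000)  # shifted target distribution
pred_acc = atc_predict_accuracy(target_conf, t)
print(t, pred_acc)
```

The method needs no target labels at all; only the threshold is fit on labeled source data.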
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML.
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
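The ABSGD summary describes per-sample importance weights within a mini-batch. One common attentional form, sketched below, weights sample i by a softmax over scaled losses; the exact weighting, the parameter name `lam`, and the sign convention (positive to upweight hard samples under imbalance, negative to downweight them under label noise) are assumptions for illustration, not the paper's code.

```python
import numpy as np

def absgd_weights(losses, lam=1.0):
    """Self-normalized softmax weights over mini-batch losses:
    w_i = exp(loss_i / lam) / sum_j exp(loss_j / lam).
    lam > 0 emphasizes high-loss (hard) samples; lam < 0
    de-emphasizes them."""
    z = losses / lam
    z = z - z.max()  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

losses = np.array([0.1, 0.5, 2.0, 0.2])
w = absgd_weights(losses, lam=1.0)
# The update then uses the weighted sum of per-sample gradients,
# sum_i w_i * grad_i, in place of the uniform mini-batch mean.
print(w)
```

Because the weights are computed from quantities already available in the forward pass, this adds essentially no cost over plain momentum SGD, matching the summary's claim.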
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Towards Better Object Detection in Scale Variation with Adaptive Feature Selection [3.5352273012717044]
We propose a novel adaptive feature selection module (AFSM) to automatically learn the way to fuse multi-level representations in the channel dimension.
It significantly improves the performance of the detectors that have a feature pyramid structure.
A class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem.
arXiv Detail & Related papers (2020-12-06T13:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.