Estimating Training Data Influence by Tracing Gradient Descent
- URL: http://arxiv.org/abs/2002.08484v3
- Date: Sat, 14 Nov 2020 18:47:35 GMT
- Title: Estimating Training Data Influence by Tracing Gradient Descent
- Authors: Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale
- Abstract summary: TracIn computes the influence of a training example on a prediction made by the model.
TracIn is simple to implement; all it needs is the ability to work agnostic loss functions.
- Score: 21.94989239842377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method called TracIn that computes the influence of a training
example on a prediction made by the model. The idea is to trace how the loss on
the test point changes during the training process whenever the training
example of interest was utilized. We provide a scalable implementation of
TracIn via: (a) a first-order gradient approximation to the exact computation,
(b) saved checkpoints of standard training procedures, and (c) cherry-picking
layers of a deep neural network. In contrast with previously proposed methods,
TracIn is simple to implement; all it needs is the ability to work with
gradients, checkpoints, and loss functions. The method is general. It applies
to any machine learning model trained using stochastic gradient descent or a
variant of it, agnostic of architecture, domain and task. We expect the method
to be widely useful within processes that study and improve training data.
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z) - An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the catastrophic forgetting'' problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z) - A Rate-Distortion View of Uncertainty Quantification [36.85921945174863]
In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction.
We introduce Distance Aware Bottleneck (DAB), a new method for enriching deep neural networks with this property.
arXiv Detail & Related papers (2024-06-16T01:33:22Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Unsupervised Discovery of Interpretable Directions in h-space of
Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Transferring Learning Trajectories of Neural Networks [2.2299983745857896]
Training deep neural networks (DNNs) is computationally expensive.
We formulate the problem of "transferring" a given learning trajectory from one initial parameter to another one.
We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training, and can be trained significantly faster than training from scratch.
arXiv Detail & Related papers (2023-05-23T14:46:32Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Graph Learning with Loss-Guided Training [16.815638149823744]
We explore em loss-guided training in a new domain of node embedding methods pioneered by sc DeepWalk.
Our empirical evaluation on a rich collection of datasets shows significant acceleration over the baseline static methods, both in terms of total training performed and overall computation.
arXiv Detail & Related papers (2020-05-31T08:03:06Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.