Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- URL: http://arxiv.org/abs/2402.03994v1
- Date: Tue, 6 Feb 2024 13:47:12 GMT
- Title: Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- Authors: Andrea Schioppa
- Abstract summary: Sketches of gradients and Hessian vector products play an essential role in applications where one needs to store many such vectors.
Motivated by work on the intrinsic dimension of neural networks, we propose and study a design space for scalable sketching algorithms.
- Score: 1.3325600043256554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random projections or sketches of gradients and Hessian vector products play
an essential role in applications where one needs to store many such vectors
while retaining accurate information about their relative geometry. Two
important scenarios are training data attribution (tracing a model's behavior
to the training data), where one needs to store a gradient for each training
example, and the study of the spectrum of the Hessian (to analyze the training
dynamics), where one needs to store multiple Hessian vector products. While
sketches that use dense matrices are easy to implement, they are memory bound
and cannot be scaled to modern neural networks. Motivated by work on the
intrinsic dimension of neural networks, we propose and study a design space for
scalable sketching algorithms. We demonstrate the efficacy of our approach in
three applications: training data attribution, the analysis of the Hessian
spectrum and the computation of the intrinsic dimension when fine-tuning
pre-trained language models.
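To make the setup concrete, here is a minimal NumPy illustration of the kind of gradient sketching the abstract describes: per-example gradients are compressed with a random projection so that many of them can be stored while their relative geometry (inner products) is approximately preserved. The dense Gaussian projection below is the easy-to-implement but memory-bound baseline mentioned in the abstract; the chunked, seed-regenerated variant is only one possible way to avoid materializing the full matrix, and the function names, dimensions, and chunking scheme are illustrative assumptions rather than the design space studied in the paper.

```python
import numpy as np

def dense_sketch(vec, sketch_dim, seed=0):
    """Baseline: compress a (gradient) vector with a dense Gaussian projection.

    Simple, but the (sketch_dim x d) matrix must be materialized, which is
    what makes dense sketches memory bound for modern network sizes.
    """
    d = vec.shape[0]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)
    return proj @ vec

def chunked_sketch(vec, sketch_dim, seed=0, chunk=4096):
    """Illustrative memory-light variant (an assumption, not the paper's design):
    regenerate the projection from a fixed seed chunk by chunk, so only a
    (sketch_dim x chunk) slice is ever held in memory at once.
    """
    d = vec.shape[0]
    rng = np.random.default_rng(seed)
    out = np.zeros(sketch_dim)
    for start in range(0, d, chunk):
        stop = min(start + chunk, d)
        block = rng.standard_normal((sketch_dim, stop - start)) / np.sqrt(sketch_dim)
        out += block @ vec[start:stop]
    return out

if __name__ == "__main__":
    d, k = 200_000, 512                    # illustrative model and sketch dimensions
    rng = np.random.default_rng(1)
    g1 = rng.standard_normal(d)            # stands in for a per-example gradient
    g2 = 0.5 * g1 + rng.standard_normal(d) # a second, correlated gradient

    s1 = chunked_sketch(g1, k)             # same seed => same implicit projection
    s2 = chunked_sketch(g2, k)

    # Johnson-Lindenstrauss-style behavior: inner products (the "relative
    # geometry" used, e.g., to score training examples for attribution) are
    # approximately preserved by the sketch.
    print("original <g1, g2> / d:", g1 @ g2 / d)
    print("sketched <s1, s2> / d:", s1 @ s2 / d)
```

In a training data attribution workflow one would typically store one such sketch per training example and compare sketched test gradients against them, which is why approximately preserving inner products is the key requirement.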
Related papers
- Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning [35.72926400167876]
This paper proposes a novel parameter-efficient transfer learning framework for vision recognition models in the black-box setting.
In experiments, our model demonstrates superior performance in a few-shot transfer learning setting across extensive visual recognition datasets.
arXiv Detail & Related papers (2024-08-15T05:35:52Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- SCorP: Statistics-Informed Dense Correspondence Prediction Directly from Unsegmented Medical Images [5.507868474642766]
We introduce SCorP, a novel framework capable of predicting surface-based correspondences directly from unsegmented images.
The proposed model streamlines the training and inference phases by removing the supervision for the correspondence prediction task.
arXiv Detail & Related papers (2024-04-27T17:56:58Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
- Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both the gradients and activations of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)
- Gradient-Based Training and Pruning of Radial Basis Function Networks with an Application in Materials Physics [0.24792948967354234]
We propose a gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation.
We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data.
arXiv Detail & Related papers (2020-04-06T11:32:37Z)