Efficient Sketches for Training Data Attribution and Studying the Loss Landscape
- URL: http://arxiv.org/abs/2402.03994v2
- Date: Wed, 23 Oct 2024 13:44:08 GMT
- Title: Efficient Sketches for Training Data Attribution and Studying the Loss Landscape
- Authors: Andrea Schioppa,
- Abstract summary: We present a novel framework for scalable gradient and HVP sketching, tailored for modern hardware.
Our work sheds new light on the behavior of pre-trained language models, challenging assumptions about their intrinsic dimensionality and Hessian properties.
- Score: 1.3325600043256554
- License:
- Abstract: The study of modern machine learning models often necessitates storing vast quantities of gradients or Hessian vector products (HVPs). Traditional sketching methods struggle to scale under these memory constraints. We present a novel framework for scalable gradient and HVP sketching, tailored for modern hardware. We provide theoretical guarantees and demonstrate the power of our methods in applications like training data attribution, Hessian spectrum analysis, and intrinsic dimension computation for pre-trained language models. Our work sheds new light on the behavior of pre-trained language models, challenging assumptions about their intrinsic dimensionality and Hessian properties.
Related papers
- Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning [35.72926400167876]
This paper proposes a novel parameter-efficient transfer learning framework for vision recognition models in the black-box setting.
In experiments, our model demonstrates superior performance in a few-shot transfer learning setting across extensive visual recognition datasets.
arXiv Detail & Related papers (2024-08-15T05:35:52Z) - Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - SCorP: Statistics-Informed Dense Correspondence Prediction Directly from Unsegmented Medical Images [5.507868474642766]
We introduce SCorP, a novel framework capable of predicting surface-based correspondences directly from unsegmented images.
The proposed model streamlines the training and inference phases by removing the supervision for the correspondence prediction task.
arXiv Detail & Related papers (2024-04-27T17:56:58Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Expedited Training of Visual Conditioned Language Generation via
Redundancy Reduction [61.16125290912494]
$textEVL_textGen$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with Randomly-trained After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differenti models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z) - Gradient-Based Training and Pruning of Radial Basis Function Networks
with an Application in Materials Physics [0.24792948967354234]
We propose a gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation.
We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data.
arXiv Detail & Related papers (2020-04-06T11:32:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.