Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- URL: http://arxiv.org/abs/2402.03994v1
- Date: Tue, 6 Feb 2024 13:47:12 GMT
- Title: Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- Authors: Andrea Schioppa
- Abstract summary: Sketches of gradients and Hessian vector products play an essential role in applications where one needs to store many such vectors.
Motivated by work on the intrinsic dimension of neural networks, we propose and study a design space for scalable sketching algorithms.
- Score: 1.3325600043256554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random projections or sketches of gradients and Hessian vector products play
an essential role in applications where one needs to store many such vectors
while retaining accurate information about their relative geometry. Two
important scenarios are training data attribution (tracing a model's behavior
to the training data), where one needs to store a gradient for each training
example, and the study of the spectrum of the Hessian (to analyze the training
dynamics), where one needs to store multiple Hessian vector products. While
sketches that use dense matrices are easy to implement, they are memory bound
and cannot be scaled to modern neural networks. Motivated by work on the
intrinsic dimension of neural networks, we propose and study a design space for
scalable sketching algorithms. We demonstrate the efficacy of our approach in
three applications: training data attribution, the analysis of the Hessian
spectrum and the computation of the intrinsic dimension when fine-tuning
pre-trained language models.
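The abstract's core idea can be illustrated with a minimal numerical sketch: a dense Gaussian random projection (the memory-bound baseline the paper contrasts its scalable designs against) approximately preserves inner products between gradient vectors, which is what attribution and Hessian-spectrum applications need. All dimensions and vectors below are made up for illustration; this is not the paper's algorithm.

```python
import numpy as np

# Illustrative only: a dense Gaussian sketch, the memory-bound baseline.
# Dimensions are hypothetical; real models have far larger d, which is
# exactly why the paper studies scalable alternatives to a dense matrix.
rng = np.random.default_rng(0)
d, k = 20_000, 256           # "model" dimension, sketch dimension

# A Johnson-Lindenstrauss projection with entries ~ N(0, 1/k).
S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))

g1 = rng.normal(size=d)      # stand-ins for two per-example gradients
g2 = rng.normal(size=d)

# Sketches are cheap to store (k floats each instead of d) ...
s1, s2 = S @ g1, S @ g2

# ... yet approximately preserve relative geometry (inner products).
exact = g1 @ g2
approx = s1 @ s2
rel_err = abs(exact - approx) / (np.linalg.norm(g1) * np.linalg.norm(g2))
print(rel_err)
```

Note that storing the dense `S` itself costs `k * d` floats, which is the memory bottleneck the paper's design space is meant to avoid.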
Related papers
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
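The projected-gradient idea behind methods like PGU can be sketched numerically: take an update step in the orthogonal complement of the subspace spanned by retained-data gradients, so the step disturbs the remaining data as little as possible. This toy example uses synthetic vectors and is not the paper's actual procedure.

```python
import numpy as np

# Hypothetical illustration of projected-gradient unlearning: project the
# forget-data gradient away from the subspace of retained-data gradients.
rng = np.random.default_rng(1)
d = 50

# Columns of R: gradients on the retained data (synthetic stand-ins).
R = rng.normal(size=(d, 5))
Q, _ = np.linalg.qr(R)                    # orthonormal basis of that subspace

g_forget = rng.normal(size=d)             # gradient on the data to forget
g_proj = g_forget - Q @ (Q.T @ g_forget)  # drop components along retained subspace

# The projected step is (numerically) orthogonal to every retained gradient.
print(np.max(np.abs(R.T @ g_proj)))
```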
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
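The gradient-kernel flavor of attribution methods like TRAK can be sketched as scoring training examples by inner products of randomly projected per-example gradients. Everything below is synthetic and simplified (real TRAK involves additional ingredients); it only shows why sketched gradients suffice for ranking influence.

```python
import numpy as np

# Hypothetical attribution sketch: rank training examples by sketched
# gradient inner products with a test example's gradient.
rng = np.random.default_rng(2)
d, k, n = 10_000, 128, 20

P = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
train_grads = rng.normal(size=(n, d))        # stand-in per-example gradients
# A test gradient strongly aligned with training example 7 (by construction).
test_grad = train_grads[7] + 0.1 * rng.normal(size=d)

scores = (train_grads @ P.T) @ (P @ test_grad)  # sketched inner products
print(int(np.argmax(scores)))                   # example 7 should rank highest
```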
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - Semi-Supervised Adversarial Recognition of Refined Window Structures for
Inverse Procedural Façade Modeling [17.62526990262815]
This paper proposes a semi-supervised adversarial recognition strategy embedded in inverse procedural modeling.
A simple procedural engine is built inside an existing 3D modeling software, producing fine-grained window geometries.
Experiments using publicly available façade image datasets reveal that the proposed training strategy achieves about a 10% improvement in classification accuracy.
arXiv Detail & Related papers (2022-01-22T06:34:48Z) - Leveraging Unsupervised Image Registration for Discovery of Landmark
Shape Descriptor [5.40076482533193]
This paper proposes a self-supervised deep learning approach for discovering landmarks from images that can directly be used as a shape descriptor for subsequent analysis.
We use landmark-driven image registration as the primary task to force the neural network to discover landmarks that register the images well.
The proposed method circumvents segmentation and preprocessing and directly produces a usable shape descriptor using just 2D or 3D images.
arXiv Detail & Related papers (2021-11-13T01:02:10Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - DeepSatData: Building large scale datasets of satellite images for
training machine learning models [77.17638664503215]
This report presents design considerations for automatically generating satellite imagery datasets for training machine learning models.
We discuss issues faced from the point of view of deep neural network training and evaluation.
arXiv Detail & Related papers (2021-04-28T15:13:12Z) - TSGCNet: Discriminative Geometric Feature Learning with Two-Stream
GraphConvolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z) - Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We apply a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights into our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z) - A Visual Analytics Framework for Explaining and Diagnosing Transfer
Learning Processes [42.57604833160855]
We present a visual analytics framework for the multi-level exploration of the transfer learning processes when training deep neural networks.
Our framework uses a multi-aspect design to explain how the knowledge learned by the existing model is transferred into the new learning task.
arXiv Detail & Related papers (2020-09-15T05:59:00Z) - A Short Review on Data Modelling for Vector Fields [5.51641435875237]
Machine learning methods have proven highly successful in dealing with a wide variety of data analysis and analytics tasks.
The recent success of end-to-end modelling schemes using deep neural networks allows extension to more sophisticated and structured practical data.
This review article is dedicated to recent computational tools of vector fields, including vector data representations, predictive model of spatial data, as well as applications in computer vision, signal processing, and empirical sciences.
arXiv Detail & Related papers (2020-09-01T17:07:29Z) - Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
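The "gradients as features" idea can be made concrete with a toy example: augment a frozen network's activations with the gradient of its output with respect to the final-layer weights, and feed the combined vector to a linear model. The tiny network, sizes, and data below are all hypothetical; the gradient here is worked out by hand via the chain rule rather than by the paper's efficient algorithm.

```python
import numpy as np

# Toy "gradients as features": combine activations with output gradients
# from a frozen network to form a feature vector for a linear model.
rng = np.random.default_rng(3)

W = rng.normal(size=(4, 8))            # frozen "pre-trained" layer
x = rng.normal(size=8)                 # one input

a = np.tanh(W @ x)                     # activation of the frozen network
v = rng.normal(size=4)                 # frozen linear head
out = v @ a                            # scalar network output

# For out = v @ tanh(W x), the chain rule gives
# d(out)/dW[i, j] = v[i] * (1 - a[i]**2) * x[j].
grad_W = np.outer(v * (1 - a**2), x)   # gradient of out w.r.t. W

features = np.concatenate([a, grad_W.ravel()])  # gradient + activation features
print(features.shape)                           # (4 + 4*8,) = (36,)
```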
arXiv Detail & Related papers (2020-04-12T02:57:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.