Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- URL: http://arxiv.org/abs/2402.03994v1
- Date: Tue, 6 Feb 2024 13:47:12 GMT
- Title: Gradient Sketches for Training Data Attribution and Studying the Loss
Landscape
- Authors: Andrea Schioppa
- Abstract summary: Sketches of gradients and Hessian vector products play an essential role in applications where one needs to store many such vectors.
Motivated by work on the intrinsic dimension of neural networks, we propose and study a design space for scalable sketching algorithms.
- Score: 1.3325600043256554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random projections or sketches of gradients and Hessian vector products play
an essential role in applications where one needs to store many such vectors
while retaining accurate information about their relative geometry. Two
important scenarios are training data attribution (tracing a model's behavior
to the training data), where one needs to store a gradient for each training
example, and the study of the spectrum of the Hessian (to analyze the training
dynamics), where one needs to store multiple Hessian vector products. While
sketches that use dense matrices are easy to implement, they are memory bound
and cannot be scaled to modern neural networks. Motivated by work on the
intrinsic dimension of neural networks, we propose and study a design space for
scalable sketching algorithms. We demonstrate the efficacy of our approach in
three applications: training data attribution, the analysis of the Hessian
spectrum and the computation of the intrinsic dimension when fine-tuning
pre-trained language models.
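To make the setup concrete, here is a minimal NumPy illustration of the kind of gradient sketching the abstract describes: per-example gradients are compressed with a random projection so that many of them can be stored while their relative geometry (inner products) is approximately preserved. The dense Gaussian projection below is the easy-to-implement but memory-bound baseline mentioned in the abstract; the chunked, seed-regenerated variant is only one possible way to avoid materializing the full matrix, and the function names, dimensions, and chunking scheme are illustrative assumptions rather than the design space studied in the paper.

```python
import numpy as np

def dense_sketch(vec, sketch_dim, seed=0):
    """Baseline: compress a (gradient) vector with a dense Gaussian projection.

    Simple, but the (sketch_dim x d) matrix must be materialized, which is
    what makes dense sketches memory bound for modern network sizes.
    """
    d = vec.shape[0]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)
    return proj @ vec

def chunked_sketch(vec, sketch_dim, seed=0, chunk=4096):
    """Illustrative memory-light variant (an assumption, not the paper's design):
    regenerate the projection from a fixed seed chunk by chunk, so only a
    (sketch_dim x chunk) slice is ever held in memory at once.
    """
    d = vec.shape[0]
    rng = np.random.default_rng(seed)
    out = np.zeros(sketch_dim)
    for start in range(0, d, chunk):
        stop = min(start + chunk, d)
        block = rng.standard_normal((sketch_dim, stop - start)) / np.sqrt(sketch_dim)
        out += block @ vec[start:stop]
    return out

if __name__ == "__main__":
    d, k = 200_000, 512                    # illustrative model and sketch dimensions
    rng = np.random.default_rng(1)
    g1 = rng.standard_normal(d)            # stands in for a per-example gradient
    g2 = 0.5 * g1 + rng.standard_normal(d) # a second, correlated gradient

    s1 = chunked_sketch(g1, k)             # same seed => same implicit projection
    s2 = chunked_sketch(g2, k)

    # Johnson-Lindenstrauss-style behavior: inner products (the "relative
    # geometry" used, e.g., to score training examples for attribution) are
    # approximately preserved by the sketch.
    print("original <g1, g2> / d:", g1 @ g2 / d)
    print("sketched <s1, s2> / d:", s1 @ s2 / d)
```

In a training data attribution workflow one would typically store one such sketch per training example and compare sketched test gradients against them, which is why approximately preserving inner products is the key requirement.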
Related papers
- Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning [35.72926400167876]
This paper proposes a novel parameter-efficient transfer learning framework for vision recognition models in the black-box setting.
In experiments, our model demonstrates superior performance in a few-shot transfer learning setting across extensive visual recognition datasets.
arXiv Detail & Related papers (2024-08-15T05:35:52Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- SCorP: Statistics-Informed Dense Correspondence Prediction Directly from Unsegmented Medical Images [5.507868474642766]
We introduce SCorP, a novel framework capable of predicting surface-based correspondences directly from unsegmented images.
The proposed model streamlines the training and inference phases by removing the supervision for the correspondence prediction task.
arXiv Detail & Related papers (2024-04-27T17:56:58Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
- Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both the gradients and activations of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)
- Gradient-Based Training and Pruning of Radial Basis Function Networks with an Application in Materials Physics [0.24792948967354234]
We propose a gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation.
We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data.
arXiv Detail & Related papers (2020-04-06T11:32:37Z)