Debugging using Orthogonal Gradient Descent
- URL: http://arxiv.org/abs/2206.08489v1
- Date: Fri, 17 Jun 2022 00:03:54 GMT
- Title: Debugging using Orthogonal Gradient Descent
- Authors: Narsimha Chilkuri, Chris Eliasmith
- Abstract summary: Given a trained model that is partially faulty, can we correct its behaviour without having to train the model from scratch?
In other words, can we "debug" neural networks similar to how we address bugs in our mathematical models and standard computer code?
- Score: 7.766921168069532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report we consider the following problem: Given a trained model that
is partially faulty, can we correct its behaviour without having to train the
model from scratch? In other words, can we "debug" neural networks similar to
how we address bugs in our mathematical models and standard computer code? We
base our approach on the hypothesis that debugging can be treated as a two-task
continual learning problem. In particular, we employ a modified version of a
continual learning algorithm called Orthogonal Gradient Descent (OGD) to
demonstrate, via two simple experiments on the MNIST dataset, that we can
in fact unlearn the undesirable behaviour while retaining the general
performance of the model, and we can additionally relearn the
appropriate behaviour, both without having to train the model from scratch.
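The abstract builds on Orthogonal Gradient Descent (OGD). As a rough illustration of the projection step in standard OGD (not the modified version used in the report, whose details are not given on this page), the sketch below stores gradient directions associated with behaviour to be preserved, orthonormalises them, and removes those components from a new gradient before it is applied. The function names `orthonormal_basis` and `project_orthogonal` and the toy vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: return orthonormal rows spanning the given vectors."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for b in basis:
            w -= np.dot(w, b) * b          # strip the part already covered
        norm = np.linalg.norm(w)
        if norm > tol:                     # skip linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

def project_orthogonal(grad, basis):
    """Remove from grad its components along each orthonormal basis row."""
    g = np.asarray(grad, dtype=float).copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g

# Toy example (assumed values): two stored directions to protect,
# and one new gradient from the second task.
stored = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
basis = orthonormal_basis(stored)
new_grad = np.array([0.5, -2.0, 3.0])
update = project_orthogonal(new_grad, basis)
print(update)  # [0. 0. 3.]
```

In this toy case the projected update has no component along either stored direction, so a small step along it leaves the protected directions unchanged to first order.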
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Challenges and Pitfalls of Bayesian Unlearning [6.931200003384123]
Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model.
Approximate unlearning methods are one class of methods for this task that avoid the need to retrain the model from scratch on the retained data.
Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data.
arXiv Detail & Related papers (2022-07-07T11:24:50Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model [24.679669970832396]
Teacher-student models provide a powerful framework in which the typical case performance of high-dimensional supervised learning tasks can be studied in closed form.
In this setting, labels are assigned to data - often taken to be Gaussian i.i.d. - by a teacher model, and the goal is to characterise the typical performance of the student model in recovering the parameters that generated the labels.
arXiv Detail & Related papers (2021-02-16T12:49:15Z)
- Understanding the Failure Modes of Out-of-Distribution Generalization [35.00563456450452]
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time.
In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks.
arXiv Detail & Related papers (2020-10-29T17:19:03Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables [11.985433487639403]
Probabilistic graphical models (PGMs) provide a compact representation of knowledge that can be queried in a flexible way.
We introduce query training (QT), a mechanism to learn a PGM that is optimized for the approximate inference algorithm that will be paired with it.
We demonstrate experimentally that QT can be used to learn a challenging 8-connected grid Markov random field with hidden variables.
arXiv Detail & Related papers (2020-06-11T20:34:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.