Training Efficiency and Robustness in Deep Learning
- URL: http://arxiv.org/abs/2112.01423v1
- Date: Thu, 2 Dec 2021 17:11:33 GMT
- Title: Training Efficiency and Robustness in Deep Learning
- Authors: Fartash Faghri
- Abstract summary: We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves the training speed and develops an efficient method for detecting the diversity of training signal.
- Score: 2.6451769337566406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning has revolutionized machine learning and artificial
intelligence, achieving superhuman performance in several standard benchmarks.
It is well-known that deep learning models are inefficient to train; they learn
by processing millions of training data multiple times and require powerful
computational resources to process large batches of data in parallel at the
same time rather than sequentially. Deep learning models also have unexpected
failure modes; they can be fooled into misbehaviour, producing unexpectedly
incorrect predictions.
In this thesis, we study approaches to improve the training efficiency and
robustness of deep learning models. In the context of learning visual-semantic
embeddings, we find that prioritizing learning on more informative training
data increases convergence speed and improves generalization performance on
test data. We formalize a simple trick called hard negative mining as a
modification to the learning objective function with no computational overhead.
Next, we seek improvements to optimization speed in general-purpose
optimization methods in deep learning. We show that a redundancy-aware
modification to the sampling of training data improves the training speed and
develops an efficient method for detecting the diversity of training signal,
namely, gradient clustering. Finally, we study adversarial robustness in deep
learning and approaches to achieve maximal adversarial robustness without
training with additional data. For linear models, we prove guaranteed maximal
robustness achieved only by appropriate choice of the optimizer,
regularization, or architecture.
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Accelerating Deep Learning with Fixed Time Budget [2.190627491782159]
This paper proposes an effective technique for training arbitrary deep learning models within fixed time constraints.
The proposed method is extensively evaluated in both classification and regression tasks in computer vision.
arXiv Detail & Related papers (2024-10-03T21:18:04Z) - Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI [17.242331892899543]
Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning.
Learning performance data tend to be highly sparse (80%(sim)90% missing observations) in most real-world applications due to adaptive item selection.
This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data.
arXiv Detail & Related papers (2024-09-24T00:25:07Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - On Efficient Training of Large-Scale Deep Learning Models: A Literature
Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of some established sequential supervised models, such as RNNs, LSTMs, or Transformers, with relatively less known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z) - Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z) - Friendly Training: Neural Networks Can Adapt Data To Make Learning
Easier [23.886422706697882]
We propose a novel training procedure named Friendly Training.
We show that Friendly Training yields improvements with respect to informed data sub-selection and random selection.
Results suggest that adapting the input data is a feasible way to stabilize learning and improve the skills generalization of the network.
arXiv Detail & Related papers (2021-06-21T10:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.