Large-Scale Deep Learning Optimizations: A Comprehensive Survey
- URL: http://arxiv.org/abs/2111.00856v2
- Date: Tue, 2 Nov 2021 03:02:33 GMT
- Title: Large-Scale Deep Learning Optimizations: A Comprehensive Survey
- Authors: Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
- Abstract summary: We aim to provide a sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency.
We investigate the algorithms most commonly used for optimization, elaborate on the debated topic of the generalization gap that arises in large-batch training, and review state-of-the-art strategies for addressing communication overhead and reducing memory footprint.
- Score: 7.901786481399378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has achieved promising results on a wide spectrum of AI
applications. Larger datasets and models consistently yield better performance,
but they also require longer training time for the additional computation and
communication. In this survey, we aim to provide a clear sketch of the
optimizations for large-scale deep learning with regard to model accuracy and
model efficiency. We investigate the algorithms most commonly used for
optimization, elaborate on the debated topic of the generalization gap that
arises in large-batch training, and review state-of-the-art strategies for
addressing communication overhead and reducing memory footprint.
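Among the strategies the survey reviews for reducing communication overhead is gradient compression. The snippet below is a minimal NumPy sketch of top-k gradient sparsification with error feedback, offered only as an illustration of the idea; the function name, the 1% keep ratio, and the toy usage are assumptions, not details taken from the paper.

```python
import numpy as np

def topk_sparsify(grad, residual, k_ratio=0.01):
    """Keep only the largest-magnitude k% of gradient entries.

    Dropped entries are folded into a local residual (error feedback),
    so their information is delayed rather than discarded. Returns the
    indices and values to communicate plus the updated residual.
    """
    accumulated = grad + residual                 # fold in the past residual
    k = max(1, int(k_ratio * accumulated.size))
    magnitudes = np.abs(accumulated).ravel()
    idx = np.argpartition(magnitudes, -k)[-k:]    # indices of the k largest entries
    values = accumulated.ravel()[idx]

    new_residual = accumulated.copy()
    new_residual.ravel()[idx] = 0.0               # transmitted entries leave the residual
    return idx, values, new_residual

# Toy usage: a single worker compresses its gradient before communication.
rng = np.random.default_rng(0)
grad = rng.normal(size=(1024,))
residual = np.zeros_like(grad)
idx, values, residual = topk_sparsify(grad, residual, k_ratio=0.01)
print(f"sent {idx.size} of {grad.size} entries")  # roughly 1% of the gradient
```

Error feedback of this kind is commonly paired with aggressive sparsification because every entry is eventually transmitted once its accumulated residual grows large enough, which helps keep convergence close to uncompressed training.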
Related papers
- Narrowing the Focus: Learned Optimizers for Pretrained Models [24.685918556547055]
We propose a novel technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers.
When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general learned optimizers.
arXiv Detail & Related papers (2024-08-17T23:55:19Z)
- Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control [1.1404490220482764]
BRO achieves state-of-the-art results, significantly outperforming the leading model-based and model-free algorithms.
BRO is the first model-free algorithm to achieve near-optimal policies in the notoriously challenging Dog and Humanoid tasks.
arXiv Detail & Related papers (2024-05-25T09:53:25Z)
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
Given the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
- Learning to Generalize Provably in Learning to Optimize [185.71326306329678]
Learning to optimize (L2O), which automates the design of optimizers via data-driven approaches, has gained increasing popularity.
Current L2O methods often suffer from poor generalization performance in at least two respects.
We propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework.
arXiv Detail & Related papers (2023-02-22T01:17:31Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Deep Learning Training Procedure Augmentations [0.0]
Recent advances in deep learning have greatly improved performance on various tasks such as object detection, image segmentation, and sentiment analysis.
While this has led to great results, many with real-world applications, other relevant aspects of deep learning have remained neglected and unknown.
We will present several novel deep learning training techniques which, while capable of offering significant performance gains, also reveal several interesting analysis results regarding convergence speed, optimization landscape, and adversarial robustness.
arXiv Detail & Related papers (2022-11-25T22:31:11Z)
- Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale machine learning aims to efficiently learn patterns from big data with comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
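Layer-wise adaptive rate scaling is the common ingredient behind several of the large-batch training techniques discussed above, including CLARS. As a point of reference, here is a minimal NumPy sketch of the classic LARS-style trust ratio; it is not the CLARS algorithm itself, and the function name, coefficient values, and toy usage are illustrative assumptions.

```python
import numpy as np

def lars_layer_update(weights, grad, base_lr=1.0, trust_coef=0.001,
                      weight_decay=1e-4, eps=1e-9):
    """One LARS-style update for a single layer.

    The effective learning rate is scaled per layer by the ratio of the
    weight norm to the update norm, keeping the update magnitude
    proportional to the parameter magnitude in large-batch training.
    """
    update = grad + weight_decay * weights
    w_norm = np.linalg.norm(weights)
    local_lr = trust_coef * w_norm / (np.linalg.norm(update) + eps)  # trust ratio
    return weights - base_lr * local_lr * update

# Toy usage: two layers with very different scales each get their own
# effective step size instead of sharing one global learning rate.
rng = np.random.default_rng(0)
for name, scale in [("layer1", 0.1), ("layer2", 10.0)]:
    w = rng.normal(scale=scale, size=(64,))
    g = rng.normal(size=(64,))
    print(name, "step norm:", np.linalg.norm(lars_layer_update(w, g) - w))
```

CLARS, as summarized above, argues that with layer-wise scaling of this kind the usual warmup phase of large-batch training can be removed; the sketch only shows the shared scaling idea, not that analysis.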