On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
- URL: http://arxiv.org/abs/2304.03589v1
- Date: Fri, 7 Apr 2023 11:13:23 GMT
- Title: On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
- Authors: Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
- Abstract summary: The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much needed.
- Score: 90.87691246153612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of deep learning has witnessed significant progress, particularly
in computer vision (CV), natural language processing (NLP), and speech. The use
of large-scale models trained on vast amounts of data holds immense promise for
practical applications, enhancing industrial productivity and facilitating
social development. Although numerous studies have explored efficient training
under the increasing demands on computational capacity, a comprehensive summary
of acceleration techniques for training deep learning models is still much needed.
In this survey, we present a detailed review of training acceleration. We consider
the fundamental update formulation and split its basic components into five main
perspectives: (1) data-centric, including dataset regularization, data sampling,
and data-centric curriculum learning techniques, which can significantly reduce
the computational cost incurred by the data samples; (2) model-centric, including
acceleration of basic modules, compression training, model initialization, and
model-centric curriculum learning techniques, which focus on accelerating training
by reducing the computation over parameters; (3) optimization-centric, including
the selection of the learning rate, the use of large batch sizes, the design of
efficient objectives, and model averaging techniques, which focus on the training
policy and on improving the generalization of large-scale models; (4) budgeted
training, including distinctive acceleration methods for resource-constrained
situations; (5) system-centric, including efficient open-source distributed
libraries/systems that provide adequate hardware support for implementing
acceleration algorithms. Through this comprehensive taxonomy, the survey reviews
the general mechanisms within each component and their joint interaction.
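To make the taxonomy concrete, below is a minimal sketch, assuming a toy linear model trained with plain SGD, of where each of the five perspectives acts on the basic update rule; every name and number in it is illustrative rather than taken from the survey.

```python
# Minimal sketch (assumptions: toy linear model, synthetic data, plain SGD).
# It only marks where the survey's five perspectives act on the basic update
# theta <- theta - lr * grad; it is not code from the paper.
import time
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(10_000, 64)), rng.normal(size=10_000)
theta = np.zeros(64)

budget_seconds = 5.0                              # (4) budgeted training: fixed wall-clock budget
start, step, batch_size = time.time(), 0, 1024    # (3) optimization-centric: large batch size

while time.time() - start < budget_seconds:
    # (1) data-centric: sample (or curriculum-order / prune) the examples used this step
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    # (2) model-centric: the forward/backward pass is where module acceleration,
    #     compression, and better initialization reduce per-parameter computation
    grad = 2.0 * xb.T @ (xb @ theta - yb) / batch_size
    # (3) optimization-centric: learning-rate schedule (here a simple decay)
    lr = 0.1 / (1.0 + 0.01 * step)
    theta -= lr * grad
    step += 1
# (5) system-centric: in practice this loop is wrapped by a distributed
#     training system (e.g. data-parallel workers) rather than run serially.
```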
Related papers
- Accelerating Deep Learning with Fixed Time Budget [2.190627491782159]
This paper proposes an effective technique for training arbitrary deep learning models within fixed time constraints.
The proposed method is extensively evaluated in both classification and regression tasks in computer vision.
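The fixed-time-budget setting can be sketched as a schedule that is planned over the wall-clock budget rather than over a fixed number of epochs; the linear annealing and the function names below are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: budget-aware training in which the learning-rate schedule is
# laid out over a fixed wall-clock budget instead of a fixed epoch count.
import time

def train_with_time_budget(step_fn, budget_seconds, base_lr=0.1):
    """step_fn(lr) performs one optimization step; assumed to be supplied by the caller."""
    start = time.time()
    while True:
        elapsed = time.time() - start
        if elapsed >= budget_seconds:
            break
        progress = elapsed / budget_seconds   # fraction of the budget consumed
        lr = base_lr * (1.0 - progress)       # linearly anneal to zero at the deadline
        step_fn(lr)
```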
arXiv Detail & Related papers (2024-10-03T21:18:04Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning objective.
An Iterative Unlearning Refinement module dynamically assesses the extent of unlearning on specific data pieces and makes iterative updates.
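A minimal sketch of how these three components could fit together, written with a classifier-style loss for readability (the paper targets generative language models); the loss weighting, stopping threshold, and all names here are illustrative assumptions, not the paper's exact formulation.

```python
# ICU-style sketch (assumption: classifier-style losses stand in for the
# generative-LM objectives used in the actual paper).
import torch
import torch.nn.functional as F

def icu_step(model, optimizer, forget_batch, retain_batch, lam=0.5):
    x_f, y_f = forget_batch   # knowledge to remove
    x_r, y_r = retain_batch   # behaviour to preserve
    # (1) Knowledge Unlearning Induction: ascend the loss on the forget data.
    loss_unlearn = -F.cross_entropy(model(x_f), y_f)
    # (2) Contrastive Learning Enhancement: keep performance on retained data
    #     so expressive capability is not destroyed by pure unlearning.
    loss_preserve = F.cross_entropy(model(x_r), y_r)
    loss = loss_unlearn + lam * loss_preserve
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def icu_unlearn(model, optimizer, forget_loader, retain_loader,
                max_rounds=10, forget_loss_target=2.0):
    # (3) Iterative Unlearning Refinement: repeat and stop once the measured
    #     loss on the forget data suggests enough knowledge has been removed.
    for _ in range(max_rounds):
        for fb, rb in zip(forget_loader, retain_loader):
            icu_step(model, optimizer, fb, rb)
        with torch.no_grad():
            x_f, y_f = next(iter(forget_loader))
            if F.cross_entropy(model(x_f), y_f).item() >= forget_loss_target:
                break
```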
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
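As a rough illustration of that idea, the following sketch scores a candidate pool with the current model and keeps only the highest-loss (most informative) examples; the per-example-loss criterion and the keep fraction are assumptions, not the paper's specific scoring rule.

```python
# Illustrative active-data-selection step: keep the examples the current model
# finds hardest (highest loss) and train only on that subset.
import numpy as np

def select_informative(per_example_loss, keep_fraction=0.25):
    """Return indices of the highest-loss examples in a scored candidate pool."""
    k = max(1, int(len(per_example_loss) * keep_fraction))
    return np.argsort(per_example_loss)[-k:]

# Usage sketch: losses come from a cheap forward pass over a candidate batch.
losses = np.array([0.1, 2.3, 0.05, 1.7, 0.9, 0.2, 3.1, 0.4])
print(select_informative(losses))   # -> indices of the two hardest examples
```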
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
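One of the two ingredients mentioned above, the inverse dynamics prediction objective, can be sketched as predicting the taken action from embeddings of consecutive observations; the encoder, head sizes, and loss below are illustrative assumptions rather than ALP's exact architecture.

```python
# Hedged sketch of an inverse-dynamics objective: predict the action from
# embeddings of consecutive observations so the representation becomes
# action-aware. Encoder, hidden size, and loss choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseDynamicsHead(nn.Module):
    def __init__(self, embed_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, z_t, z_next):
        return self.net(torch.cat([z_t, z_next], dim=-1))

def inverse_dynamics_loss(encoder, head, obs_t, obs_next, actions):
    # Gradients flow into the encoder, shaping the learned representation.
    z_t, z_next = encoder(obs_t), encoder(obs_next)
    logits = head(z_t, z_next)
    return F.cross_entropy(logits, actions)
```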
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
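A hedged sketch of what such a redundancy-aware sampling rule could look like: greedily skip candidates whose embedding is nearly identical to one already kept; the cosine-similarity criterion and threshold are assumptions, not the thesis's actual method.

```python
# Illustrative redundancy-aware subset selection: drop near-duplicate examples
# based on cosine similarity between their embeddings.
import numpy as np

def redundancy_aware_subset(embeddings, threshold=0.95):
    """Return indices of a subset in which no two kept examples are near-duplicates."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, e in enumerate(normed):
        if all(e @ normed[j] < threshold for j in kept):
            kept.append(i)
    return kept
```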
arXiv Detail & Related papers (2021-12-02T17:11:33Z)
- Privacy-Preserving Serverless Edge Learning with Decentralized Small Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
- Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better [0.0]
As deep learning models have progressively improved, their number of parameters, latency, and the resources required to train them have increased significantly.
We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency.
We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support.
arXiv Detail & Related papers (2021-06-16T17:31:38Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)