Systems for Parallel and Distributed Large-Model Deep Learning Training
- URL: http://arxiv.org/abs/2301.02691v1
- Date: Fri, 6 Jan 2023 19:17:29 GMT
- Title: Systems for Parallel and Distributed Large-Model Deep Learning Training
- Authors: Kabir Nagrecha
- Abstract summary: Some recent Transformer models span hundreds of billions of learnable parameters.
These designs have introduced new scale-driven systems challenges for the DL space.
This survey will explore the large-model training systems landscape, highlighting key challenges and the various techniques that have been used to address them.
- Score: 7.106986689736828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) has transformed applications in a variety of domains,
including computer vision, natural language processing, and tabular data
analysis. The search for improved DL model accuracy has led practitioners to
explore increasingly large neural architectures, with some recent Transformer
models spanning hundreds of billions of learnable parameters. These designs
have introduced new scale-driven systems challenges for the DL space, such as
memory bottlenecks, poor runtime efficiency, and high costs of model
development. Efforts to address these issues have explored techniques such as
parallelization of neural architectures, spilling data across the memory
hierarchy, and memory-efficient data representations. This survey will explore
the large-model training systems landscape, highlighting key challenges and the
various techniques that have been used to address them.
Related papers
- A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings [1.0589208420411014]
This survey explores the landscape of distributed learning, encompassing cloud and edge settings.
We delve into the core concepts of data and model parallelism, examining how models are partitioned across different dimensions and layers to optimize resource utilization and performance.
We analyze various partitioning schemes for different layer types, including fully connected, convolutional, and recurrent layers, highlighting the trade-offs between computational efficiency, communication overhead, and memory constraints.
arXiv Detail & Related papers (2024-05-23T22:00:38Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training.
It proposes an open source online training framework for deep surrogate models.
arXiv Detail & Related papers (2023-06-28T12:02:27Z) - Deep Transfer Learning for Automatic Speech Recognition: Towards Better
Generalization [3.6393183544320236]
Speech recognition has become an important challenge when using deep learning (DL)
It requires large-scale training datasets and high computational and storage resources.
Deep transfer learning (DTL) has been introduced to overcome these issues.
arXiv Detail & Related papers (2023-04-27T21:08:05Z) - Neural Architecture Search for Dense Prediction Tasks in Computer Vision [74.9839082859151]
Deep learning has led to a rising demand for neural network architecture engineering.
neural architecture search (NAS) aims at automatically designing neural network architectures in a data-driven manner rather than manually.
NAS has become applicable to a much wider range of problems in computer vision.
arXiv Detail & Related papers (2022-02-15T08:06:50Z) - A Survey of Large-Scale Deep Learning Serving System Optimization:
Challenges and Opportunities [24.38071862662089]
Survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
Deep Learning (DL) models have achieved superior performance in many application domains, including vision, language, medical, commercial ads, entertainment, etc.
arXiv Detail & Related papers (2021-11-28T22:14:10Z) - Constructing Neural Network-Based Models for Simulating Dynamical
Systems [59.0861954179401]
Data-driven modeling is an alternative paradigm that seeks to learn an approximation of the dynamics of a system using observations of the true system.
This paper provides a survey of the different ways to construct models of dynamical systems using neural networks.
In addition to the basic overview, we review the related literature and outline the most significant challenges from numerical simulations that this modeling paradigm must overcome.
arXiv Detail & Related papers (2021-11-02T10:51:42Z) - Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.