Saturn: An Optimized Data System for Large Model Deep Learning Workloads
- URL: http://arxiv.org/abs/2309.01226v2
- Date: Wed, 13 Dec 2023 18:42:58 GMT
- Title: Saturn: An Optimized Data System for Large Model Deep Learning Workloads
- Authors: Kabir Nagrecha and Arun Kumar
- Abstract summary: We tackle SPASE: Select a Parallelism, Allocate resources, and SchedulE.
We propose a new information system architecture to tackle the SPASE problem holistically.
We find that direct use of an MILP-solver is significantly more effective than several baselines.
- Score: 6.377812618046872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models such as GPT-3 & ChatGPT have transformed deep learning
(DL), powering applications that have captured the public's imagination. These
models are rapidly being adopted across domains for analytics on various
modalities, often by finetuning pre-trained base models. Such models need
multiple GPUs due to both their size and computational load, driving the
development of a bevy of "model parallelism" techniques & tools. Navigating
such parallelism choices, however, is a new burden for end users of DL such as
data scientists, domain scientists, etc. who may lack the necessary systems
knowhow. The need for model selection, which leads to many models to train due
to hyper-parameter tuning or layer-wise finetuning, compounds the situation
with two more burdens: resource apportioning and scheduling. In this work, we
tackle these three burdens for DL users in a unified manner by formalizing them
as a joint problem that we call SPASE: Select a Parallelism, Allocate
resources, and SchedulE. We propose a new information system architecture to
tackle the SPASE problem holistically, representing a key step toward enabling
wider adoption of large DL models. We devise an extensible template for
existing parallelism schemes and combine it with an automated empirical
profiler for runtime estimation. We then formulate SPASE as an MILP.
We find that direct use of an MILP-solver is significantly more effective
than several baseline heuristics. We optimize the system runtime further with
an introspective scheduling approach. We implement all these techniques into a
new data system we call Saturn. Experiments with benchmark DL workloads show
that Saturn achieves 39-49% lower model selection runtimes than typical current
DL practice.
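To make the SPASE formulation concrete, below is a minimal, hypothetical MILP sketch in Python using the PuLP solver interface. It is not Saturn's actual formulation (which also covers scheduling and is refined by the introspective scheduler); the model names, parallelism labels, profiled runtimes, and cluster size are illustrative placeholders.

```python
# Hypothetical SPASE-style MILP sketch (not Saturn's actual formulation).
# Given profiled runtimes for candidate (model, parallelism, GPU-count) options,
# pick one option per model and minimize a GPU-seconds bound on the makespan.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, PULP_CBC_CMD

TOTAL_GPUS = 8  # illustrative cluster size

# Profiled estimates: (model, parallelism, gpus) -> runtime in seconds.
# These numbers are placeholders, not measured values.
profiles = {
    ("gpt2-xl",  "pipeline", 4): 3600, ("gpt2-xl",  "fsdp",     8): 2200,
    ("t5-large", "spilling", 2): 5400, ("t5-large", "pipeline", 4): 3100,
}

prob = LpProblem("spase_sketch", LpMinimize)
x = {opt: LpVariable(f"x_{i}", cat=LpBinary) for i, opt in enumerate(profiles)}
makespan = LpVariable("makespan", lowBound=0)
prob += makespan  # objective: minimize the makespan bound

# Each model must select exactly one parallelism + GPU apportionment.
for m in {model for (model, _, _) in profiles}:
    prob += lpSum(x[opt] for opt in profiles if opt[0] == m) == 1

# GPU-seconds of the chosen options must fit within cluster capacity * makespan
# (a coarse stand-in for full interval-based scheduling constraints).
prob += lpSum(rt * opt[2] * x[opt] for opt, rt in profiles.items()) <= TOTAL_GPUS * makespan

prob.solve(PULP_CBC_CMD(msg=False))
for opt, var in x.items():
    if var.value() > 0.5:
        print(f"{opt[0]}: {opt[1]} on {opt[2]} GPUs, ~{profiles[opt]} s")
print("makespan lower bound (s):", makespan.value())
```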
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Saturn: Efficient Multi-Large-Model Deep Learning [6.377812618046872]

We first identify three key interconnected systems challenges for users building large models.
We then formalize these as a joint problem, and build a new system architecture to tackle these challenges simultaneously.
Our evaluations show that our joint-optimization approach yields 39-49% lower model selection runtimes than typical current DL practice.
arXiv Detail & Related papers (2023-11-06T02:59:49Z)
- Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training.
This paper proposes an open source online training framework for deep surrogate models.
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving [53.01646445659089]
We show that model parallelism can be used for the statistical multiplexing of multiple devices when serving multiple models.
We present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models.
arXiv Detail & Related papers (2023-02-22T21:41:34Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
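As a rough illustration of merging in parameter space, the sketch below performs uniform parameter averaging over fine-tuned checkpoints; the paper's actual merging rule is more sophisticated, and the checkpoint paths are placeholders.

```python
# Minimal sketch of parameter-space model merging via uniform averaging
# (an illustration only, not the paper's exact method). Assumes the
# checkpoints share an identical architecture and parameter names.
import torch

def merge_state_dicts(state_dicts):
    """Average matching parameters across checkpoints, using no training data."""
    merged = {}
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts], dim=0)
        merged[name] = stacked.mean(dim=0)
    return merged

# Usage (paths are illustrative placeholders):
# sds = [torch.load(p, map_location="cpu") for p in ["ft_task_a.pt", "ft_task_b.pt"]]
# model.load_state_dict(merge_state_dicts(sds))
```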
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Can Deep Learning be Applied to Model-Based Multi-Object Tracking? [25.464269324261636]
Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements.
Deep learning (DL) has been increasingly used in MOT for improving tracking performance.
In this paper, we propose a Transformer-based DL tracker and evaluate its performance in the model-based setting.
arXiv Detail & Related papers (2022-02-16T07:43:08Z)
- Hydra: A System for Large Multi-Model Deep Learning [3.571623412954477]
We present 'model spilling', a technique aimed at models such as Transformers and CNNs to move groups of layers between DRAM and GPU memory.
We then present a set of novel techniques leveraging spilling to raise efficiency for multi-model training workloads.
Experiments with real benchmark workloads show that HYDRA is over 7x faster than regular model parallelism and over 50% faster than state-of-the-art industrial tools for pipeline parallelism.
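The sketch below is a toy, forward-only illustration of the spilling idea in plain PyTorch, not HYDRA's implementation (which schedules transfers, handles backward passes, and overlaps communication with compute); the layer groups and sizes are hypothetical.

```python
# Toy illustration of layer spilling: each layer group lives in DRAM and is
# paged into GPU memory only while it executes.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical layer groups; a real system would partition the model automatically.
groups = [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(4)]

@torch.no_grad()  # forward-only toy; training would need to page groups back in for backward
def spilled_forward(x):
    x = x.to(device)
    for group in groups:
        group.to(device)   # page this group into GPU memory
        x = group(x)
        group.to("cpu")    # spill it back to DRAM, freeing GPU memory for the next group
    return x

out = spilled_forward(torch.randn(8, 1024))
print(out.shape)
```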
arXiv Detail & Related papers (2021-10-16T18:13:57Z)
- Model-Parallel Model Selection for Deep Learning Systems [0.0]
Inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users.
Many ML practitioners have turned to model parallelism as a method of distributing the computational requirements across several devices.
We propose a new form of "shard parallelism" combining task and model parallelism, then package it into a framework we name Hydra.
arXiv Detail & Related papers (2021-07-14T03:20:37Z)
- It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing [1.581913948762905]
Preparation of training data is often a bottleneck in the lifecycle of deploying a deep learning model for production or research.
This paper proposes an end-to-end process of searching related models for serving based on the similarity of the target dataset and the training datasets of the available models.
arXiv Detail & Related papers (2020-10-13T22:52:13Z)
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG.
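The "redundant recomputing" ingredient corresponds to activation checkpointing, sketched below with stock PyTorch; KARMA's interleaving of recomputation with out-of-core transfers is not shown, and the block count and layer sizes are arbitrary.

```python
# Minimal activation-recomputation sketch: activations inside each block are
# dropped and recomputed during backward, trading extra FLOPs for less memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(2048, 2048), nn.GELU()) for _ in range(8)]
)

def forward(x):
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)  # recompute block in backward
    return x

loss = forward(torch.randn(4, 2048, requires_grad=True)).sum()
loss.backward()
```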
arXiv Detail & Related papers (2020-08-26T07:24:34Z)