Saturn: Efficient Multi-Large-Model Deep Learning
- URL: http://arxiv.org/abs/2311.02840v1
- Date: Mon, 6 Nov 2023 02:59:49 GMT
- Title: Saturn: Efficient Multi-Large-Model Deep Learning
- Authors: Kabir Nagrecha and Arun Kumar
- Abstract summary: We first identify three key interconnected systems challenges for users building large models.
We then formalize these as a joint problem, and build a new system architecture to tackle these challenges simultaneously.
Our evaluations show that our joint-optimization approach yields 39-49% lower model selection runtimes than typical current DL practice.
- Score: 6.377812618046872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Saturn, a new data system to improve the efficiency
of multi-large-model training (e.g., during model selection/hyperparameter
optimization). We first identify three key interconnected systems challenges
for users building large models in this setting -- parallelism technique
selection, distribution of GPUs over jobs, and scheduling. We then formalize
these as a joint problem, and build a new system architecture to tackle these
challenges simultaneously. Our evaluations show that our joint-optimization
approach yields 39-49% lower model selection runtimes than typical current DL
practice.
Related papers
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models [28.993221775758702]
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability.
This paper marks a significant advance toward more flexible and comprehensive model merging techniques.
We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies.
arXiv Detail & Related papers (2024-09-27T16:31:31Z) - Applied Federated Model Personalisation in the Industrial Domain: A Comparative Study [5.999474111757664]
Three suggested strategies to tackle this challenge include Active Learning, Knowledge Distillation, and Local Memorization.
The present study delves into the fundamental principles of these three approaches and proposes an advanced Federated Learning System.
The results of the original and optimised models are then compared in both local and federated contexts using a comparison analysis.
arXiv Detail & Related papers (2024-09-10T23:00:19Z) - Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling [35.06717005729781]
Spindle is a new training system tailored for resource-efficient training of multi-task (MT) multi-modal (MM) models via wavefront scheduling.
Experiments demonstrate the superior performance and efficiency of Spindle, with speedup ratio up to 71% compared to state-of-the-art training systems.
arXiv Detail & Related papers (2024-09-05T09:10:40Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Saturn: An Optimized Data System for Large Model Deep Learning Workloads [6.377812618046872]
We tackle SPASE: Select a Parallelism, Allocate resources, and SchedulE.
We propose a new information system architecture to tackle the SPASE problem holistically.
We find that direct use of an MILP-solver is significantly more effective than several baselines.
arXiv Detail & Related papers (2023-09-03T17:19:11Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based
Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
gait recognition in the wild is a more practical problem that has attracted the attention of the community of multimedia and computer vision.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to injectity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z) - Scaling Distributed Deep Learning Workloads beyond the Memory Capacity
with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turning-NLG.
arXiv Detail & Related papers (2020-08-26T07:24:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.