AMP: Automatically Finding Model Parallel Strategies with Heterogeneity
Awareness
- URL: http://arxiv.org/abs/2210.07297v1
- Date: Thu, 13 Oct 2022 18:55:28 GMT
- Title: AMP: Automatically Finding Model Parallel Strategies with Heterogeneity
Awareness
- Authors: Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang
- Abstract summary: We develop AMP, a framework that automatically derives model-parallel execution strategies.
We evaluate AMP on popular models and cluster setups from public clouds.
AMP finds strategies with 1.54x and 1.77x higher throughput than state-of-the-art model-parallel systems.
- Score: 10.20441432750275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling up model sizes can lead to fundamentally new capabilities in many
machine learning (ML) tasks. However, training big models requires strong
distributed system expertise to carefully design model-parallel execution
strategies that suit the model architectures and cluster setups. In this paper,
we develop AMP, a framework that automatically derives such strategies. AMP
identifies a valid space of model parallelism strategies and efficiently
searches the space for high-performing strategies by leveraging a cost model
designed to capture the heterogeneity of the model and cluster specifications.
Unlike existing methods, AMP is specifically tailored to support complex models
composed of uneven layers and cluster setups with more heterogeneous
accelerators and bandwidth. We evaluate AMP on popular models and cluster
setups from public clouds and show that AMP returns parallel strategies that
match the expert-tuned strategies on typical cluster setups. On heterogeneous
clusters or models with heterogeneous architectures, AMP finds strategies with
1.54x and 1.77x higher throughput than state-of-the-art model-parallel systems,
respectively.
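The abstract's core idea is a search over a space of model-parallel strategies guided by a cost model that reflects uneven layers and heterogeneous devices. The snippet below is only a minimal sketch of that idea, not AMP's actual cost model or search procedure; the layer/device numbers, the strategy space, and the cost formula are hypothetical placeholders.

```python
# Minimal sketch of cost-model-guided search over model-parallel strategies.
# All numbers and the cost formula are illustrative assumptions, not AMP's model.
from itertools import product

layer_flops = [8e12, 8e12, 2e12, 2e12]   # uneven layers (hypothetical)
device_tflops = [120, 120, 65, 65]       # heterogeneous accelerators (hypothetical)
interconnect_gbps = 100                  # cluster bandwidth (hypothetical)
activation_bytes = 2e9                   # per-microbatch activation size (hypothetical)
n_devices = len(device_tflops)

def estimate_step_time(dp, pp):
    """Rough cost model: compute time of the slowest pipeline stage
    plus a simple term for inter-stage activation communication."""
    if dp * pp != n_devices:
        return float("inf")              # invalid partition of the cluster
    stage_flops = sum(layer_flops) / pp  # assume layers split evenly across stages
    # each stage runs on a group of dp devices; the slowest group bounds throughput
    compute = max(
        stage_flops / (min(device_tflops[s * dp:(s + 1) * dp]) * 1e12)
        for s in range(pp)
    )
    comm = activation_bytes * (pp - 1) / (interconnect_gbps / 8 * 1e9)
    return compute + comm

# enumerate (data-parallel, pipeline-parallel) degrees and keep the cheapest
candidates = list(product([1, 2, 4], repeat=2))
best = min(candidates, key=lambda c: estimate_step_time(*c))
print("best (dp, pp):", best, "estimated step time:", estimate_step_time(*best))
```

In this toy setting the search simply enumerates all candidate degrees; a realistic strategy space (adding tensor parallelism, micro-batch sizes, and device placement) grows quickly, which is why a cheap analytical cost model is used to rank candidates instead of profiling each one.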
Related papers
- Model Assembly Learning with Heterogeneous Layer Weight Merging [57.8462476398611]
We introduce Model Assembly Learning (MAL), a novel paradigm for model merging.
MAL integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities.
(arXiv 2025-03-27)
- Training-free Heterogeneous Model Merging [40.681362819808136]
We propose an innovative model merging framework designed for heterogeneous models.
We show that the merging of structurally heterogeneous models can achieve performance levels comparable to those of homogeneous merging.
Our code is publicly available at https://github.com/zju-vipa/training_free_heterogeneous_model_merging.
(arXiv 2024-12-29)
- Adaptive Learning of Design Strategies over Non-Hierarchical Multi-Fidelity Models via Policy Alignment [0.0]
Multi-fidelity Reinforcement Learning frameworks enhance the efficiency of engineering design by leveraging analysis models with varying levels of accuracy and computational costs.
This work proposes ALPHA, a novel multi-fidelity RL framework to efficiently learn a high-fidelity policy by adaptively leveraging an arbitrary set of non-hierarchical, heterogeneous, low-fidelity models alongside a high-fidelity model.
The effectiveness of ALPHA is demonstrated in analytical test optimization and octocopter design problems, utilizing two low-fidelity models alongside a high-fidelity one.
(arXiv 2024-11-16)
- Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems.
DTs often struggle to generalize to unseen conditions in data-scarce settings.
In this paper, we propose an evolutionary algorithm (HDTwinGen) to autonomously propose, evaluate, and optimize hybrid digital twins (HDTwins).
(arXiv 2024-10-31)
- Hierarchical Clustering for Conditional Diffusion in Image Generation [12.618079575423868]
This paper introduces TreeDiffusion, a deep generative model that conditions Diffusion Models on hierarchical clusters to obtain high-quality, cluster-specific generations.
The proposed pipeline consists of two steps: a VAE-based clustering model that learns the hierarchical structure of the data, and a conditional diffusion model that generates realistic images for each cluster.
(arXiv 2024-10-22)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline.
Our work starts by benchmarking existing LLM scaling techniques, especially selective merging and variants of mixture.
Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture.
(arXiv 2024-10-07)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
(arXiv 2023-01-27)
- COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training [42.514897110537596]
Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train.
Designing such clusters to maximize both performance and utilization, in order to amortize their steep cost, is a challenging task.
We introduce COMET, a holistic cluster design methodology and workflow to jointly study the impact of parallelization strategies and key cluster resource provisioning on the performance of distributed DL training.
(arXiv 2022-11-30)
- AdaptDHM: Adaptive Distribution Hierarchical Model for Multi-Domain CTR Prediction [4.299153274884263]
We propose an elegant and flexible multi-distribution modeling paradigm, named the Adaptive Distribution Hierarchical Model (AdaptDHM).
Our model achieves impressive prediction accuracy and its time cost during the training stage is more than 50% less than that of other models.
(arXiv 2022-11-22)
- On Optimizing the Communication of Model Parallelism [74.15423270435949]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL): cross-mesh resharding.
In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh.
We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system and an "overlapping-friendly" pipeline schedule (a toy resharding sketch follows at the end of this list).
(arXiv 2022-11-10)
- Decentralized Training of Foundation Models in Heterogeneous Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
(arXiv 2022-06-02)
- DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
(arXiv 2021-11-09)
- Unsupervised multi-modal Styled Content Generation [61.040392094140245]
UMMGAN is a novel architecture designed to better model multi-modal distributions in an unsupervised fashion.
We show that UMMGAN effectively disentangles between modes and style, thereby providing an independent degree of control over the generated content.
(arXiv 2020-01-10)
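The cross-mesh resharding entry above describes delivering a tensor that is sharded over one device mesh to a differently sharded layout on another mesh. The following is a minimal, self-contained sketch of that communication pattern in plain Python with NumPy; the mesh sizes, shard layouts, and the simple broadcast-then-slice strategy are illustrative assumptions, not the paper's actual system.

```python
# Toy illustration of cross-mesh resharding (assumed layouts, not the paper's system):
# a tensor sharded by rows over a 2-device source mesh must be delivered
# sharded by columns to a 4-device destination mesh.
import numpy as np

full = np.arange(8 * 8).reshape(8, 8)

# Source mesh: 2 devices, each holding a row shard.
src_shards = [full[0:4, :], full[4:8, :]]

# Broadcast-based strategy: every source shard is sent to every destination
# device, and each destination keeps only the columns it owns.
def reshard_broadcast(src_shards, n_dst):
    n_cols = src_shards[0].shape[1]
    cols_per_dst = n_cols // n_dst
    dst_shards = []
    for d in range(n_dst):
        c0, c1 = d * cols_per_dst, (d + 1) * cols_per_dst
        # "receive" all row shards (the broadcast), then slice locally
        received = [s[:, c0:c1] for s in src_shards]
        dst_shards.append(np.concatenate(received, axis=0))
    return dst_shards

dst_shards = reshard_broadcast(src_shards, n_dst=4)
# Sanity check: reassembling the destination shards recovers the original tensor.
assert np.array_equal(np.concatenate(dst_shards, axis=1), full)
print([s.shape for s in dst_shards])  # [(8, 2), (8, 2), (8, 2), (8, 2)]
```

The broadcast variant trades extra transmitted bytes for a simpler, more parallel communication pattern; the paper's contribution is to make this efficient and to overlap it with pipeline execution, which the sketch does not attempt to model.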