Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
- URL: http://arxiv.org/abs/2506.02787v1
- Date: Tue, 03 Jun 2025 12:14:17 GMT
- Title: Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
- Authors: Ruilong Wu, Xinjiao Li, Yisu Wang, Xinyu Chen, Dirk Kutscher
- Abstract summary: Current automatic parallel planning frameworks overlook the simultaneous consideration of node heterogeneity and dynamic network topology changes. We introduce a strategy pruning technique to rapidly discard infeasible parallel configurations. Preliminary evaluations confirm that our method notably enhances training performance on heterogeneous nodes.
- Score: 8.918295350787465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hybrid parallelism techniques are essential for efficiently training large language models (LLMs). Nevertheless, current automatic parallel planning frameworks often overlook the simultaneous consideration of node heterogeneity and dynamic network topology changes, limiting their effectiveness in practical applications. In this paper, we address these limitations by modeling heterogeneous nodes within dynamically changing network environments and leveraging simulation-based strategies to determine optimal parallel configurations. Our approach enables fine-grained workload allocation tailored for heterogeneous nodes and complex network scenarios, achieving performance competitive with state-of-the-art methods under regular and stable network conditions. Additionally, we introduce a strategy pruning technique to rapidly discard infeasible parallel configurations, substantially reducing the search space and accelerating the search process through parallel execution within the simulator. Preliminary evaluations confirm that our method notably enhances training performance on heterogeneous nodes and demonstrates improved adaptability in complex, dynamic scenarios such as cloud computing environments.
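To make the search procedure described in the abstract concrete, the sketch below enumerates candidate (data, tensor, pipeline) parallel degrees for a small heterogeneous cluster, prunes configurations whose estimated memory footprint cannot fit on the smallest device, and ranks the survivors with a stand-in cost model. This is a minimal illustration under assumed numbers: the node specs, estimate_memory_gib, and simulate_step_time are hypothetical placeholders, not the paper's simulator or API.

```python
from itertools import product

# Hypothetical heterogeneous cluster: different GPU counts, memory, and speeds.
NODES = [
    {"name": "A", "gpus": 8, "mem_gib": 80, "speed": 1.0},
    {"name": "B", "gpus": 4, "mem_gib": 24, "speed": 0.4},
]
TOTAL_GPUS = sum(n["gpus"] for n in NODES)
MODEL_PARAMS = 13e9        # assumed model size (parameters)
BYTES_PER_PARAM = 18       # rough bytes per parameter of mixed-precision training state

def estimate_memory_gib(tp, pp):
    """Crude per-GPU estimate: training state sharded over tensor- and
    pipeline-parallel groups (activations ignored for brevity)."""
    return MODEL_PARAMS * BYTES_PER_PARAM / (tp * pp) / 2**30

def simulate_step_time(dp, tp, pp):
    """Stand-in for a simulator: the slowest node bounds compute, and
    larger communication groups add overhead."""
    slowest = min(n["speed"] for n in NODES)
    compute = 1.0 / (dp * slowest)
    comm = 0.05 * (tp - 1) + 0.02 * (pp - 1) + 0.03 * (dp - 1)
    return compute + comm

def search():
    best = None
    for dp, tp, pp in product([1, 2, 3, 4, 6, 12], repeat=3):
        if dp * tp * pp != TOTAL_GPUS:
            continue  # use every GPU exactly once
        # Strategy pruning: drop configurations that cannot fit on the
        # smallest device before paying for a (relatively costly) simulation.
        if estimate_memory_gib(tp, pp) > min(n["mem_gib"] for n in NODES):
            continue
        t = simulate_step_time(dp, tp, pp)
        if best is None or t < best[0]:
            best = (t, dp, tp, pp)
    return best

if __name__ == "__main__":
    print("best (step_time, dp, tp, pp):", search())
```

The pruning step mirrors the idea in the abstract: a cheap feasibility check removes most of the search space, and only the remaining configurations are handed to the simulator (which could itself be run in parallel).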
Related papers
- Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems [6.453224262551299]
Large language models (LLMs) in recommendation systems have become increasingly prominent. This paper systematically investigates two classes of optimization methods: model parallelism and data parallelism. Experiments conducted on a real-world recommendation dataset in a simulated service environment demonstrate that our proposed hybrid parallelism scheme increases training throughput by over 30%.
arXiv Detail & Related papers (2025-06-21T02:37:25Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics. Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - Ray-Tracing for Conditionally Activated Neural Networks [4.9844734080376725]
We introduce a novel architecture for conditionally activated neural networks with a sampling mechanism that converges to an optimized configuration of expert activation. Experimental results demonstrate that this approach achieves competitive accuracy compared to conventional baselines (a generic conditional-activation sketch appears after this list).
arXiv Detail & Related papers (2025-02-20T18:09:03Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing shifts data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Improving Automatic Parallel Training via Balanced Memory Workload
Optimization [36.87527680184956]
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains.
We present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy.
Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints.
arXiv Detail & Related papers (2023-07-05T05:28:38Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid
Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially (a toy antisymmetry sketch appears after this list).
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - DistIR: An Intermediate Representation and Simulator for Efficient
Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
arXiv Detail & Related papers (2021-11-09T21:32:51Z) - Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z) - Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime (a minimal local-updates sketch appears after this list).
arXiv Detail & Related papers (2020-12-07T16:38:45Z) - Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z) - Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm based on partitioning the verification problem in an iterative manner and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
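For the conditionally activated networks entry above, the following is a generic sketch of conditional expert activation with a softmax gate and stochastic selection of a small subset of experts per input; it illustrates the general idea only, not that paper's ray-tracing sampling mechanism, and all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_EXPERTS, ACTIVE = 16, 8, 2        # illustrative sizes

# Each "expert" is just a random linear map in this toy example.
experts = [rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
gate_w = rng.normal(size=(DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def forward(x):
    """Sample ACTIVE experts from the gate's softmax distribution and
    evaluate only those experts (conditional activation)."""
    logits = x @ gate_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    chosen = rng.choice(NUM_EXPERTS, size=ACTIVE, replace=False, p=probs)
    weights = probs[chosen] / probs[chosen].sum()   # renormalize over chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(forward(rng.normal(size=DIM)).shape)          # (16,)
```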
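For the momentum-conservation entry above, here is a toy demonstration of the antisymmetry idea: if every pairwise interaction is antisymmetrized (f_ij = -f_ji), the per-step velocity updates sum to zero and total linear momentum is preserved exactly. The pairwise function is an arbitrary stand-in, not the paper's antisymmetrical continuous convolution.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 32, 3
pos = rng.normal(size=(N, D))
vel = rng.normal(size=(N, D))

def pairwise(xi, xj):
    """Arbitrary interaction term; not antisymmetric on its own."""
    return np.tanh(xi - xj) + 0.1 * xi

def forces(x):
    f = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(len(x)):
            if i != j:
                # Antisymmetrization: f_ij = g(x_i, x_j) - g(x_j, x_i) = -f_ji,
                # so all forces cancel in the sum and momentum cannot drift.
                f[i] += pairwise(x[i], x[j]) - pairwise(x[j], x[i])
    return f

dt = 0.01
before = vel.sum(axis=0)
vel = vel + dt * forces(pos)        # equal unit masses assumed
after = vel.sum(axis=0)
print(np.allclose(before, after))   # True: total momentum unchanged
```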
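For the local-updates entry above, the PyTorch sketch below shows truncated layer-wise backpropagation: each block trains against its own auxiliary head and loss, and the activation passed to the next block is detached, so no gradient crosses block boundaries and blocks could in principle be updated in parallel. Block sizes, heads, and the loss are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, CLASSES, BLOCKS = 64, 10, 3

blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU()) for _ in range(BLOCKS)
)
# One auxiliary classifier per block provides the local training signal.
heads = nn.ModuleList(nn.Linear(DIM, CLASSES) for _ in range(BLOCKS))
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, DIM)
y = torch.randint(0, CLASSES, (32,))

h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h)
    loss = loss_fn(head(h), y)   # local objective for this block only
    opt.zero_grad()
    loss.backward()              # gradients stay within this block
    opt.step()
    h = h.detach()               # truncate backprop before the next block
print("local updates applied to", BLOCKS, "blocks")
```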