Compression-Induced Communication-Efficient Large Model Training and Inferencing
- URL: http://arxiv.org/abs/2508.00960v1
- Date: Fri, 01 Aug 2025 12:51:40 GMT
- Title: Compression-Induced Communication-Efficient Large Model Training and Inferencing
- Authors: Sudip K. Seal, Maksudul Alam, Jorge Ramirez, Sajal Dash, Hao Lu,
- Abstract summary: Energy efficiency of training and inferencing with large neural network models is a critical challenge. This paper introduces an alternative strategy, called phantom parallelism, to minimize the net energy consumption. Experiments show a ~50% reduction in the energy consumed to train FFNs using the proposed phantom parallel approach.
- Score: 3.581934227767651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy efficiency of training and inferencing with large neural network models is a critical challenge facing the future of sustainable large-scale machine learning workloads. This paper introduces an alternative strategy, called phantom parallelism, to minimize the net energy consumption of traditional tensor (model) parallelism, the most energy-inefficient component of large neural network training. The approach is presented in the context of feed-forward network architectures as a preliminary, but comprehensive, proof-of-principle study of the proposed methodology. We derive new forward and backward propagation operators for phantom parallelism, implement them as custom autograd operations within an end-to-end phantom parallel training pipeline, and compare its parallel performance and energy efficiency against those of conventional tensor parallel training pipelines. Formal analyses that predict lower bandwidth and FLOP counts are presented, with supporting empirical results on up to 256 GPUs that corroborate these gains. Experiments show a ~50% reduction in the energy consumed to train FFNs using the proposed phantom parallel approach when compared with conventional tensor parallel methods. Additionally, the proposed approach is shown to train smaller phantom models to the same model loss on smaller GPU counts as larger tensor parallel models on larger GPU counts, offering the possibility of even greater energy savings.
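The abstract does not spell out the phantom operators themselves, but the mechanism it names, custom autograd operations with hand-derived forward and backward rules, can be sketched as follows. The `PhantomLinear` name and the plain linear map are illustrative stand-ins, not the paper's operators.

```python
import torch

class PhantomLinear(torch.autograd.Function):
    """Illustrative stand-in for a phantom-parallel operator."""

    @staticmethod
    def forward(ctx, x, weight):
        # A real phantom operator would compress/exchange activations here
        # to cut the communication volume of tensor parallelism.
        ctx.save_for_backward(x, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        # Hand-derived backward rule: the hook where phantom parallelism
        # would reduce bandwidth and FLOP counts relative to standard autograd.
        x, weight = ctx.saved_tensors
        return grad_out @ weight, grad_out.t() @ x

x = torch.randn(8, 32, requires_grad=True)
w = torch.randn(16, 32, requires_grad=True)
PhantomLinear.apply(x, w).sum().backward()  # exercises both operators
```

Packaging the derived operators this way lets a custom pipeline drop into ordinary PyTorch training loops.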
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
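A minimal sketch of the inference-time composition described above: the value network's logits are added to a frozen base model's logits. Class and attribute names are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn as nn

class ValueAugmentedModel(nn.Module):
    """Compose a frozen pre-trained model with a small value network."""

    def __init__(self, base_model: nn.Module, value_net: nn.Module):
        super().__init__()
        self.base_model = base_model  # any pre-trained model, kept frozen
        self.value_net = value_net    # trained once on a small base model

    @torch.no_grad()
    def forward(self, input_ids):
        # The value network models the post-training change at the logits
        # level, so integration is a simple logit-space sum.
        return self.base_model(input_ids) + self.value_net(input_ids)
```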
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Forecasting the steam mass flow in a powerplant using the parallel hybrid network [0.0]
In this study, we use a parallel hybrid neural network architecture that combines a parametrized quantum circuit and a conventional feed-forward neural network.
Our results show that the parallel hybrid model outperforms standalone classical and quantum models.
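The parallel hybrid layout can be pictured as two branches over the same input whose outputs are merged by a small head. In this sketch a classical module stands in for the parametrized quantum circuit, which would come from a quantum SDK in practice; all names here are illustrative.

```python
import torch
import torch.nn as nn

class ParallelHybrid(nn.Module):
    def __init__(self, in_dim, hidden, branch_dim):
        super().__init__()
        self.classical = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, branch_dim))
        # Stand-in for the parametrized quantum circuit branch.
        self.quantum_branch = nn.Linear(in_dim, branch_dim)
        self.head = nn.Linear(2 * branch_dim, 1)  # merges both branches

    def forward(self, x):
        # Both branches see the same input in parallel; the head forecasts
        # a scalar target such as the steam mass flow.
        out = torch.cat([self.classical(x), self.quantum_branch(x)], dim=-1)
        return self.head(out)
```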
arXiv Detail & Related papers (2023-07-18T17:59:25Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for reducing the communication and memory costs of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
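For intuition, IST can be caricatured in a single process: hidden units are partitioned into disjoint groups, each "worker" updates only its own subnetwork, and the slices live in shared parameter arrays. A toy NumPy sketch, not the distributed algorithm the paper analyses:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n_workers = 16, 8, 2
W1 = rng.standard_normal((h, d)) * 0.1           # shared first-layer weights
W2 = rng.standard_normal((1, h)) * 0.1           # shared second-layer weights
parts = np.array_split(np.arange(h), n_workers)  # disjoint hidden-unit groups

for step in range(100):
    x, y = rng.standard_normal(d), 1.0
    for idx in parts:                    # each "worker" owns one group
        a = np.tanh(W1[idx] @ x)         # local forward through the subnetwork
        err = float(W2[:, idx] @ a - y)  # local scalar error
        g_hidden = (W2[:, idx].ravel() * err) * (1.0 - a**2)
        W2[:, idx] -= 0.01 * err * a     # local update, own slice only
        W1[idx] -= 0.01 * np.outer(g_hidden, x)
```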
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce a one-stage solution for obtaining pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
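The weight-sharing idea is easy to see in one layer: every sub-network is a leading slice of the full layer's weights, so pre-training once yields networks of several widths. A hedged single-layer sketch (real slimmable networks also slice input channels and handle normalisation per width):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Linear):
    def forward(self, x, width_ratio=1.0):
        k = max(1, int(self.out_features * width_ratio))
        # Sub-network = leading rows of the shared weight matrix.
        return F.linear(x, self.weight[:k], self.bias[:k])

layer = SlimmableLinear(64, 128)
x = torch.randn(4, 64)
full = layer(x)                    # (4, 128): the full network
half = layer(x, width_ratio=0.5)   # (4, 64): a weight-sharing sub-network
```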
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
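The reparameterisation itself is one line: effective weights are w = θ·|θ|^(α−1) with α > 1, which damps updates to near-zero parameters and concentrates mass at zero. A minimal sketch; the layer name and initialisation are illustrative:

```python
import torch
import torch.nn as nn

class PowerpropLinear(nn.Module):
    def __init__(self, in_f, out_f, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.randn(out_f, in_f) * 0.1)

    def forward(self, x):
        # Powerpropagation reparameterisation: w = theta * |theta|**(alpha-1).
        w = self.theta * self.theta.abs().pow(self.alpha - 1.0)
        return x @ w.t()

layer = PowerpropLinear(32, 16)
out = layer(torch.randn(4, 32))  # (4, 16)
```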
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Jet: Fast quantum circuit simulations with parallel task-based tensor-network contraction [0.8431877864777442]
We introduce Jet, a new open-source software library that uses task-based parallelism to obtain speed-ups in the simulation of quantum circuits.
These speed-ups result from i) the increased parallelism introduced by mapping the tensor-network simulation to a task-based framework, and ii) a novel method of reusing shared work between tensor-network tasks.
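Both ingredients can be caricatured with a memoised contraction helper: each contraction becomes an independent task, and intermediates shared between tasks are computed only once. This toy uses `np.einsum` and is not Jet's actual API:

```python
import numpy as np

cache = {}

def contract(key, expr, *tensors):
    # Each call is one "task"; identical sub-contractions hit the cache.
    if key not in cache:
        cache[key] = np.einsum(expr, *tensors)
    return cache[key]

A, B, C = (np.random.rand(4, 4) for _ in range(3))
AB = contract("AB", "ij,jk->ik", A, B)       # shared intermediate, built once
amp1 = contract("AB_C", "ik,kl->il", AB, C)  # task 1 reuses AB
amp2 = contract("AB_Ct", "ik,lk->il", AB, C) # task 2 reuses AB as well
```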
arXiv Detail & Related papers (2021-07-20T22:46:02Z) - Tensor networks for unsupervised machine learning [9.897828174118974]
We present the Autoregressive Matrix Product States (AMPS), a tensor-network-based model combining the matrix product states from quantum many-body physics and the autoregressive models from machine learning.
We show that the proposed model significantly outperforms the existing tensor-network-based models and the restricted Boltzmann machines.
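The matrix-product backbone can be sketched in a few lines: each site contributes a matrix selected by its bit, and chaining them gives an unnormalised amplitude whose square is a Born-rule weight. AMPS builds normalised autoregressive conditionals on this structure; the boundary vectors below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 6, 4                       # sites and bond dimension
# One (2, D, D) tensor per site: index 0/1 selects the matrix for that bit.
mps = [rng.standard_normal((2, D, D)) for _ in range(n)]

def amplitude(bits):
    v = np.ones(D) / np.sqrt(D)   # left boundary vector
    for site, b in zip(mps, bits):
        v = v @ site[b]           # chain the bit-selected matrices
    return v.sum()                # close with a right boundary vector

p_unnorm = amplitude([0, 1, 1, 0, 0, 1]) ** 2  # unnormalised Born weight
```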
arXiv Detail & Related papers (2021-06-24T12:51:00Z) - Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
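Equilibrium propagation's two-phase structure can be sketched on a toy scalar energy: relax to a free equilibrium, relax again with the output weakly nudged toward the target, and update weights from the contrast between the two equilibria. The quadratic energy below is an illustrative stand-in for the paper's nonlinear resistive networks:

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(4), torch.tensor([1.0])
W = torch.randn(1, 4, requires_grad=True)
beta, lr = 0.1, 0.05              # nudging strength, learning rate

def energy(s, W):
    # Toy quadratic energy; real EP uses the physical network's energy.
    return 0.5 * (s ** 2).sum() - (W @ x) @ s

def relax(W, nudge):
    s = torch.zeros(1, requires_grad=True)
    for _ in range(50):           # gradient flow to a fixed point
        F = energy(s, W) + nudge * 0.5 * ((s - y) ** 2).sum()
        g = torch.autograd.grad(F, s)[0]
        s = (s - 0.2 * g).detach().requires_grad_(True)
    return s.detach()

s_free, s_nudged = relax(W, 0.0), relax(W, beta)
# Contrastive update: a finite-difference estimate of the loss gradient.
g_free = torch.autograd.grad(energy(s_free, W), W)[0]
g_nudged = torch.autograd.grad(energy(s_nudged, W), W)[0]
with torch.no_grad():
    W -= lr * (g_nudged - g_free) / beta
```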
arXiv Detail & Related papers (2020-06-02T23:38:35Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
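The abstract does not give the CLARS update itself, so here is the generic layer-wise adaptive rate scaling rule from the LARS family it builds on: each layer's step is rescaled by the ratio of its weight norm to its gradient norm, removing the need for a hand-tuned warmup schedule. A hedged sketch, not the paper's exact algorithm:

```python
import torch

def lars_style_step(params, base_lr=0.01, eps=1e-8):
    # Generic LARS-style rule; CLARS refines this with its own analysis.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            trust = p.norm() / (p.grad.norm() + eps)  # layer-wise trust ratio
            p -= base_lr * trust * p.grad

# Usage: after loss.backward(), call lars_style_step(model.parameters()).
```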
arXiv Detail & Related papers (2020-02-04T23:03:12Z)