Pufferfish: Communication-efficient Models At No Extra Cost
- URL: http://arxiv.org/abs/2103.03936v1
- Date: Fri, 5 Mar 2021 20:46:39 GMT
- Title: Pufferfish: Communication-efficient Models At No Extra Cost
- Authors: Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos
- Abstract summary: Pufferfish is a communication- and computation-efficient distributed training framework.
It incorporates gradient compression into the model training process by training low-rank, pre-factorized deep networks.
It achieves the same accuracy as state-of-the-art, off-the-shelf deep models.
- Score: 7.408148824204065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To mitigate communication overheads in distributed model training, several
studies propose the use of compressed stochastic gradients, usually achieved by
sparsification or quantization. Such techniques achieve high compression
ratios, but in many cases incur either significant computational overheads or
some accuracy loss. In this work, we present Pufferfish, a communication- and
computation-efficient distributed training framework that incorporates
gradient compression into the model training process by training low-rank,
pre-factorized deep networks. Pufferfish not only reduces communication, but
also completely bypasses any computation overheads related to compression, and
achieves the same accuracy as state-of-the-art, off-the-shelf deep models.
Pufferfish can be directly integrated into current deep learning frameworks
with minimal implementation modifications. Our extensive experiments over real
distributed setups, across a variety of large-scale machine learning tasks,
indicate that Pufferfish achieves up to 1.64x end-to-end speedup over the
latest distributed training API in PyTorch without accuracy loss. Compared to
the Lottery Ticket Hypothesis models, Pufferfish leads to equally accurate,
small-parameter models while avoiding the burden of "winning the lottery".
Pufferfish also leads to more accurate and smaller models than SOTA structured
model pruning methods.
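The core idea, training layers directly in a low-rank, pre-factorized form so that only small factors are computed and communicated, can be illustrated with a minimal PyTorch sketch. The layer below is an illustrative stand-in, not the paper's implementation; the class name, initialization, and rank choice are assumptions.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A linear layer pre-factorized as W ~= U @ V and trained directly in factored form."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Only the two thin factors are parameters, so the gradients exchanged during
        # data-parallel training have O(rank * (in + out)) entries instead of O(in * out).
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x (U V)^T + b, computed as two thin matrix products
        return (x @ self.V.t()) @ self.U.t() + self.bias

# Example: a rank-32 replacement for a 1024 -> 1024 dense layer.
layer = LowRankLinear(1024, 1024, rank=32)
y = layer(torch.randn(8, 1024))
print(y.shape)                                       # torch.Size([8, 1024])
print(sum(p.numel() for p in layer.parameters()))    # ~66K parameters vs. ~1.05M full-rank
```

Because the factors are ordinary trainable parameters, no separate compress/decompress step is needed during training, matching the abstract's claim of bypassing compression-related computation overheads.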
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
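As a rough illustration of the TopK compression referenced above, the sketch below keeps only the k largest-magnitude entries of a tensor; the function names and the absence of error feedback are simplifications, not the paper's exact scheme.

```python
import math
import torch

def topk_compress(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude entries; transmit (values, indices, shape)."""
    flat = grad.flatten()
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, grad.shape

def topk_decompress(values: torch.Tensor, indices: torch.Tensor, shape) -> torch.Tensor:
    """Scatter the transmitted entries back into a dense zero tensor."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

g = torch.randn(256, 128)
vals, idx, shape = topk_compress(g, k=g.numel() // 100)   # keep ~1% of the entries
g_hat = topk_decompress(vals, idx, shape)
```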
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Retraining-free Model Quantization via One-Shot Weight-Coupling Learning [41.299675080384]
Mixed-precision quantization (MPQ) is advocated to compress models effectively by allocating heterogeneous bit-widths across layers.
MPQ is typically organized into a searching-retraining two-stage process.
In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression.
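To make per-layer bit-width allocation concrete, here is a minimal sketch of symmetric uniform quantization with heterogeneous precision; the layer names and bit assignments are made up for illustration, and the paper's searching procedure is not shown.

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

# Hypothetical per-layer bit-width plan (heterogeneous precision across layers).
weights = {"conv1": torch.randn(64, 3, 3, 3), "fc": torch.randn(10, 512)}
bit_plan = {"conv1": 8, "fc": 4}
quantized = {name: quantize_symmetric(w, bit_plan[name]) for name, w in weights.items()}
```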
arXiv Detail & Related papers (2024-01-03T05:26:57Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
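The LSH idea can be sketched generically: sign random projections send similar feature-map channels to the same hash bucket, so redundant channels can be detected cheaply. The code below is a generic illustration under that assumption, not the HASTE module itself.

```python
import torch

def lsh_buckets(feature_map: torch.Tensor, num_hyperplanes: int = 8, seed: int = 0) -> torch.Tensor:
    """Assign each channel of a (C, H, W) feature map to an LSH bucket using
    sign random projections; channels sharing a bucket are near-duplicates."""
    num_channels = feature_map.shape[0]
    flat = feature_map.reshape(num_channels, -1)                     # one vector per channel
    flat = flat / flat.norm(dim=1, keepdim=True).clamp(min=1e-8)     # compare directions only
    gen = torch.Generator().manual_seed(seed)
    hyperplanes = torch.randn(flat.shape[1], num_hyperplanes, generator=gen)
    bits = (flat @ hyperplanes) > 0                                  # (C, num_hyperplanes) sign bits
    powers = 2 ** torch.arange(num_hyperplanes)
    return (bits.long() * powers).sum(dim=1)                         # integer bucket id per channel

fmap = torch.randn(64, 16, 16)
buckets = lsh_buckets(fmap)
print(buckets[:8])   # channels with equal bucket ids are candidates for merging
```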
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for reducing the communication and per-worker computation costs of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Cuttlefish: Low-Rank Model Training without All the Tuning [55.984294012024755]
We introduce Cuttlefish, an automated low-rank training approach.
Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged.
Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process.
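The switching signal mentioned above can be made concrete with the standard stable-rank formula, ||W||_F^2 / ||W||_2^2; the convergence test below (a relative-change window) is an illustrative stand-in, not necessarily the criterion Cuttlefish uses.

```python
import torch

def stable_rank(w: torch.Tensor) -> float:
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    w2d = w.reshape(w.shape[0], -1)
    fro_sq = w2d.pow(2).sum()
    spec_sq = torch.linalg.matrix_norm(w2d, ord=2) ** 2
    return (fro_sq / spec_sq).item()

def has_converged(history: list, window: int = 5, tol: float = 0.02) -> bool:
    """Illustrative test: relative change of the stable rank over the last `window` steps < tol."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return (max(recent) - min(recent)) / max(recent) < tol

# Track the stable rank of each layer during full-rank training and switch to
# low-rank factors once every layer's estimate has converged.
history = [stable_rank(torch.randn(512, 512)) for _ in range(6)]
print(has_converged(history))
```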
arXiv Detail & Related papers (2023-05-04T04:20:20Z)
- Distributed Pruning Towards Tiny Neural Networks in Federated Learning [12.63559789381064]
FedTiny is a distributed pruning framework for federated learning.
It generates specialized tiny models for memory- and computing-constrained devices.
It achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91%.
arXiv Detail & Related papers (2022-12-05T01:58:45Z)
- Paoding: Supervised Robustness-preserving Data-free Neural Network Pruning [3.6953655494795776]
We study neural network pruning in the data-free context.
We replace the traditional aggressive one-shot strategy with a conservative one that treats the pruning as a progressive process.
Our method is implemented as a Python package named Paoding and evaluated with a series of experiments on diverse neural network models.
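As a generic sketch of the progressive (rather than one-shot) strategy, the loop below raises sparsity a little each round; the magnitude criterion and linear schedule are assumptions for illustration, not Paoding's robustness-preserving procedure.

```python
import torch

def progressive_magnitude_prune(weight: torch.Tensor, target_sparsity: float, rounds: int) -> torch.Tensor:
    """Reach the target sparsity gradually, zeroing a growing fraction of small weights each round."""
    mask = torch.ones_like(weight, dtype=torch.bool)
    for r in range(1, rounds + 1):
        sparsity_now = target_sparsity * r / rounds                  # linear schedule
        k = int(weight.numel() * sparsity_now)
        threshold = weight.abs().flatten().kthvalue(k).values if k > 0 else -1.0
        mask = weight.abs() > threshold
        # ... in a real pipeline: evaluate the pruned model here and stop early if quality drops ...
    return weight * mask

w = torch.randn(256, 256)
w_pruned = progressive_magnitude_prune(w, target_sparsity=0.8, rounds=8)
print((w_pruned == 0).float().mean())   # ~0.8
```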
arXiv Detail & Related papers (2022-04-02T07:09:17Z)
- ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training [74.43625662170284]
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
We propose a new compression technique that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability.
We experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) without significant accuracy loss.
arXiv Detail & Related papers (2021-04-21T02:22:10Z)
- A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, which penalizes only a subset of the parameters rather than all of them (full regularization), to conduct model compression at higher speed.
Experimental results show that, as expected, computational cost is reduced, with lower running time in almost all situations.
Surprisingly, it also improves metrics such as regression fit and classification accuracy in both the training and test phases on multiple datasets.
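A minimal sketch of what partial regularization looks like in practice; the model, subset choice, and L1 penalty below are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Penalize only a chosen subset of parameters (here: the last layer's weights)
# instead of every parameter, as full regularization would.
regularized = [p for name, p in model.named_parameters() if name == "2.weight"]

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss = loss + 1e-4 * sum(p.abs().sum() for p in regularized)   # L1 penalty on the subset only
loss.backward()
```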
arXiv Detail & Related papers (2020-09-03T00:38:27Z)
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [94.43313684188819]
We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute.
We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps.
This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models.
arXiv Detail & Related papers (2020-02-26T21:17:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.