Related papers: Contractive error feedback for gradient compression

URL: http://arxiv.org/abs/2312.08538v1
Date: Wed, 13 Dec 2023 21:54:21 GMT
Title: Contractive error feedback for gradient compression
Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava and Georgios B. Giannakis
Abstract summary: We propose a communication-efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation.
Score: 60.05809370598166
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication-efficient optimization methods are attractive alternatives; however, they still struggle with memory issues. To tackle these challenges, we propose a communication-efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducible gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80%-90% of the extra memory in EFSGD with almost no loss on test performance, while also achieving a 1.3x-5x speedup over SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF into popular memory-efficient frameworks such as ZeRO-3.
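
For readers unfamiliar with the EFSGD baseline that the abstract compares against, a minimal single-worker sketch of error feedback with a biased compressor might look like the following. The names `top_k` and `ef_sgd_step`, the choice of top-k as the compressor, and the toy quadratic objective are illustrative assumptions and are not taken from the paper. The dense `error` buffer is the kind of extra per-worker memory the abstract refers to; how ConEF shrinks it is not shown here.

```python
def top_k(vec, k):
    """Biased top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def ef_sgd_step(params, grad, error, lr, k):
    """One error-feedback SGD step on a single worker (illustrative sketch only)."""
    # Add the residual left over from earlier steps to the scaled gradient.
    corrected = [lr * g + e for g, e in zip(grad, error)]
    # Only this sparse vector would be communicated between workers.
    sent = top_k(corrected, k)
    # Whatever the compressor dropped is stored locally for the next step.
    new_error = [c - s for c, s in zip(corrected, sent)]
    # Apply the compressed update to the parameters.
    new_params = [p - s for p, s in zip(params, sent)]
    return new_params, new_error


# Toy usage: minimize 0.5 * ||x||^2, whose gradient at x is x itself.
x = [1.0, -2.0, 3.0, -4.0]
err = [0.0] * len(x)
for _ in range(200):
    x, err = ef_sgd_step(x, grad=x, error=err, lr=0.1, k=1)
```

Note that `error` has the same length as `params`, so in this sketch the residual costs as much memory as a full copy of the model, even though only `k` entries are sent per step.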

Related papers

Efficient Federated Fine-Tuning of Large Language Models with Layer Dropout [15.009864792277236]
Fine-tuning plays a crucial role in enabling pre-trained LLMs to evolve from general language comprehension to task-specific expertise. This work proposes DropPEFT, an innovative federated PEFT framework that employs a novel transformer dropout method. We show that DropPEFT can achieve a 1.3-6.3x speedup in model convergence and a 40%-67% reduction in memory footprint.
arXiv Detail & Related papers (2025-03-13T09:59:16Z)
FedProphet: Memory-Efficient Federated Adversarial Training via Theoretic-Robustness and Low-Inconsistency Cascade Learning [20.075335314952643]
Federated Learning (FL) provides a strong privacy guarantee by enabling local training across edge devices without training data sharing. FedProphet is a novel FAT framework that can achieve memory efficiency, adversarial robustness, and objective consistency simultaneously.
arXiv Detail & Related papers (2024-09-12T19:39:14Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices [36.23767349592602]
Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design. FL fails to work on low-memory AIoT devices due to its heavy memory usage. We propose a federated foresight pruning method based on Neural Tangent Kernel (NTK), which can seamlessly integrate with federated BP-Free training frameworks.
arXiv Detail & Related papers (2024-05-08T02:24:09Z)
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices [9.928745904761358]
Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data. For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential. We present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference.
arXiv Detail & Related papers (2024-03-19T09:22:50Z)
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory [69.33445217944029]
PETL is an effective strategy for adapting pre-trained models to downstream domains. Recent PETL works focus on the more valuable memory-efficient characteristic. We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
arXiv Detail & Related papers (2023-08-28T05:38:43Z)
CAME: Confidence-guided Adaptive Memory Efficient Optimization [20.009302737137787]
Adaptive gradient methods have demonstrated excellent performance in the training of large language models. Maintaining second-moment estimates, however, incurs a high extra memory overhead. Several memory-efficient optimizers have been proposed to obtain a drastic reduction in auxiliary memory usage, but with a performance penalty. We propose CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods.
arXiv Detail & Related papers (2023-07-05T06:05:36Z)
Memory-adaptive Depth-wise Heterogenous Federated Learning [24.13198329419849]
We introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budgets of each client. Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively.
arXiv Detail & Related papers (2023-03-08T20:52:57Z)
EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization [71.70414291057332]
TTA may primarily be conducted on edge devices with limited memory. Long-term adaptation often leads to catastrophic forgetting and error accumulation. We present lightweight meta networks that can adapt the frozen original networks to the target domain.
arXiv Detail & Related papers (2023-03-03T13:05:30Z)
Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER). SEER is a simple modification of existing off-policy deep reinforcement learning methods. We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled. This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit. We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.