Contractive error feedback for gradient compression
- URL: http://arxiv.org/abs/2312.08538v1
- Date: Wed, 13 Dec 2023 21:54:21 GMT
- Title: Contractive error feedback for gradient compression
- Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, and Georgios B. Giannakis
- Abstract summary: We propose a communication-efficient method called contractive error feedback (ConEF).
Unlike SGD with error feedback (EFSGD), which manages memory inefficiently, ConEF strikes a sweet spot between convergence and memory usage.
We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation.
- Score: 60.05809370598166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device memory concerns in distributed deep learning have become severe due
to (i) the growth of model size in multi-GPU training, and (ii) the wide
adoption of deep neural networks for federated learning on IoT devices which
have limited storage. In such settings, communication-efficient optimization
methods are attractive alternatives; however, they still struggle with memory
issues. To tackle these challenges, we propose a communication-efficient
method called contractive error feedback (ConEF). As opposed to SGD with
error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the
sweet spot of convergence and memory usage, and achieves communication
efficiency by leveraging biased and all-reducible gradient compression. We
empirically validate ConEF on various learning tasks that include image
classification, language modeling, and machine translation and observe that
ConEF saves 80%-90% of the extra memory used by EFSGD with almost no loss in
test performance, while also achieving a 1.3x-5x speedup over SGD. Through our
work, we also demonstrate the feasibility and convergence of ConEF, clearing
the theoretical barrier to integrating it into popular memory-efficient
frameworks such as ZeRO-3.
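To make the mechanism described above concrete, the following is a minimal single-tensor PyTorch sketch of error feedback in which the locally stored residual is itself compressed before being kept. Top-k sparsification is used only as a stand-in compressor (a truly all-reducible compressor, e.g. a shared random mask or quantization, would be used in practice), and all function names, ratios, and the single-worker setup are illustrative assumptions, not the paper's implementation.

```python
import torch

def topk_compress(x, ratio):
    """Keep only the largest-magnitude entries of x: a biased, contractive compressor."""
    k = max(1, int(ratio * x.numel()))
    flat = x.flatten()
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

def conef_style_step(param, grad, error, lr=0.1, grad_ratio=0.01, err_ratio=0.1):
    """One illustrative update combining error feedback with a compressed error buffer.

    Plain EFSGD keeps the full-precision residual `error` on each worker; the idea
    sketched here is to pass that residual through a second (contractive) compressor
    so the locally stored buffer is itself small. Sketch of the concept only, not
    the authors' algorithm verbatim.
    """
    corrected = grad + error                        # add back last round's residual
    msg = topk_compress(corrected, grad_ratio)      # compressed gradient (would be all-reduced)
    residual = corrected - msg                      # information lost by the compressor
    new_error = topk_compress(residual, err_ratio)  # store only a compressed residual
    param.add_(msg, alpha=-lr)                      # apply the compressed update in place
    return new_error
```

A caller would initialize `error = torch.zeros_like(param)` once and carry the returned buffer across iterations; in a multi-worker run `msg` is what gets all-reduced, while the compressed error buffer is the only extra state kept locally, i.e. the state that EFSGD would otherwise store in full precision.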
Related papers
- FedProphet: Memory-Efficient Federated Adversarial Training via Theoretic-Robustness and Low-Inconsistency Cascade Learning [20.075335314952643]
Federated Learning (FL) provides a strong privacy guarantee by enabling local training across edge devices without sharing training data.
FedProphet is a novel federated adversarial training (FAT) framework that achieves memory efficiency, adversarial robustness, and objective consistency simultaneously.
arXiv Detail & Related papers (2024-09-12T19:39:14Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative memory-efficient transfer learning (METL) strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal set of late pre-trained layers alleviates the peak memory demand.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices [36.23767349592602]
Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design.
FL fails to work on low-memory AIoT devices due to its heavy memory usage.
We propose a federated foresight pruning method based on Neural Tangent Kernel (NTK), which can seamlessly integrate with federated BP-Free training frameworks.
arXiv Detail & Related papers (2024-05-08T02:24:09Z)
- Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices [9.928745904761358]
Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data.
For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential.
We present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference.
arXiv Detail & Related papers (2024-03-19T09:22:50Z)
- UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory [69.33445217944029]
Parameter-efficient transfer learning (PETL) is an effective strategy for adapting pre-trained models to downstream domains.
Recent PETL work focuses on the more valuable property of memory efficiency.
We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
arXiv Detail & Related papers (2023-08-28T05:38:43Z)
- CAME: Confidence-guided Adaptive Memory Efficient Optimization [20.009302737137787]
Adaptive gradient methods have demonstrated excellent performance in the training of large language models.
Maintaining the required second-moment estimates, however, incurs a high extra memory overhead.
Several memory-efficient optimizers have been proposed that drastically reduce this auxiliary memory usage, but at a performance penalty (a generic sketch of the factored-statistics idea behind such methods follows this list).
We propose CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods.
arXiv Detail & Related papers (2023-07-05T06:05:36Z)
- Memory-adaptive Depth-wise Heterogenous Federated Learning [24.13198329419849]
We introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budgets of each client.
Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively.
arXiv Detail & Related papers (2023-03-08T20:52:57Z)
- EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization [71.70414291057332]
Test-time adaptation (TTA) may primarily be conducted on edge devices with limited memory.
Long-term adaptation often leads to catastrophic forgetting and error accumulation.
We present lightweight meta networks that can adapt the frozen original networks to the target domain.
arXiv Detail & Related papers (2023-03-03T13:05:30Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
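As referenced in the CAME entry above, the following is a generic PyTorch sketch of the factored second-moment idea that memory-efficient adaptive optimizers of this kind build on: instead of a full Adam-style (m x n) second-moment matrix, only per-row and per-column accumulators are kept. This is an illustration of the general technique (in the spirit of Adafactor-style factorization), not CAME's actual update rule; the function name, normalization choice, and hyperparameters are assumptions for illustration.

```python
import torch

def factored_preconditioned_grad(grad, row_acc, col_acc, beta2=0.999, eps=1e-30):
    """Precondition a 2-D gradient using factored second-moment statistics.

    A full Adam-style second moment for an (m x n) weight stores m*n extra values;
    the factored variant keeps only a length-m row accumulator and a length-n column
    accumulator (m + n values) and reconstructs a rank-1 approximation on the fly.
    Generic illustration only; CAME additionally applies a confidence-guided
    correction that is not shown here.
    """
    sq = grad.pow(2) + eps
    row_acc.mul_(beta2).add_(sq.mean(dim=1), alpha=1 - beta2)  # shape (m,)
    col_acc.mul_(beta2).add_(sq.mean(dim=0), alpha=1 - beta2)  # shape (n,)
    v_hat = torch.outer(row_acc, col_acc) / row_acc.mean()     # rank-1 estimate of E[g^2]
    return grad / v_hat.sqrt()                                 # preconditioned direction

# For a 4096 x 4096 weight this keeps 8,192 accumulator values instead of ~16.8 million.
```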