Efficient ConvBN Blocks for Transfer Learning and Beyond
- URL: http://arxiv.org/abs/2305.11624v2
- Date: Wed, 28 Feb 2024 14:34:06 GMT
- Title: Efficient ConvBN Blocks for Transfer Learning and Beyond
- Authors: Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan,
Mingsheng Long
- Abstract summary: A ConvBN block can operate in three modes: Train, Eval, and Deploy.
This paper focuses on the trade-off between stability and efficiency in ConvBN blocks.
We propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode.
- Score: 62.53078191019456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolution-BatchNorm (ConvBN) blocks are integral components in various
computer vision tasks and other domains. A ConvBN block can operate in three
modes: Train, Eval, and Deploy. While the Train mode is indispensable for
training models from scratch, the Eval mode is suitable for transfer learning
and beyond, and the Deploy mode is designed for the deployment of models. This
paper focuses on the trade-off between stability and efficiency in ConvBN
blocks: Deploy mode is efficient but suffers from training instability; Eval
mode is widely used in transfer learning but lacks efficiency. To solve the
dilemma, we theoretically reveal the reason behind the diminished training
stability observed in the Deploy mode. Subsequently, we propose a novel Tune
mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune
mode is as stable as Eval mode for transfer learning, and its computational
efficiency closely matches that of the Deploy mode. Through extensive
experiments in object detection, classification, and adversarial example
generation across $5$ datasets and $12$ model architectures, we demonstrate
that the proposed Tune mode retains the performance while significantly
reducing GPU memory footprint and training time, thereby contributing efficient
ConvBN blocks for transfer learning and beyond. Our method has been integrated
into both PyTorch (general machine learning framework) and MMCV/MMEngine
(computer vision framework). Practitioners need only one line of code to enjoy our
efficient ConvBN blocks, thanks to PyTorch's built-in machine learning compilers.
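The trade-off among the three modes can be made concrete with standard BatchNorm folding arithmetic. Below is a minimal PyTorch sketch, not the authors' implementation: fold_bn_into_conv shows Deploy-style folding of frozen BN statistics into the convolution weights, and TuneConvBN illustrates, under the assumption that Tune mode applies the same folding on the fly with the frozen running statistics, how a single convolution per forward pass can keep the original parameters trainable. All names are illustrative.
```python
# Minimal sketch (assumptions labeled above, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Deploy mode: bake frozen BN statistics into the conv weights once."""
    # BN(conv(x)) = gamma * (W*x + b - mean) / sqrt(var + eps) + beta
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    fused.weight.data = conv.weight.data * scale.view(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

class TuneConvBN(nn.Module):
    """Tune-style forward (sketch): fold the frozen statistics into the weights
    on the fly, so only one convolution is computed per forward pass and its
    pre-BN output need not be materialized, while the parameters stay trainable."""
    def __init__(self, conv: nn.Conv2d, bn: nn.BatchNorm2d):
        super().__init__()
        self.conv, self.bn = conv, bn

    def forward(self, x):
        scale = self.bn.weight / torch.sqrt(self.bn.running_var + self.bn.eps)
        weight = self.conv.weight * scale.view(-1, 1, 1, 1)
        bias = self.conv.bias if self.conv.bias is not None else torch.zeros_like(self.bn.running_mean)
        bias = (bias - self.bn.running_mean) * scale + self.bn.bias
        return F.conv2d(x, weight, bias, self.conv.stride,
                        self.conv.padding, self.conv.dilation, self.conv.groups)
```
In Eval mode, by contrast, the convolution output is materialized and then normalized, which is the source of the extra memory and compute the abstract refers to; the exact parameter and gradient treatment in the paper's Tune mode may differ from this sketch.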
Related papers
- Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models [31.960749305728488]
We introduce a novel concept dubbed the modular neural tangent kernel (mNTK).
We show that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$.
We propose a novel training strategy termed Modular Adaptive Training (MAT) to update only those modules whose $\lambda_{\max}$ exceeds a dynamic threshold.
arXiv Detail & Related papers (2024-05-13T07:46:48Z) - ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment [7.916080032572087]
ATOM is a resilient distributed training framework designed for asynchronous training of vast models in a decentralized setting.
ATOM aims to accommodate a complete LLM on one host (peer) through seamless model swapping, and concurrently trains multiple copies across various peers to optimize training throughput.
Our experiments using different GPT-3 model configurations reveal that, in scenarios with suboptimal network connections, ATOM can improve training efficiency by up to $20\times$ compared with state-of-the-art decentralized pipeline parallelism approaches.
arXiv Detail & Related papers (2024-03-15T17:43:43Z) - Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise
Training of Neural Networks [9.718519843862937]
We introduce a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize sub-neural networks separately.
Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations.
arXiv Detail & Related papers (2023-12-20T08:02:33Z) - Harnessing Manycore Processors with Distributed Memory for Accelerated
Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z) - Fast Propagation is Better: Accelerating Single-Step Adversarial
Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z) - Scaling Distributed Deep Learning Workloads beyond the Memory Capacity
with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g., Megatron-LM and Turing-NLG.
arXiv Detail & Related papers (2020-08-26T07:24:34Z) - Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines [7.960229223744695]
We show that properly combining standard gradient updates with an off-gradient direction improves their training dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to a lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
arXiv Detail & Related papers (2020-01-15T21:12:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.