Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
- URL: http://arxiv.org/abs/2401.04105v2
- Date: Sat, 30 Mar 2024 08:06:01 GMT
- Title: Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
- Authors: Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
- Abstract summary: We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
- Score: 81.0108753452546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible. Due to its reversibility, intermediate activations, which can be reconstructed from output, are cleared from memory during training. We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision. We evaluate Dr$^2$Net on various pretrained models and various tasks, and show that it can reach comparable performance to conventional finetuning but with significantly less memory usage.
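The core idea lends itself to a short sketch. The block form below is an illustrative reading of the abstract, not the authors' reference code: `alpha` weights the residual connection inherited from the pretrained model, `beta` weights the extra connection that makes the block invertible, and the `inverse` method shows why intermediate activations can be dropped and later reconstructed. The exact block equations and the dynamic schedule for the two coefficients are assumptions here.
```python
# Minimal sketch of a dual-residual, invertible block (PyTorch).  Assumed form:
#   y = alpha * (x_cur + F(x_cur)) + beta * x_prev
# which reduces to the pretrained residual block when alpha = 1 and beta -> 0,
# and is invertible whenever beta != 0.
import torch
import torch.nn as nn


class DualResidualBlock(nn.Module):
    def __init__(self, f: nn.Module, alpha: float = 1.0, beta: float = 0.1):
        super().__init__()
        self.f = f          # residual branch taken from the pretrained model
        self.alpha = alpha  # coefficient on the pretrained-style residual
        self.beta = beta    # coefficient on the reversibility-inducing residual

    def forward(self, x_prev: torch.Tensor, x_cur: torch.Tensor):
        y = self.alpha * (x_cur + self.f(x_cur)) + self.beta * x_prev
        return x_cur, y     # the pair (x_cur, y) feeds the next block

    @torch.no_grad()
    def inverse(self, x_cur: torch.Tensor, y: torch.Tensor):
        # Reconstruct the block input from its output, so activations can be
        # freed during the forward pass instead of being stored for backward.
        x_prev = (y - self.alpha * (x_cur + self.f(x_cur))) / self.beta
        return x_prev, x_cur
```
In a full implementation the backward pass would call `inverse` inside a custom autograd function to rebuild activations layer by layer before re-running the forward with gradients; with a small `beta` this reconstruction is numerically delicate, which is presumably what the dynamic coefficient schedule addresses.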
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
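As a rough illustration of logits-level composition (the additive rule and the interfaces are assumptions, not the paper's exact formulation; both callables are assumed to map token ids to logits over a shared vocabulary):
```python
# Hypothetical sketch: a frozen pretrained model's logits are corrected at
# inference time by a small "value network" trained on a different base model.
import torch

@torch.no_grad()
def composed_logits(base_model, value_net, input_ids: torch.Tensor) -> torch.Tensor:
    base = base_model(input_ids)   # large frozen pretrained model
    delta = value_net(input_ids)   # small network modeling post-training changes
    return base + delta            # combined logits used for decoding
```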
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We apply the approach to both fully-connected and convolutional layers, which account for the bulk of the parameter space in most neural architectures.
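A toy sketch of the tiling idea follows; the tile length, scaling, and use of a straight-through estimator are illustrative assumptions rather than the paper's exact scheme.
```python
# Illustrative only: a short learnable vector is binarized and tiled to fill a
# layer's weight tensor, so the stored parameters amount to less than one bit
# per effective weight.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiledBinaryLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, tile_len: int = 256):
        super().__init__()
        self.out_features, self.in_features = out_features, in_features
        self.tile = nn.Parameter(torch.randn(tile_len))          # learnable tile
        self.scale = nn.Parameter(torch.ones(out_features, 1))   # per-row scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = self.out_features * self.in_features
        reps = -(-n // self.tile.numel())                        # ceil(n / tile_len)
        # Straight-through estimator: forward uses sign(), backward treats the
        # binarization as the identity so the tile remains trainable.
        bits = self.tile + (torch.sign(self.tile) - self.tile).detach()
        w = bits.repeat(reps)[:n].reshape(self.out_features, self.in_features)
        return F.linear(x, self.scale * w)
```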
arXiv Detail & Related papers (2024-07-16T15:55:38Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
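The summary does not spell out the switching mechanism; one simple way to realize bulkier and lightweight variants of the same weights, shown purely as an assumption, is to reuse a shared block a different number of times:
```python
# Hypothetical sketch: the same shape-preserving block is applied more or fewer
# times, so a heavy and a light variant share one set of weights.
import torch.nn as nn


class ReusableStage(nn.Module):
    def __init__(self, block: nn.Module, heavy_repeats: int = 4, light_repeats: int = 1):
        super().__init__()
        self.block = block
        self.heavy_repeats = heavy_repeats
        self.light_repeats = light_repeats

    def forward(self, x, lightweight: bool = False):
        repeats = self.light_repeats if lightweight else self.heavy_repeats
        for _ in range(repeats):
            x = x + self.block(x)   # residual reuse of the shared block
        return x
```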
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - Residual Multi-Fidelity Neural Network Computing [0.0]
We present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function.
We show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
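In spirit, the framework trains a network on the discrepancy between a cheap low-fidelity model and an expensive high-fidelity one; a minimal sketch under that reading (all names and sizes illustrative):
```python
# Minimal sketch: learn the residual r(x) ~ f_high(x) - f_low(x), then predict
# with the cheap low-fidelity output plus the learned correction.
import torch
import torch.nn as nn

residual_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(residual_net.parameters(), lr=1e-3)

def train_step(x, y_low, y_high):
    """x: inputs; y_low / y_high: low- and high-fidelity model outputs."""
    loss = ((residual_net(x) - (y_high - y_low)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def predict(x, y_low):
    return y_low + residual_net(x)   # corrected, near-high-fidelity output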
arXiv Detail & Related papers (2023-10-05T14:43:16Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core yields a low-rank model that performs better than the low-rank model trained alone.
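One way to read "a predefined core subnetwork that can be split off" in the low-rank case, written as an assumption rather than the paper's construction: each weight matrix is a low-rank core plus a complement, both trained together, and the core alone is exported for deployment.
```python
# Hypothetical sketch: W = A @ B (low-rank core) + C (complement); the full
# model uses both terms, the split-off core model keeps only A @ B.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoreLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.C = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x: torch.Tensor, core_only: bool = False) -> torch.Tensor:
        w = self.A @ self.B
        if not core_only:
            w = w + self.C   # full network adds the complement to the core
        return F.linear(x, w)
```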
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - Neural Network Training Using $\ell_1$-Regularization and Bi-fidelity Data [0.0]
We study the effects of sparsity-promoting $\ell_1$-regularization on training neural networks when only a small training dataset from a high-fidelity model is available.
We consider two variants of $\ell_1$-regularization informed by the parameters of an identical network trained using data from lower-fidelity models of the problem at hand.
These bi-fidelity strategies are generalizations of transfer learning of neural networks that uses the parameters learned from a large low-fidelity dataset to efficiently train networks for a small high-fidelity dataset.
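A plausible form of one such penalty, shown as an assumption (which of the two variants this matches is not specified here): an $\ell_1$ distance between the high-fidelity network's parameters and those of the identical network already trained on low-fidelity data.
```python
# Sketch: L1 penalty that keeps the high-fidelity network's parameters sparse
# around (and close to) the low-fidelity solution.
import torch
import torch.nn.functional as F

def l1_bifidelity_loss(model, lowfi_params, x, y, lam: float = 1e-3):
    """lowfi_params: list of tensors from the identical low-fidelity-trained net."""
    data_loss = F.mse_loss(model(x), y)
    penalty = sum((p - q).abs().sum()
                  for p, q in zip(model.parameters(), lowfi_params))
    return data_loss + lam * penalty
```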
arXiv Detail & Related papers (2021-05-27T08:56:17Z) - MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations [51.85020143716815]
We propose MutualNet to train a single network that can run at a diverse set of resource constraints.
Our method trains a cohort of model configurations with various network widths and input resolutions.
MutualNet is a general training methodology that can be applied to various network structures.
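The training loop can be pictured roughly as follows; the sampling scheme, the number of configurations per step, the use of the full-width output as a soft target, and the `width` keyword on the model are assumptions based on the summary, not the paper's exact recipe.
```python
# Hypothetical sketch: each step trains the full-width network plus a few
# randomly sampled (width, resolution) configurations that share its weights.
import random
import torch
import torch.nn.functional as F

def mutual_step(model, optimizer, images, labels,
                widths=(0.25, 0.5, 0.75, 1.0), resolutions=(128, 160, 192, 224)):
    optimizer.zero_grad()
    # Full configuration provides the hard-label loss and a soft target.
    full_logits = model(images, width=1.0)
    F.cross_entropy(full_logits, labels).backward()
    soft = full_logits.detach().softmax(dim=1)
    # Sub-configurations learn from the full network's predictions.
    for w in random.sample(widths[:-1], 2):
        r = random.choice(resolutions)
        small = F.interpolate(images, size=r, mode='bilinear', align_corners=False)
        logits = model(small, width=w)
        F.kl_div(logits.log_softmax(dim=1), soft, reduction='batchmean').backward()
    optimizer.step()   # gradients from all configurations are accumulated
```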
arXiv Detail & Related papers (2021-05-14T22:30:13Z) - Enabling Binary Neural Network Training on the Edge [7.32770338248516]
Existing binary neural network training methods require concurrent storage of high-precision activations for all layers.
We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions.
We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.78$\times$ memory reduction.
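As a generic illustration of where the activation memory goes (not necessarily this paper's strategy): a binarized activation only needs a 1-bit record of its input for the straight-through backward pass, so the full-precision tensor need not be kept.
```python
# Illustrative custom autograd op: sign() activation under the straight-through
# estimator, saving a boolean mask instead of the high-precision input.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward((x.abs() <= 1).to(torch.bool))  # 1 bit per element
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (inside,) = ctx.saved_tensors
        return grad_out * inside.to(grad_out.dtype)            # clipped straight-through
```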
arXiv Detail & Related papers (2021-02-08T15:06:41Z) - Continuous-in-Depth Neural Networks [107.47887213490134]
We first show that ResNets fail to be meaningful dynamical integrators in this richer sense.
We then demonstrate that neural network models can learn to represent continuous dynamical systems.
We introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
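To make the continuous-in-depth view concrete: a standard ResNet block is a forward-Euler step x + h*F(x), and a higher-order integrator over the same residual branch gives one possible continuous-in-depth unit. The sketch below is a generic Runge-Kutta-4 illustration, not the paper's exact parameterization (which may use depth-dependent weights).
```python
# Generic sketch: treating the residual branch F as a vector field and taking
# one RK4 step of depth "time" h instead of a forward-Euler (ResNet) step.
import torch.nn as nn


class RK4Unit(nn.Module):
    def __init__(self, f: nn.Module, h: float = 1.0):
        super().__init__()
        self.f, self.h = f, h

    def forward(self, x):
        k1 = self.f(x)
        k2 = self.f(x + 0.5 * self.h * k1)
        k3 = self.f(x + 0.5 * self.h * k2)
        k4 = self.f(x + self.h * k3)
        return x + (self.h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```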
arXiv Detail & Related papers (2020-08-05T22:54:09Z)