Side-Tuning: A Baseline for Network Adaptation via Additive Side
Networks
- URL: http://arxiv.org/abs/1912.13503v4
- Date: Fri, 31 Jul 2020 00:44:06 GMT
- Title: Side-Tuning: A Baseline for Network Adaptation via Additive Side
Networks
- Authors: Jeffrey O Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra
Malik
- Abstract summary: Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
- Score: 95.51368472949308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training a neural network for a desired task, one may prefer to adapt a
pre-trained network rather than starting from randomly initialized weights.
Adaptation can be useful in cases when training data is scarce, when a single
learner needs to perform multiple tasks, or when one wishes to encode priors in
the network. The most commonly employed approaches for network adaptation are
fine-tuning and using the pre-trained network as a fixed feature extractor,
among others.
In this paper, we propose a straightforward alternative: side-tuning.
Side-tuning adapts a pre-trained network by training a lightweight "side"
network that is fused with the (unchanged) pre-trained network via summation.
This simple method works as well as or better than existing solutions and it
resolves some of the basic issues with fine-tuning, fixed features, and other
common approaches. In particular, side-tuning is less prone to overfitting, is
asymptotically consistent, and does not suffer from catastrophic forgetting in
incremental learning. We demonstrate the performance of side-tuning under a
diverse set of scenarios, including incremental learning (iCIFAR, iTaskonomy),
reinforcement learning, imitation learning (visual navigation in Habitat), NLP
question-answering (SQuAD v2), and single-task transfer learning (Taskonomy),
with consistently promising results.
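As a concrete illustration of the fusion-by-summation described in the abstract, below is a minimal PyTorch-style sketch. The `SideTuneWrapper` class, the learnable blending scalar `alpha_logit`, and the toy base/side networks are illustrative assumptions, not the authors' released implementation; the abstract only specifies that a frozen pre-trained network and a lightweight side network are combined additively.

```python
import torch
import torch.nn as nn


class SideTuneWrapper(nn.Module):
    """Minimal side-tuning sketch: a frozen pre-trained base network is
    fused with a small trainable side network via a weighted summation.
    The blending scalar and module names are illustrative assumptions."""

    def __init__(self, base: nn.Module, side: nn.Module):
        super().__init__()
        self.base = base
        self.side = side
        # Freeze the pre-trained base; only the side network (and alpha) train.
        for p in self.base.parameters():
            p.requires_grad = False
        # Learnable blending weight, squashed to (0, 1) with a sigmoid.
        self.alpha_logit = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha_logit)
        with torch.no_grad():
            base_feat = self.base(x)   # unchanged pre-trained features
        side_feat = self.side(x)       # lightweight, task-specific features
        return a * base_feat + (1.0 - a) * side_feat


# Usage sketch: base and side must produce features of the same shape.
base = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
side = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))   # smaller "side" network
model = SideTuneWrapper(base, side)
out = model(torch.randn(8, 1, 28, 28))                   # -> (8, 64) fused features
```

Because the base network is never updated, its original representations are preserved, which is the property behind the abstract's claim about avoiding catastrophic forgetting in incremental learning.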
Related papers
- Adaptive Depth Networks with Skippable Sub-Paths [1.8416014644193066]
We present a practical approach to adaptive depth networks with minimal training effort.
Our approach does not train every target sub-network in an iterative manner.
We provide a formal rationale for why the proposed training method can reduce overall prediction errors.
arXiv Detail & Related papers (2023-12-27T03:43:38Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce a one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Improving the Trainability of Deep Neural Networks through Layerwise
Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z) - Increasing Depth of Neural Networks for Life-long Learning [2.0305676256390934]
We propose a novel method for continual learning based on the increasing depth of neural networks.
This work explores whether extending neural network depth may be beneficial in a life-long learning setting.
arXiv Detail & Related papers (2022-02-22T11:21:41Z) - Robust Transfer Learning with Pretrained Language Models through
Adapters [40.45102278979193]
Transfer learning with large pretrained language models like BERT has become a dominating approach for most NLP tasks.
We propose a simple yet effective adapter-based approach to mitigate the stability and robustness issues of such transfer (a generic bottleneck-adapter sketch appears after this list).
Our experiments demonstrate that such a training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
arXiv Detail & Related papers (2021-08-05T02:30:13Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model that performs better than training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Using UNet and PSPNet to explore the reusability principle of CNN
parameters [5.623232537411766]
Reusability of parameters in each layer of a deep convolutional neural network is experimentally quantified.
The running mean and running variance play a more important role than the weight and bias in the BN layer.
The biases in convolution layers are not sensitive and can be reused directly.
arXiv Detail & Related papers (2020-08-08T01:51:08Z) - Semantic Drift Compensation for Class-Incremental Learning [48.749630494026086]
Class-incremental learning of deep networks sequentially increases the number of classes to be classified.
We propose a new method to estimate the drift of features, called semantic drift, and compensate for it without the need for any exemplars.
arXiv Detail & Related papers (2020-04-01T13:31:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including this list) and is not responsible for any consequences of its use.