Self-Supervised Learning for Binary Networks by Joint Classifier
Training
- URL: http://arxiv.org/abs/2110.08851v1
- Date: Sun, 17 Oct 2021 15:38:39 GMT
- Title: Self-Supervised Learning for Binary Networks by Joint Classifier
Training
- Authors: Dahyun Kim, Jonghyun Choi
- Abstract summary: We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
- Score: 11.612308609123566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the great success of self-supervised learning with large floating
point networks, such networks are not readily deployable to edge devices. To
accelerate deployment of models to edge devices for various downstream tasks by
unsupervised representation learning, we propose a self-supervised learning
method for binary networks. In particular, we propose to use a randomly
initialized classifier attached to a pretrained floating point feature
extractor as targets and jointly train it with a binary network. For better
training of the binary network, we propose a feature similarity loss, a dynamic
balancing scheme of loss terms, and modified multi-stage training. We call our
method BSSL. Our empirical validations show that BSSL outperforms
self-supervised learning baselines for binary networks in various downstream
tasks and outperforms supervised pretraining in certain tasks.
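Below is a minimal PyTorch-style sketch of the joint classifier training idea described in the abstract: a randomly initialized classifier attached to a frozen, pretrained floating point feature extractor provides targets while the classifier and the binary network are trained jointly. The class name JointTargetSSL, the cosine-based feature similarity term, and the ratio-based dynamic balancing heuristic are illustrative assumptions, not the authors' exact BSSL implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointTargetSSL(nn.Module):
    """Sketch of joint classifier training for a binary network (assumed details)."""

    def __init__(self, fp_backbone, binary_backbone, feat_dim, num_pseudo_classes):
        super().__init__()
        self.fp_backbone = fp_backbone          # pretrained floating point feature extractor (frozen)
        self.binary_backbone = binary_backbone  # binary network being trained
        self.classifier = nn.Linear(feat_dim, num_pseudo_classes)  # randomly initialized classifier
        for p in self.fp_backbone.parameters():
            p.requires_grad = False

    def forward(self, x):
        with torch.no_grad():
            fp_feat = self.fp_backbone(x)       # target features from the FP extractor
        bin_feat = self.binary_backbone(x)
        # Pseudo-targets come from the jointly trained classifier applied to FP features.
        target_logits = self.classifier(fp_feat)
        student_logits = self.classifier(bin_feat)
        cls_loss = F.cross_entropy(student_logits, target_logits.argmax(dim=1))
        # Feature similarity loss (cosine distance is an assumption here).
        feat_loss = 1.0 - F.cosine_similarity(bin_feat, fp_feat, dim=1).mean()
        # Dynamic balancing of the two loss terms (simple ratio heuristic as a stand-in).
        balance = (cls_loss.detach() / (feat_loss.detach() + 1e-8)).clamp(0.1, 10.0)
        return cls_loss + balance * feat_loss


if __name__ == "__main__":
    # Stand-in backbones only; a real binary network would use binarized convolutions.
    fp = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    bn = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    model = JointTargetSSL(fp, bn, feat_dim=128, num_pseudo_classes=100)
    loss = model(torch.randn(8, 3, 32, 32))
    loss.backward()
    print(float(loss))
```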
Related papers
- Co-training $2^L$ Submodels for Visual Recognition [67.02999567435626]
Submodel co-training is a regularization method related to co-training, self-distillation and stochastic depth.
We show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation.
arXiv Detail & Related papers (2022-12-09T14:38:09Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- Learning Modular Structures That Generalize Out-of-Distribution [1.7034813545878589]
We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains.
Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network.
arXiv Detail & Related papers (2022-08-07T15:54:19Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- DSPNet: Towards Slimmable Pretrained Networks based on Discriminative Self-supervised Learning [43.45674911425684]
We propose Discriminative-SSL-based Slimmable Pretrained Networks (DSPNet).
DSPNet can be trained at once and then slimmed to multiple sub-networks of various sizes.
We show that DSPNet achieves comparable or improved performance on ImageNet relative to the individually pretrained networks.
arXiv Detail & Related papers (2022-07-13T09:32:54Z)
- Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning.
Deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with better performance than training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
- Fully Convolutional Networks for Continuous Sign Language Recognition [83.85895472824221]
Continuous sign language recognition is a challenging task that requires learning on both spatial and temporal dimensions.
We propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences.
arXiv Detail & Related papers (2020-07-24T08:16:37Z)