Co-training $2^L$ Submodels for Visual Recognition
- URL: http://arxiv.org/abs/2212.04884v1
- Date: Fri, 9 Dec 2022 14:38:09 GMT
- Title: Co-training $2^L$ Submodels for Visual Recognition
- Authors: Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou
- Abstract summary: Submodel co-training is a regularization method related to co-training, self-distillation, and stochastic depth.
We show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation.
- Score: 67.02999567435626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce submodel co-training, a regularization method related to
co-training, self-distillation and stochastic depth. Given a neural network to
be trained, for each sample we implicitly instantiate two altered networks,
"submodels", with stochastic depth: we activate only a subset of the layers.
Each network serves as a soft teacher to the other, by providing a loss that
complements the regular loss provided by the one-hot label. Our approach,
dubbed cosub, uses a single set of weights, and does not involve a pre-trained
external model or temporal averaging.
Experimentally, we show that submodel co-training is effective for training
backbones for recognition tasks such as image classification and semantic
segmentation. Our approach is compatible with multiple architectures, including
RegNet, ViT, PiT, XCiT, Swin, and ConvNeXt. Our training strategy improves their
results in comparable settings. For instance, a ViT-B pretrained with cosub on
ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
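The abstract already contains the full training recipe: per batch, two stochastic-depth "submodels" of the same weights are drawn, each supervised by the one-hot labels and softly by the other's prediction. Below is a minimal PyTorch sketch of that idea, not the authors' released code; the function name `cosub_step`, the weight `lam`, and the soft cross-entropy form of the distillation term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cosub_step(model, images, targets, lam=0.5):
    """One training step of submodel co-training (illustrative sketch).

    Assumes `model` is trained with stochastic depth (drop-path) enabled,
    so each forward pass activates a random subset of residual layers and
    therefore realizes a different "submodel" of the shared weights.
    """
    model.train()
    logits_a = model(images)  # submodel A: one random layer subset
    logits_b = model(images)  # submodel B: another random layer subset

    # Supervised term: the regular loss against the one-hot labels.
    ce = 0.5 * (F.cross_entropy(logits_a, targets) +
                F.cross_entropy(logits_b, targets))

    # Co-training term: each submodel acts as a soft teacher for the
    # other; detach() stops gradients from flowing into the teacher.
    soft_a = logits_a.detach().softmax(dim=-1)
    soft_b = logits_b.detach().softmax(dim=-1)
    # Cross-entropy with probabilistic targets (PyTorch >= 1.10).
    distill = 0.5 * (F.cross_entropy(logits_a, soft_b) +
                     F.cross_entropy(logits_b, soft_a))

    # lam balances label supervision and mutual distillation; the value
    # and the mixing rule here are assumptions, not the paper's choice.
    return (1 - lam) * ce + lam * distill
```

In practice the two forward passes could be fused by duplicating the batch; the sketch keeps them separate for readability.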
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Self-Supervised Learning for Binary Networks by Joint Classifier Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z)
- ResMLP: Feedforward networks for image classification with data-efficient training [73.26364887378597]
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
We will share our code, based on the Timm library, along with pre-trained models.
arXiv Detail & Related papers (2021-05-07T17:31:44Z)
- Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of information from both intra- and inter-images.
It is even comparable to contrastive learning methods when only half of the training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z)
- Background Splitting: Finding Rare Classes in a Sea of Background [55.03789745276442]
We focus on the real-world problem of training accurate deep models for image classification of a small number of rare categories.
In these scenarios, almost all images belong to the background category in the dataset (>95% of the dataset is background).
We demonstrate that both standard fine-tuning approaches and state-of-the-art approaches for training on imbalanced datasets do not produce accurate deep models in the presence of this extreme imbalance.
arXiv Detail & Related papers (2020-08-28T23:05:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.