Self Supervision to Distillation for Long-Tailed Visual Recognition
- URL: http://arxiv.org/abs/2109.04075v1
- Date: Thu, 9 Sep 2021 07:38:30 GMT
- Title: Self Supervision to Distillation for Long-Tailed Visual Recognition
- Authors: Tianhao Li, Limin Wang, Gangshan Wu
- Abstract summary: We show that soft labels can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition.
Specifically, we propose a conceptually simple yet particularly effective multi-stage training scheme, termed Self Supervision to Distillation (SSD).
Our method achieves state-of-the-art results on three long-tailed recognition benchmarks: ImageNet-LT, CIFAR100-LT and iNaturalist 2018.
- Score: 34.29744530188875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has achieved remarkable progress for visual recognition on
large-scale balanced datasets but still performs poorly on real-world
long-tailed data. Previous methods often adopt class re-balanced training
strategies to alleviate the imbalance issue, but these risk
over-fitting the tail classes. The recent decoupling method overcomes this
over-fitting by using a multi-stage training scheme, yet it is still incapable of
capturing tail class information in the feature learning stage. In this paper,
we show that soft labels can serve as a powerful solution to incorporate label
correlation into a multi-stage training scheme for long-tailed recognition. The
intrinsic relation between classes embodied by soft labels turns out to be
helpful for long-tailed recognition by transferring knowledge from head to tail
classes.
Specifically, we propose a conceptually simple yet particularly effective
multi-stage training scheme, termed Self Supervision to Distillation (SSD).
This scheme is composed of two parts. First, we introduce a self-distillation
framework for long-tailed recognition, which can mine the label relation
automatically. Second, we present a new distillation label generation module
guided by self-supervision. The distilled labels integrate information from
both label and data domains that can model long-tailed distribution
effectively. We conduct extensive experiments and our method achieves the
state-of-the-art results on three long-tailed recognition benchmarks:
ImageNet-LT, CIFAR100-LT and iNaturalist 2018. Our SSD outperforms the strong
LWS baseline by $2.7\%$ to $4.5\%$ on various datasets. The code is
available at https://github.com/MCG-NJU/SSD-LT.
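As a rough illustration of the soft-label distillation term at the heart of such a multi-stage scheme, the sketch below blends a hard-label cross-entropy with a KL term on temperature-softened teacher outputs. This is a minimal sketch, not the authors' SSD pipeline: the function name, temperature, and weighting `alpha` are assumptions, and SSD's self-supervision-guided label generation module is not reproduced here.

```python
import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, hard_labels,
                                 temperature=2.0, alpha=0.5):
    # Hypothetical helper; temperature and alpha are illustrative defaults,
    # not values from the paper.
    # Cross-entropy on the ground-truth (hard) labels.
    ce = F.cross_entropy(student_logits, hard_labels)
    # KL divergence between temperature-softened teacher and student
    # distributions; the soft targets carry the inter-class (head-to-tail)
    # correlations that hard labels discard.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd

# Example usage with random tensors standing in for a batch of 8 images
# over 1000 classes:
student = torch.randn(8, 1000)
teacher = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
loss = soft_label_distillation_loss(student, teacher, labels)
```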
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Long-Tailed Anomaly Detection with Learnable Class Names [64.79139468331807]
We introduce several datasets with different levels of class imbalance and metrics for performance evaluation.
We then propose a novel method, LTAD, to detect defects from multiple and long-tailed classes, without relying on dataset class names.
LTAD substantially outperforms the state-of-the-art methods for most forms of dataset imbalance.
arXiv Detail & Related papers (2024-03-29T15:26:44Z)
- Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm that progressively adjusts the label space and divides the head classes and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z)
- Contrastive Learning with Boosted Memorization [36.957895270908324]
Self-supervised learning has achieved great success in the representation learning of visual and textual data.
Recent attempts at self-supervised long-tailed learning rebalance from the loss perspective or the model perspective.
We propose a novel Boosted Contrastive Learning (BCL) method to enhance long-tailed learning in the label-unaware context.
arXiv Detail & Related papers (2022-05-25T11:54:22Z)
- Balanced Knowledge Distillation for Long-tailed Learning [9.732397447057318]
Deep models trained on long-tailed datasets exhibit unsatisfactory performance on tail classes.
Existing methods usually modify the classification loss to increase the learning focus on tail classes.
We propose Balanced Knowledge Distillation to disentangle the contradiction between the two goals and achieve both simultaneously.
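For context, the loss modification this entry refers to is often a simple inverse-frequency re-weighting of the cross-entropy, roughly as sketched below; this is a generic illustration of that baseline, not the Balanced Knowledge Distillation method itself, and the function and argument names are assumptions.

```python
import torch.nn.functional as F

def reweighted_cross_entropy(logits, labels, class_counts):
    # Generic re-weighting baseline: scale each class's contribution by the
    # inverse of its training frequency so tail classes receive more focus.
    # class_counts is a 1-D tensor with the number of training samples per class.
    weights = 1.0 / class_counts.float()
    weights = weights * (len(class_counts) / weights.sum())  # normalize to mean 1
    return F.cross_entropy(logits, labels, weight=weights)
```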
arXiv Detail & Related papers (2021-04-21T13:07:35Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance recognition of medium+tail classes and tail classes, respectively.
We test our method on several benchmarks, i.e., long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
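A toy sketch of the branch-fusion idea as summarized above; the plain linear heads, additive fusion, and attribute names used here are assumptions made for illustration, not the ResLT implementation.

```python
import torch.nn as nn

class ResidualFusionHead(nn.Module):
    # Toy three-branch head: an all-class main branch plus two residual
    # branches whose logits are added in to strengthen medium+tail and
    # tail classes. Purely illustrative.
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.all_class_branch = nn.Linear(feat_dim, num_classes)
        self.medium_tail_branch = nn.Linear(feat_dim, num_classes)
        self.tail_branch = nn.Linear(feat_dim, num_classes)

    def forward(self, features):
        # The main branch covers every class; the residual branches are
        # fused additively to boost under-represented classes.
        logits = self.all_class_branch(features)
        logits = logits + self.medium_tail_branch(features)
        logits = logits + self.tail_branch(features)
        return logits
```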
arXiv Detail & Related papers (2021-01-26T08:43:50Z)
- Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE).
It reduces model variance with multiple experts, reduces model bias with a distribution-aware diversity loss, and reduces computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
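A minimal sketch of the multi-expert averaging idea; the shared-feature setup, plain linear experts, and simple averaging are assumptions, and RIDE's distribution-aware diversity loss and dynamic expert router are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiExpertHead(nn.Module):
    # Illustrative multi-expert classifier head: each expert is a linear
    # classifier over a shared backbone feature, and their logits are
    # averaged to reduce variance.
    def __init__(self, feat_dim, num_classes, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_experts)
        )

    def forward(self, features):
        expert_logits = [expert(features) for expert in self.experts]
        return torch.stack(expert_logits, dim=0).mean(dim=0)
```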
arXiv Detail & Related papers (2020-10-05T06:53:44Z)