Balanced Knowledge Distillation for Long-tailed Learning
- URL: http://arxiv.org/abs/2104.10510v1
- Date: Wed, 21 Apr 2021 13:07:35 GMT
- Title: Balanced Knowledge Distillation for Long-tailed Learning
- Authors: Shaoyu Zhang, Chen Chen, Xiyuan Hu, Silong Peng
- Abstract summary: Deep models trained on long-tailed datasets exhibit unsatisfactory performance on tail classes.
Existing methods usually modify the classification loss to increase the learning focus on tail classes.
We propose Balanced Knowledge Distillation to disentangle the contradiction between the two goals and achieve both simultaneously.
- Score: 9.732397447057318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep models trained on long-tailed datasets exhibit unsatisfactory
performance on tail classes. Existing methods usually modify the classification
loss to increase the learning focus on tail classes, which unexpectedly
sacrifices the performance on head classes. In fact, this scheme leads to a
contradiction between the two goals of long-tailed learning, i.e., learning
generalizable representations and facilitating learning for tail classes. In
this work, we explore knowledge distillation in long-tailed scenarios and
propose a novel distillation framework, named Balanced Knowledge Distillation
(BKD), to disentangle the contradiction between the two goals and achieve both
simultaneously. Specifically, given a vanilla teacher model, we train the
student model by minimizing the combination of an instance-balanced
classification loss and a class-balanced distillation loss. The former benefits
from sample diversity and learns generalizable representations, while the
latter considers the class priors and facilitates learning mainly for tail
classes. The student model trained with BKD obtains a significant performance
gain even compared with its teacher model. We conduct extensive experiments on
several long-tailed benchmark datasets and demonstrate that the proposed BKD is
an effective knowledge distillation framework in long-tailed scenarios, as well
as a new state-of-the-art method for long-tailed learning. Code is available at
https://github.com/EricZsy/BalancedKnowledgeDistillation.
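For readers skimming the abstract, the sketch below illustrates what such a combined objective could look like in PyTorch: an unweighted cross-entropy over all instances plus a distillation term whose per-class contributions are re-weighted by the class priors. The function name, the inverse-frequency weighting, and the temperature and alpha hyperparameters are illustrative assumptions, not the paper's exact formulation; the authors' implementation is in the linked repository.
```python
import torch
import torch.nn.functional as F


def balanced_kd_loss(student_logits, teacher_logits, targets, class_counts,
                     temperature=2.0, alpha=1.0):
    """Minimal BKD-style objective: instance-balanced cross-entropy plus a
    class-balanced distillation loss. The weighting scheme and hyperparameters
    are assumptions for illustration, not the official implementation."""
    # Instance-balanced classification loss: plain cross-entropy over the batch,
    # preserving full sample diversity for representation learning.
    ce_loss = F.cross_entropy(student_logits, targets)

    # Class-prior weights (assumed form): inverse class frequency, normalized to
    # mean 1, so tail classes receive larger weights in the distillation term.
    priors = class_counts.float() / class_counts.sum()
    weights = 1.0 / priors
    weights = weights / weights.mean()

    # Class-balanced distillation loss: soft cross-entropy between the softened
    # teacher and student distributions, with each class's contribution
    # re-weighted by its prior-based weight.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    kd_per_sample = -(weights.unsqueeze(0) * p_teacher * log_p_student).sum(dim=1)
    kd_loss = kd_per_sample.mean() * (t * t)

    return ce_loss + alpha * kd_loss


if __name__ == "__main__":
    # Toy shapes only: 8 samples, 10 classes with a long-tailed count vector.
    B, C = 8, 10
    student_logits = torch.randn(B, C)
    teacher_logits = torch.randn(B, C)
    targets = torch.randint(0, C, (B,))
    class_counts = torch.tensor([500, 300, 200, 100, 60, 40, 20, 10, 5, 2])
    print(balanced_kd_loss(student_logits, teacher_logits, targets, class_counts))
```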
Related papers
- A dual-branch model with inter- and intra-branch contrastive loss for
long-tailed recognition [7.225494453600985]
Models trained on long-tailed datasets adapt poorly to tail classes, and their decision boundaries are ambiguous.
We propose a simple yet effective model, named Dual-Branch Long-Tailed Recognition (DB-LTR), which includes an imbalanced learning branch and a Contrastive Learning Branch (CoLB).
CoLB can improve the capability of the model in adapting to tail classes and assist the imbalanced learning branch to learn a well-represented feature space and discriminative decision boundary.
arXiv Detail & Related papers (2023-09-28T03:31:11Z) - Class-aware Information for Logit-based Knowledge Distillation [16.634819319915923]
We propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level.
CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance.
arXiv Detail & Related papers (2022-11-27T09:27:50Z) - Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm that progressively adjusts the label space and divides the head and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z) - Long-tailed Recognition by Learning from Latent Categories [70.6272114218549]
We introduce a Latent Categories based long-tail Recognition (LCReg) method.
Specifically, we learn a set of class-agnostic latent features shared among the head and tail classes.
Then, we implicitly enrich training sample diversity by applying semantic data augmentation to the latent features.
arXiv Detail & Related papers (2022-06-02T12:19:51Z) - Contrastive Learning with Boosted Memorization [36.957895270908324]
Self-supervised learning has achieved great success in representation learning for visual and textual data.
Recent attempts at self-supervised long-tailed learning rebalance from either the loss perspective or the model perspective.
We propose a novel Boosted Contrastive Learning (BCL) method to enhance long-tailed learning in the label-unaware setting.
arXiv Detail & Related papers (2022-05-25T11:54:22Z) - Improving Tail-Class Representation with Centroid Contrastive Learning [145.73991900239017]
We propose interpolative centroid contrastive learning (ICCL) to improve long-tailed representation learning.
ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolated image can be used to retrieve the centroids for both source classes.
Our result shows a significant accuracy gain of 2.8% on the iNaturalist 2018 dataset with a real-world long-tailed distribution.
arXiv Detail & Related papers (2021-10-19T15:24:48Z) - Self Supervision to Distillation for Long-Tailed Visual Recognition [34.29744530188875]
We show that soft labels can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition.
Specifically, we propose a conceptually simple yet particularly effective multi-stage training scheme, termed Self Supervision to Distillation (SSD).
Our method achieves state-of-the-art results on three long-tailed recognition benchmarks: ImageNet-LT, CIFAR100-LT and iNaturalist 2018.
arXiv Detail & Related papers (2021-09-09T07:38:30Z) - Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework based on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method that leverages knowledge distillation to enhance feature representations.
arXiv Detail & Related papers (2021-04-12T08:21:03Z) - Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE).
It reduces model variance with multiple experts, reduces model bias with a distribution-aware diversity loss, and reduces computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
arXiv Detail & Related papers (2020-10-05T06:53:44Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
The proposed LFME framework aggregates the knowledge from multiple models, referred to as 'Experts', to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.