Towards Effective Collaborative Learning in Long-Tailed Recognition
- URL: http://arxiv.org/abs/2305.03378v1
- Date: Fri, 5 May 2023 09:16:06 GMT
- Title: Towards Effective Collaborative Learning in Long-Tailed Recognition
- Authors: Zhengzhuo Xu and Zenghao Chai and Chengyin Xu and Chun Yuan and Haiqin Yang
- Abstract summary: Real-world data usually suffers from severe class imbalance and long-tailed distributions, where minority classes are significantly underrepresented.
Recent research favors multi-expert architectures to mitigate model uncertainty on the minority classes.
In this paper, we observe that the knowledge transfer between experts is imbalanced in terms of class distribution, which results in limited performance improvement for the minority classes.
- Score: 16.202524991074416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world data usually suffers from severe class imbalance and long-tailed
distributions, where minority classes are significantly underrepresented
compared to the majority ones. Recent research favors multi-expert
architectures to mitigate model uncertainty on the minority classes, where
collaborative learning is employed to aggregate the knowledge of experts, i.e.,
online distillation. In this paper, we observe that the knowledge transfer
between experts is imbalanced in terms of class distribution, which results in
limited performance improvement for the minority classes. To address this, we
propose a re-weighted distillation loss by comparing two classifiers'
predictions, which are supervised by online distillation and label annotations,
respectively. We also emphasize that feature-level distillation will
significantly improve model performance and increase feature robustness.
Finally, we propose an Effective Collaborative Learning (ECL) framework that
integrates a contrastive proxy task branch to further improve feature quality.
Quantitative and qualitative experiments on four standard datasets demonstrate
that ECL achieves state-of-the-art performance, and detailed ablation
studies confirm the effectiveness of each component in ECL.
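The loss described above compares a label-supervised classifier with a distillation-supervised one and uses that comparison to re-weight the online-distillation term. Below is a minimal PyTorch sketch of how such a re-weighted loss could be wired up; the weighting rule, the name reweighted_distill_loss, and the hyperparameters are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def reweighted_distill_loss(label_logits, distill_logits, teacher_logits,
                            labels, tau=2.0):
    """Illustrative re-weighted online-distillation loss (assumed, not ECL's exact form).

    label_logits:   classifier head supervised by ground-truth labels
    distill_logits: classifier head supervised by online distillation
    teacher_logits: aggregated predictions of the other experts
    """
    # Ordinary cross-entropy on the label-supervised head.
    ce = F.cross_entropy(label_logits, labels)

    # Per-sample weight from comparing the two heads: emphasise samples where
    # the distillation-supervised head lags on the ground-truth class.
    with torch.no_grad():
        p_label = F.softmax(label_logits, dim=1).gather(1, labels[:, None]).squeeze(1)
        p_dist = F.softmax(distill_logits, dim=1).gather(1, labels[:, None]).squeeze(1)
        w = 1.0 + (p_label - p_dist).clamp(min=0.0)

    # Temperature-scaled KL from the aggregated teacher to the distillation head.
    kd = F.kl_div(
        F.log_softmax(distill_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="none",
    ).sum(dim=1) * (tau * tau)

    return ce + (w * kd).mean()
```

Here the distillation term is up-weighted for samples on which the distillation-supervised head lags behind the label-supervised head on the ground-truth class; this is one plausible reading of "comparing two classifiers' predictions", and the paper's actual scheme may differ.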
Related papers
- Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment [10.104085497265004]
We propose Ranking Loss based Knowledge Distillation (RLKD), which encourages consistency of peak predictions between the teacher and student models.
Our method enables the student model to better learn the multi-modal distributions of the teacher model, leading to a significant performance improvement in various downstream tasks.
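The stated mechanism is to keep the student's ordering of the teacher's peak predictions consistent with the teacher's. A hedged sketch of one such pairwise ranking penalty over the teacher's top-k classes follows; peak_ranking_loss, the choice of k, and the margin are illustrative and not taken from the RLKD paper.

```python
import torch
import torch.nn.functional as F

def peak_ranking_loss(student_logits, teacher_logits, k=5, margin=0.1):
    """Illustrative ranking penalty on the teacher's top-k (peak) classes;
    a sketch of the idea, not the exact RLKD objective."""
    # Teacher's k most confident classes, already in descending order.
    topk = teacher_logits.topk(k, dim=1).indices          # (B, k)
    s = student_logits.gather(1, topk)                    # student logits on those classes

    # For every pair (i, j) the teacher ranks i above j: ask the student
    # logit for i to exceed the logit for j by at least `margin`.
    loss = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            loss = loss + F.relu(margin - (s[:, i] - s[:, j])).mean()
    return loss / (k * (k - 1) / 2)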
arXiv Detail & Related papers (2024-09-19T08:06:42Z)
- CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination [28.061239778773423]
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks.
CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption of computational resources.
We introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.
arXiv Detail & Related papers (2024-08-18T11:23:21Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms existing federated learning approaches by substantial margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z)
- Revisiting Knowledge Distillation under Distribution Shift [7.796685962570969]
We study the mechanism of knowledge distillation against distribution shift.
We propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts.
We reveal intriguing observations of poor teaching performance under distribution shifts.
arXiv Detail & Related papers (2023-12-25T10:43:31Z)
- Unbiased and Efficient Self-Supervised Incremental Contrastive Learning [31.763904668737304]
We propose a self-supervised Incremental Contrastive Learning (ICL) framework consisting of a novel Incremental InfoNCE (NCE-II) loss function.
ICL achieves up to 16.7x training speedup and 16.8x faster convergence with competitive results.
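NCE-II is described only as an incremental variant of InfoNCE, so the sketch below restates the standard (one-directional) InfoNCE base it presumably extends rather than the incremental update rule itself.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Plain (one-directional) InfoNCE between two views of a batch.
    z1, z2: (B, D) projected embeddings of two augmentations of the same samples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positive = matching index
    return F.cross_entropy(logits, targets)
```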
arXiv Detail & Related papers (2023-01-28T06:11:31Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
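MixKD builds on mixup-style interpolation of training examples combined with distillation. The sketch below shows one generic way to pair interpolated inputs with a temperature-scaled distillation term; mixup_distill_step and its details are simplifying assumptions, not the paper's exact procedure, which operates on language-model inputs (e.g., at the embedding level).

```python
import torch
import torch.nn.functional as F

def mixup_distill_step(student, teacher, x, y, alpha=0.4, tau=2.0):
    """One illustrative mixup + distillation step on continuous inputs
    (e.g., embeddings); a simplification, not MixKD's exact procedure."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]               # interpolate the inputs

    with torch.no_grad():
        t_logits = teacher(x_mix)                         # teacher on the mixed input
    s_logits = student(x_mix)

    # Distill the teacher's behaviour on the interpolated points ...
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * (tau * tau)
    # ... plus mixup-weighted cross-entropy on the two original label sets.
    ce = lam * F.cross_entropy(s_logits, y) + (1.0 - lam) * F.cross_entropy(s_logits, y[perm])
    return ce + kd
```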
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Hybrid Discriminative-Generative Training via Contrastive Learning [96.56164427726203]
We show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.
We show our specific choice of approximation of the energy-based loss outperforms the existing practice in terms of classification accuracy of WideResNet on CIFAR-10 and CIFAR-100.
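The connection rests on the standard classifier-as-EBM view: the same logits define p(y|x) through the usual softmax and an unnormalized log p(x) through a logsumexp over classes. The snippet below writes out that generic decomposition; it is not the paper's specific contrastive approximation of the energy-based loss.

```python
import torch
import torch.nn.functional as F

def hybrid_terms(logits, labels):
    """Generic classifier-as-EBM decomposition (a known identity):
        log p(y|x) = f(x)[y] - logsumexp_y f(x)[y]   -> ordinary softmax cross-entropy
        log p(x)   = logsumexp_y f(x)[y] - log Z     -> generative term (up to log Z)
    """
    discriminative = F.cross_entropy(logits, labels)   # -log p(y|x)
    unnorm_log_px = torch.logsumexp(logits, dim=1)     # log p(x) + log Z
    return discriminative, unnorm_log_px
```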
arXiv Detail & Related papers (2020-07-17T15:50:34Z)
- Spectrum-Guided Adversarial Disparity Learning [52.293230153385124]
We propose a novel end-to-end knowledge directed adversarial learning framework.
It portrays the class-conditioned intraclass disparity using two competitive encoding distributions and learns the purified latent codes by denoising learned disparity.
The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art baselines.
arXiv Detail & Related papers (2020-07-14T05:46:27Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
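One common way to realize an ensemble of class-balanced experts is to train each expert under a class-balanced (inverse-frequency) sampler and average the experts' probabilities at test time. The sketch below illustrates that generic recipe, not the cited paper's exact expert construction.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler

def class_balanced_sampler(labels):
    """Sampler that draws every class with equal probability; one common way
    to train a class-balanced expert (details differ from the cited paper)."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels).float()
    weights = 1.0 / counts[labels]                    # inverse class frequency per sample
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

def ensemble_predict(experts, x):
    """Average the softmax outputs of the individual class-balanced experts."""
    probs = torch.stack([F.softmax(e(x), dim=1) for e in experts], dim=0)
    return probs.mean(dim=0)
```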
arXiv Detail & Related papers (2020-04-07T20:57:44Z)