Rethink Long-tailed Recognition with Vision Transformers
- URL: http://arxiv.org/abs/2302.14284v2
- Date: Mon, 17 Apr 2023 08:35:02 GMT
- Title: Rethink Long-tailed Recognition with Vision Transformers
- Authors: Zhengzhuo Xu, Shuo Yang, Xingjun Wang, Chun Yuan
- Abstract summary: Vision Transformers (ViT) are hard to train with long-tailed data.
ViT learns generalized features in an unsupervised manner.
Predictive Distribution Calibration (PDC) is a novel metric for Long-Tailed Recognition.
- Score: 18.73285611631722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the real world, data tends to follow long-tailed distributions w.r.t.
class or attribution, motivating the challenging Long-Tailed Recognition (LTR)
problem. In this paper, we revisit recent LTR methods with promising Vision
Transformers (ViT). We find that 1) ViT is hard to train with long-tailed
data, and 2) ViT learns generalized features via unsupervised objectives such as
masked generative training, on either long-tailed or balanced datasets. Hence, we
propose adopting unsupervised learning to exploit long-tailed data.
Furthermore, we propose Predictive Distribution Calibration (PDC) as a
novel metric for LTR, targeting the tendency of models to simply classify inputs
into common classes. PDC quantitatively measures how well a model calibrates its
predictive preferences. On this basis, we find that many LTR approaches
alleviate this preference only slightly, despite their accuracy improvements. Extensive experiments
on benchmark datasets validate that PDC reflects the model's predictive
preference precisely, which is consistent with the visualization.
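The abstract does not give the PDC formula, so the following is an illustration only, not the paper's definition: one simple way to quantify a model's preference for common classes is the KL divergence between the empirical distribution of its predicted labels and a balanced (uniform) reference. The function name and the uniform reference are assumptions for this sketch.

```python
import numpy as np

def predictive_preference(pred_labels, num_classes):
    """Hypothetical preference score (NOT the paper's PDC): KL divergence
    between the empirical distribution of predicted labels and a uniform,
    balanced reference. 0.0 means no class preference; larger values mean
    predictions concentrate on a few (typically common) classes."""
    counts = np.bincount(pred_labels, minlength=num_classes).astype(float)
    p = counts / counts.sum()                     # empirical predicted-class distribution
    q = np.full(num_classes, 1.0 / num_classes)   # balanced reference
    mask = p > 0                                  # convention: 0 * log 0 = 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A model that always predicts class 0 shows maximal preference,
# while perfectly balanced predictions score 0.
biased = predictive_preference(np.zeros(100, dtype=int), num_classes=10)
balanced = predictive_preference(np.arange(100) % 10, num_classes=10)
```

Under this toy definition, `biased` equals log(10) (all mass on one of ten classes) and `balanced` equals 0.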
Related papers
- Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning [55.384428765798496]
New data exhibits a long-tailed distribution, such as e-commerce platform reviews.
This necessitates continually learning from imbalanced data without forgetting.
We introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.
arXiv Detail & Related papers (2024-09-11T17:52:00Z)
- Long-term Pre-training for Temporal Action Detection with Transformers [21.164101507575186]
Temporal action detection (TAD) is challenging, yet fundamental for real-world video applications.
In this paper, we identify two crucial problems from data scarcity: attention collapse and imbalanced performance.
We propose a new pre-training strategy, Long-Term Pre-training, tailored for transformers.
arXiv Detail & Related papers (2024-08-23T15:20:53Z)
- REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical.
It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance.
We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z)
- LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods.
This work makes three novel contributions: 1) the retraining process is formulated as a convex problem, which converges at a fast rate and prevents overfitting; 2) a ruminate block leverages historical data without the need to store it; and 3) we mathematically prove that, when fine-tuning the latent vectors and reconstructed data, linear formations achieve the least adjusting errors between the ground truths and the fine-tuned ones.
arXiv Detail & Related papers (2023-10-09T12:36:16Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction [39.48740397029264]
We propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data.
We derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD).
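The masked-feature-prediction side of such corruption-and-recovery pretraining can be sketched as follows; the function name, mask token id, and masking probability are illustrative assumptions, not MAP's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_fields(batch, mask_token, mask_prob=0.15):
    """Sketch of feature corruption for masked feature prediction on
    multi-field categorical data: each field is independently replaced
    by a special [MASK] id with probability mask_prob. The pretraining
    target is to recover the original ids at the masked positions.
    batch: (batch_size, num_fields) integer array of category ids."""
    mask = rng.random(batch.shape) < mask_prob
    corrupted = np.where(mask, mask_token, batch)
    return corrupted, mask  # mask marks which positions the model must predict

batch = np.array([[3, 7, 1],
                  [2, 5, 9]])
corrupted, mask = mask_fields(batch, mask_token=0, mask_prob=0.5)
```

Unmasked positions pass through unchanged; a recovery head would then be trained to predict `batch[mask]` from `corrupted`.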
arXiv Detail & Related papers (2023-08-03T12:55:55Z)
- Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling [25.76814731638375]
There are two de facto standard architectures in computer vision: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs).
We show that these approaches overlook that the optimal inductive bias also changes as the target data scale changes.
The more convolution-like inductive bias a model includes, the less data the ViT-like model needs to outperform ResNet.
arXiv Detail & Related papers (2022-10-04T04:20:20Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE).
It reduces model variance with multiple experts, reduces model bias with a distribution-aware diversity loss, and reduces computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
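RIDE's diversity loss and routing module are defined in the paper itself; as a minimal sketch of just the variance-reduction idea, multiple expert heads can share features and have their logits averaged. All names and the toy linear heads below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def ensemble_logits(expert_logits):
    """Minimal sketch of the multi-expert idea: each expert produces its
    own class logits and the ensemble averages them, reducing the variance
    of the final prediction. expert_logits: (num_experts, batch, classes)."""
    return expert_logits.mean(axis=0)

# Three toy 'experts' as independent linear heads over shared features.
features = rng.normal(size=(4, 8))                  # (batch, feature_dim)
heads = [rng.normal(size=(8, 10)) for _ in range(3)]
logits = np.stack([features @ w for w in heads])    # (3, 4, 10)
avg = ensemble_logits(logits)
preds = avg.argmax(axis=1)                          # final class per sample
```

RIDE goes beyond plain averaging by pushing experts apart with a diversity loss and routing easy samples through fewer experts, but the aggregation step follows this shape.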
arXiv Detail & Related papers (2020-10-05T06:53:44Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
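The core idea is mechanical: instead of normalizing test inputs with the frozen running statistics accumulated during training, recompute the mean and variance from the test batch itself. A minimal NumPy sketch of a single batch-norm layer (function and parameter names are this sketch's, not the paper's):

```python
import numpy as np

def batchnorm(x, running_mean, running_var, gamma, beta,
              prediction_time=True, eps=1e-5):
    """Standard BN normalizes test inputs with training-time running
    statistics. Prediction-time BN instead uses the statistics of the
    current test batch, which re-centers features under covariate shift."""
    if prediction_time:
        mean, var = x.mean(axis=0), x.var(axis=0)   # test-batch statistics
    else:
        mean, var = running_mean, running_var       # frozen training statistics
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A covariate-shifted test batch (mean far from the training mean of 0):
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, size=(64, 3))
out = batchnorm(x, running_mean=np.zeros(3), running_var=np.ones(3),
                gamma=np.ones(3), beta=np.zeros(3))
```

With `prediction_time=True` the shifted batch is re-centered around zero; with `prediction_time=False` the frozen statistics would leave the +5 shift in the activations, which is exactly the failure mode under covariate shift.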
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.