Concept Drift and Long-Tailed Distribution in Fine-Grained Visual Categorization: Benchmark and Method
- URL: http://arxiv.org/abs/2306.02346v2
- Date: Mon, 11 Nov 2024 12:54:35 GMT
- Title: Concept Drift and Long-Tailed Distribution in Fine-Grained Visual Categorization: Benchmark and Method
- Authors: Shuo Ye, Shiming Chen, Ruxin Wang, Tianxu Wu, Jiamiao Xu, Salman Khan, Fahad Shahbaz Khan, Ling Shao,
- Abstract summary: We present a Concept Drift and Long-Tailed Distribution dataset.
The characteristics of instances tend to vary with time and exhibit a long-tailed distribution.
We propose a feature recombination framework to address the learning challenges associated with CDLT.
- Score: 84.68818879525568
- License:
- Abstract: Data is the foundation for the development of computer vision, and the establishment of datasets plays an important role in advancing the techniques of fine-grained visual categorization~(FGVC). In the existing FGVC datasets used in computer vision, it is generally assumed that each collected instance has fixed characteristics and the distribution of different categories is relatively balanced. In contrast, the real world scenario reveals the fact that the characteristics of instances tend to vary with time and exhibit a long-tailed distribution. Hence, the collected datasets may mislead the optimization of the fine-grained classifiers, resulting in unpleasant performance in real applications. Starting from the real-world conditions and to promote the practical progress of fine-grained visual categorization, we present a Concept Drift and Long-Tailed Distribution dataset. Specifically, the dataset is collected by gathering 11195 images of 250 instances in different species for 47 consecutive months in their natural contexts. The collection process involves dozens of crowd workers for photographing and domain experts for labeling. Meanwhile, we propose a feature recombination framework to address the learning challenges associated with CDLT. Experimental results validate the efficacy of our method while also highlighting the limitations of popular large vision-language models (e.g., CLIP) in the context of long-tailed distributions. This emphasizes the significance of CDLT as a benchmark for investigating these challenges.
Related papers
- Explaining Categorical Feature Interactions Using Graph Covariance and LLMs [18.44675735926458]
This paper focuses on the global synthetic dataset from the Counter Trafficking Data Collaborative.
It contains over 200,000 anonymized records spanning from 2002 to 2022 with numerous categorical features for each record.
We propose a fast and scalable method for analyzing and extracting significant categorical feature interactions.
arXiv Detail & Related papers (2025-01-24T21:41:26Z) - HiGDA: Hierarchical Graph of Nodes to Learn Local-to-Global Topology for Semi-Supervised Domain Adaptation [0.18749305679160366]
We introduce a Hierarchical Graph of Nodes designed to simultaneously present representations at both feature and category levels.
In this study, we introduce a local graph to identify the most relevant patches within an image, facilitating adaptability to defined main object representations.
At the category level, we employ a global graph to aggregate the features from samples within the same category, thereby enriching overall representations.
arXiv Detail & Related papers (2024-12-16T14:35:52Z) - Visual Data Diagnosis and Debiasing with Concept Graphs [50.84781894621378]
We present ConBias, a framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets.
We show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks.
arXiv Detail & Related papers (2024-09-26T16:59:01Z) - Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language [41.40908753726324]
Diffusion models can generate realistic and diverse images, potentially facilitating data availability for data-intensive perception tasks.
We present textbfAuto textbfCherry-textbfPicker (ACP), a novel framework that generates high-quality cross-modality training samples.
arXiv Detail & Related papers (2024-06-28T17:53:18Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Generalized Representations Learning for Time Series Classification [28.230863650758447]
We argue that the temporal complexity attributes to the unknown latent distributions within time series classification.
We present experiments on gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition.
arXiv Detail & Related papers (2022-09-15T03:36:31Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Input-Output Balanced Framework for Long-tailed LiDAR Semantic
Segmentation [12.639524717464509]
We propose an input-output balanced framework to handle the issue of long-tailed distribution.
For the input space, we synthesize these tailed instances from mesh models and well simulate the position and density distribution of LiDAR scan.
For the output space, a multi-head block is proposed to group different categories based on their shapes and instance amounts.
arXiv Detail & Related papers (2021-03-26T05:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.