Fine Grain Classification: Connecting Meta using Cross-Contrastive pre-training
- URL: http://arxiv.org/abs/2504.20322v1
- Date: Tue, 29 Apr 2025 00:23:41 GMT
- Title: Fine Grain Classification: Connecting Meta using Cross-Contrastive pre-training
- Authors: Sumit Mamtani, Yash Thesia
- Abstract summary: We propose a novel framework that leverages meta-information to assist fine-grained identification. We tackle the joint learning of visual and meta-information through cross-contrastive pre-training. Experiments on the NABirds dataset demonstrate that our framework effectively utilizes meta-information to enhance fine-grained recognition performance.
- Score: 0.06906005491572399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained visual classification aims to recognize objects belonging to multiple subordinate categories within a super-category. However, this remains a challenging problem, as appearance information alone is often insufficient to accurately differentiate between fine-grained visual categories. To address this, we propose a novel and unified framework that leverages meta-information to assist fine-grained identification. We tackle the joint learning of visual and meta-information through cross-contrastive pre-training. In the first stage, we employ three encoders for images, text, and meta-information, aligning their projected embeddings to achieve better representations. We then fine-tune the image and meta-information encoders for the classification task. Experiments on the NABirds dataset demonstrate that our framework effectively utilizes meta-information to enhance fine-grained recognition performance. With the addition of meta-information, our framework surpasses the current baseline on NABirds by 7.83%. Furthermore, it achieves an accuracy of 84.44% on the NABirds dataset, outperforming many existing state-of-the-art approaches that utilize meta-information.
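The abstract describes the first stage only at a high level: three encoders (image, text, meta-information) whose projected embeddings are aligned. The sketch below illustrates one plausible form of such an objective, a CLIP-style symmetric InfoNCE loss summed over the three modality pairs. The loss form, the choice of pairs, the temperature value, and every function name here are assumptions made for illustration; the source does not specify them, and the encoders themselves are replaced by random vectors.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.07):
    """One-directional InfoNCE: row i of `anchors` should match row i of `targets`."""
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every target, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive(x, y, temperature=0.07):
    """Average of the x-to-y and y-to-x InfoNCE terms."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def cross_contrastive_loss(image_emb, text_emb, meta_emb, temperature=0.07):
    """Assumed objective: sum of symmetric losses over all three modality pairs."""
    return (
        symmetric_contrastive(image_emb, text_emb, temperature)
        + symmetric_contrastive(image_emb, meta_emb, temperature)
        + symmetric_contrastive(text_emb, meta_emb, temperature)
    )


# Toy stand-ins for projected encoder outputs (hypothetical data, not NABirds).
random.seed(0)
batch, dim = 4, 8
image_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
# "Aligned" text and meta embeddings are small perturbations of the image ones.
text_emb = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in image_emb]
meta_emb = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in image_emb]

aligned = cross_contrastive_loss(image_emb, text_emb, meta_emb)
# Rotating the rows breaks the sample-wise correspondence between modalities.
misaligned = cross_contrastive_loss(
    image_emb, text_emb[1:] + text_emb[:1], meta_emb[2:] + meta_emb[:2]
)
print(f"aligned loss: {aligned:.4f}  misaligned loss: {misaligned:.4f}")
```

On this toy data the loss is lower when the three embeddings of the same sample agree than when the rows are shuffled, which is the property a pre-training stage of this kind relies on. The second stage in the abstract, fine-tuning the image and meta-information encoders for classification, is not shown.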
Related papers
- Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery [65.16724941038052]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories. Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition [27.199124692225777]
Scene recognition based on deep-learning has made significant progress, but there are still limitations in its performance.
We propose EnTri, a framework that employs ensemble learning using a hierarchy of visual features.
EnTri has demonstrated superiority in terms of recognition accuracy, achieving competitive performance compared to state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-23T22:11:23Z)
- Exploring Weakly Supervised Semantic Segmentation Ensembles for Medical Imaging Systems [11.693197342734152]
We propose a framework for reliable classification and detection of medical conditions in images.
Our framework achieves that by first utilizing lower threshold CAMs to cover the target object with high certainty.
We have demonstrated an improved dice score of up to 8% on BRATS and 6% on DECATHLON datasets.
arXiv Detail & Related papers (2023-03-14T13:31:05Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- MetaFormer: A Unified Meta Framework for Fine-Grained Recognition [16.058297377539418]
We propose a unified and strong meta-framework for fine-grained visual classification.
In practice, MetaFormer provides a simple yet effective approach to address the joint learning of vision and various meta-information.
In experiments, MetaFormer can effectively use various meta-information to improve the performance of fine-grained recognition.
arXiv Detail & Related papers (2022-03-05T14:12:25Z)
- Cross-Domain Few-Shot Graph Classification [7.23389716633927]
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
arXiv Detail & Related papers (2022-01-20T16:16:30Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
- Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
- Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding [13.063136901934865]
Few-shot learning aims at rapidly adapting to novel categories with only a handful of samples at test time.
In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification.
We approximate the fine-grained data distribution by greedy clustering of each coarse-class into pseudo-fine-classes according to the similarity of image embeddings.
arXiv Detail & Related papers (2020-07-11T03:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.