DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
- URL: http://arxiv.org/abs/2404.15771v1
- Date: Wed, 24 Apr 2024 09:45:12 GMT
- Title: DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
- Authors: Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li
- Abstract summary: Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization.
Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself.
This paper proposes practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.
- Score: 67.44394651662738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module. These components serve to magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, the proposed DVF achieves state-of-the-art performance on three widely-used fine-grained datasets in closed-set and open-set settings.
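The two-stage filtering described in the abstract can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's implementation: DVF's object-oriented and semantic-oriented modules are learned components inside a vision transformer, whereas the heuristics below (an activation-energy bounding box and class-token similarity ranking) only mimic the described behavior of magnifying the object (G1) and selecting discriminative regions (G2).

```python
import numpy as np

def object_filter(tokens, grid=14, keep_ratio=0.5):
    """Hypothetical object-oriented filtering (G1): estimate a coarse object
    region from per-token activation energy and keep only the tokens inside
    its bounding box, mimicking 'magnifying the object'."""
    energy = np.linalg.norm(tokens, axis=-1).reshape(grid, grid)
    thresh = np.quantile(energy, 1.0 - keep_ratio)
    mask = energy >= thresh                      # coarse foreground mask
    ys, xs = np.nonzero(mask)
    box_mask = np.zeros_like(mask)
    box_mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return tokens.reshape(grid, grid, -1)[box_mask]

def semantic_filter(tokens, cls_token, top_k=16):
    """Hypothetical semantic-oriented filtering (G2): rank the remaining
    tokens by cosine similarity to the class token and keep the most
    discriminative ones."""
    sims = tokens @ cls_token / (
        np.linalg.norm(tokens, axis=-1) * np.linalg.norm(cls_token) + 1e-8)
    idx = np.argsort(sims)[::-1][:top_k]
    return tokens[idx]

rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(14 * 14, 64))    # 14x14 ViT patch grid
cls = rng.normal(size=64)
obj_tokens = object_filter(patch_tokens)         # magnified object region
disc_tokens = semantic_filter(obj_tokens, cls)   # discriminative subset
```

The two filters compose naturally: the semantic filter only ever sees tokens that survived the object filter, which matches the abstract's order of magnifying the object first and then locating discriminative regions within it.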
Related papers
- Feature Aligning Few shot Learning Method Using Local Descriptors Weighted Rules [0.0]
Few-shot classification involves identifying new categories using a limited number of labeled samples.
This paper proposes a Feature Aligning Few-shot Learning Method Using Local Descriptors Weighted Rules (FAFD-LDWR).
It innovatively introduces a cross-normalization method into few-shot image classification to preserve the discriminative information of local descriptors as much as possible, and enhances classification performance by aligning key local descriptors of support and query sets to remove background noise.
arXiv Detail & Related papers (2024-08-26T11:36:38Z)
- Multi-Granularity Language-Guided Multi-Object Tracking [95.91263758294154]
We propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity.
At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions.
Our LG-MOT achieves an absolute gain of 2.2% in terms of target object association (IDF1 score) compared to the baseline using only visual features.
arXiv Detail & Related papers (2024-06-07T11:18:40Z)
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical label generation.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features.
We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics.
This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
- Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models [44.60439935450292]
We propose a novel method for zero-shot visual relation detection: RECODE.
It decomposes each predicate category into subject, object, and spatial components.
Different visual cues enhance the discriminability of similar relation categories from different perspectives.
arXiv Detail & Related papers (2023-05-21T14:40:48Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three widely-used fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-06-02T17:12:14Z)
- Multi-View Active Fine-Grained Recognition [29.980409725777292]
Fine-grained visual classification (FGVC) has been studied for decades.
Discriminative information is not only present within seen local regions but also hidden in other, unseen perspectives.
We propose a policy-gradient-based framework to achieve efficient recognition with active view selection.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction [21.11038841356125]
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories, whose main challenge is the large intra-class diversity and subtle inter-class differences.
We present a novel approach for FGVC, which can simultaneously make use of partial yet sufficient discriminative information in environmental cues and also compress the redundant information in the class token with respect to the target.
arXiv Detail & Related papers (2022-04-21T13:35:38Z)
- Explicitly Modeling the Discriminability for Instance-Aware Visual Object Tracking [13.311777431243296]
We propose a novel Instance-Aware Tracker (IAT) to excavate the discriminability of feature representations.
We implement two variants of the proposed IAT, including a video-level one and an object-level one.
Both versions achieve leading results against state-of-the-art methods while running at 30 FPS.
arXiv Detail & Related papers (2021-10-28T11:24:01Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate discriminative parts, or feature encoding methods to extract highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
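As a small illustration of the Channel DropBlock idea above, the sketch below zeroes one contiguous block of feature channels during training so the network cannot rely on a single group of channels. This is a simplified assumption: the actual CDB selects the block among *correlated* channels using one of two alternative correlation metrics, and the function name here is hypothetical.

```python
import numpy as np

def channel_dropblock(features, drop_frac=0.25, rng=None):
    """Hypothetical sketch of Channel DropBlock-style regularization:
    zero out one contiguous block of channels in a C x H x W feature map,
    then rescale the rest to preserve the expected activation magnitude."""
    rng = rng or np.random.default_rng()
    c = features.shape[0]
    block = max(1, int(c * drop_frac))          # number of channels to drop
    start = rng.integers(0, c - block + 1)      # random block position
    out = features.copy()
    out[start:start + block] = 0.0              # drop the channel block
    out *= c / (c - block)                      # compensate for the drop
    return out

feat = np.ones((64, 7, 7))                      # toy C x H x W feature map
dropped = channel_dropblock(feat, 0.25, np.random.default_rng(1))
```

Dropping a *contiguous* block (rather than independent channels, as in standard dropout) forces the classifier to spread discriminative evidence across channel groups, which is the regularization effect the paper targets.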
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.