Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition
- URL: http://arxiv.org/abs/2305.07180v3
- Date: Tue, 27 Feb 2024 05:44:55 GMT
- Title: Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition
- Authors: Haiqi Liu, C. L. Philip Chen, Xinrong Gong and Tong Zhang
- Abstract summary: Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
- Score: 57.08108545219043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing novel sub-categories with scarce samples is an essential and
challenging research topic in computer vision. Existing literature addresses
this challenge by employing local-based representation approaches, which may
not sufficiently facilitate meaningful object-specific semantic understanding,
leading to a reliance on apparent background correlations. Moreover, they
primarily rely on high-dimensional local descriptors to construct a complex
embedding space, potentially limiting generalization. To address these
challenges, this article proposes a novel model, Robust Saliency-aware
Distillation (RSaD), for few-shot fine-grained visual recognition. RSaD
introduces additional saliency-aware supervision via saliency detection to
guide the model toward focusing on the intrinsic discriminative regions.
Specifically, RSaD utilizes the saliency detection model to emphasize the
critical regions of each sub-category, providing additional object-specific
information for fine-grained prediction. RSaD transfers such information with
two symmetric branches in a mutual learning paradigm. Furthermore, RSaD
exploits inter-regional relationships to enhance the informativeness of the
representation and subsequently summarizes the highlighted details into
contextual embeddings to facilitate effective transfer, enabling quick
generalization to novel sub-categories. The proposed approach is empirically
evaluated on three widely used benchmarks, demonstrating its superior
performance.
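Since the abstract describes the mechanism only at a high level, a minimal sketch may help make the mutual-learning objective concrete. The PyTorch snippet below is an illustrative reconstruction, not the authors' implementation: the branch architecture, the multiplicative saliency weighting, the distillation temperature, and the equal loss weights are all assumptions introduced here for exposition.

```python
# Minimal sketch of saliency-aware mutual distillation (illustrative only;
# names and hyperparameters are assumptions, not RSaD's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One of two symmetric branches: a feature extractor plus a classifier."""
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone               # e.g. a CNN returning (B, C, H, W)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x, saliency=None):
        f = self.backbone(x)                   # (B, C, H, W) feature map
        if saliency is not None:               # saliency: (B, 1, H0, W0) in [0, 1]
            # Re-weight spatial locations with the external saliency map so the
            # branch focuses on object-specific rather than background regions.
            s = F.interpolate(saliency, size=f.shape[-2:],
                              mode="bilinear", align_corners=False)
            f = f * s
        emb = f.mean(dim=(-2, -1))             # pool details into a context embedding
        return self.head(emb)

def mutual_kl(logits_a, logits_b, T=4.0):
    """Symmetric KL divergence between the branches' softened predictions,
    each branch treating its (detached) peer as the teacher."""
    log_pa = F.log_softmax(logits_a / T, dim=1)
    log_pb = F.log_softmax(logits_b / T, dim=1)
    kl_ab = F.kl_div(log_pa, log_pb.detach().exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_pb, log_pa.detach().exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba) * T * T

def train_step(branch_a, branch_b, x, saliency, y, opt):
    """One mutual-learning step: both branches fit the labels while the
    saliency-aware branch shares its object-focused view with its peer."""
    logits_a = branch_a(x)                     # plain view
    logits_b = branch_b(x, saliency=saliency)  # saliency-aware view
    loss = (F.cross_entropy(logits_a, y)
            + F.cross_entropy(logits_b, y)
            + mutual_kl(logits_a, logits_b))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In an actual few-shot pipeline the plain cross-entropy terms would be replaced by an episodic (support/query) objective, but the sketch captures the essential structure described above: two symmetric branches regularizing each other through a symmetric KL term while one receives saliency guidance.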
Related papers
- Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents [16.78371134590167]
Key-value relations are prevalent in Visually-Rich Documents (VRDs).
These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets.
Our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs.
arXiv Detail & Related papers (2024-03-23T08:40:35Z)
- Improving Vision-and-Language Reasoning via Spatial Relations Modeling [30.477235227733928]
Visual commonsense reasoning (VCR) is a challenging multi-modal task.
The proposed method can guide the representations to maintain more spatial context.
We achieve state-of-the-art results on VCR and on two other vision-and-language reasoning tasks, VQA and NLVR.
arXiv Detail & Related papers (2023-11-09T11:54:55Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval [51.53892300802014]
We show that supervised neural information retrieval models are prone to learning sparse attention patterns over passage tokens.
Using a novel targeted synthetic data generation method, we teach neural IR to attend more uniformly and robustly to all entities in a given passage.
arXiv Detail & Related papers (2022-04-24T22:36:48Z)
- Semantic Reinforced Attention Learning for Visual Place Recognition [15.84086970453363]
Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in the image are beneficial to the task.
We propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention can benefit from both semantic priors and data-driven fine-tuning.
Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
arXiv Detail & Related papers (2021-08-19T02:14:36Z)
- Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
- Improve the Interpretability of Attention: A Fast, Accurate, and Interpretable High-Resolution Attention Model [6.906621279967867]
We propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures the task-relevant human-interpretable information.
The proposed model can be easily adapted in a wide variety of modern deep models, where classification is involved.
It is also more accurate, faster, and with a smaller memory footprint than usual neural attention modules.
arXiv Detail & Related papers (2021-06-04T15:57:37Z)
- Towards Novel Target Discovery Through Open-Set Domain Adaptation [73.81537683043206]
Open-set domain adaptation (OSDA) considers that the target domain contains samples from novel categories unobserved in the external source domain.
We propose a novel framework to accurately identify the seen categories in target domain, and effectively recover the semantic attributes for unseen categories.
arXiv Detail & Related papers (2021-05-06T04:22:29Z)
- Learning Domain Invariant Representations for Generalizable Person Re-Identification [71.35292121563491]
Generalizable person Re-Identification (ReID) has attracted growing attention in the computer vision community.
We introduce causality into person ReID and propose a novel generalizable framework, named Domain Invariant Representations for generalizable person Re-Identification (DIR-ReID).
arXiv Detail & Related papers (2021-03-29T18:59:48Z)