LDCA: Local Descriptors with Contextual Augmentation for Few-Shot
Learning
- URL: http://arxiv.org/abs/2401.13499v1
- Date: Wed, 24 Jan 2024 14:44:48 GMT
- Title: LDCA: Local Descriptors with Contextual Augmentation for Few-Shot
Learning
- Authors: Maofa Wang and Bingchen Yan
- Abstract summary: We introduce a novel approach termed "Local Descriptor with Contextual Augmentation (LDCA)"
LDCA bridges the gap between local and global understanding by leveraging an adaptive global contextual enhancement module.
Experiments underscore the efficacy of our method, showing a maximal absolute improvement of 20% over the next-best method on fine-grained classification datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot image classification has emerged as a key challenge in
computer vision, demanding the ability to rapidly adapt to new tasks from
minimal labeled data. Existing methods predominantly rely on image-level
features or local descriptors, often overlooking the holistic context
surrounding those descriptors. In this work, we introduce a novel approach
termed "Local Descriptor with Contextual Augmentation (LDCA)". Specifically,
this method uniquely bridges the gap between local and global understanding
by leveraging an adaptive global contextual enhancement module. This module
incorporates a visual transformer, endowing local descriptors with contextual
awareness that ranges from broad global perspectives to intricate surrounding
nuances. In doing so, LDCA transcends traditional descriptor-based approaches,
ensuring that each local feature is interpreted within its larger visual
narrative. Extensive experiments underscore the efficacy of our method,
showing a maximal absolute improvement of 20% over the next-best method on
fine-grained classification datasets, thus demonstrating significant
advances in few-shot classification tasks.
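
The abstract describes the enhancement module only at a high level, so the sketch below shows one plausible reading of the pipeline: local descriptors are taken as the spatial positions of a CNN feature map, a generic transformer encoder stands in for the adaptive global contextual enhancement module (letting every descriptor attend to all others), and classification uses a DN4-style image-to-class k-nearest-neighbor comparison, a common choice in descriptor-based few-shot methods. Everything here, from the class and function names (LocalContextAugmenter, image_to_class_scores) to the backbone and hyperparameters, is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch of an LDCA-style pipeline in PyTorch. One plausible reading
# of the abstract, NOT the authors' code: the module names, the use of a
# generic TransformerEncoder as the "adaptive global contextual enhancement
# module", and the DN4-style k-NN classifier are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalContextAugmenter(nn.Module):
    """Enrich per-location local descriptors with global image context."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, fmap):                      # fmap: (B, C, H, W)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C) local descriptors
        tokens = self.encoder(tokens)             # each one attends to all others
        return F.normalize(tokens, dim=-1)        # unit norm -> cosine similarity

def image_to_class_scores(q_desc, class_pools, k=3):
    """DN4-style scoring: for every query descriptor, sum the similarities of
    its k nearest support descriptors within each class's descriptor pool."""
    scores = []
    for pool in class_pools:                      # pool: (Mi, D)
        sim = q_desc @ pool.t()                   # (Mq, Mi) cosine similarities
        topk = sim.topk(min(k, pool.size(0)), dim=1).values
        scores.append(topk.sum())
    return torch.stack(scores)                    # one score per class

# Toy 5-way 1-shot episode with a tiny stand-in backbone.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU())
augment = LocalContextAugmenter(dim=64)

support = torch.randn(5, 3, 32, 32)               # one support image per class
query = torch.randn(1, 3, 32, 32)
s_desc = augment(backbone(support))               # (5, 64, 64): 8x8 descriptors
q_desc = augment(backbone(query))[0]              # (64, 64)
pools = [s_desc[i] for i in range(5)]             # 1-shot: one pool per class
print("predicted class:", image_to_class_scores(q_desc, pools).argmax().item())
```

In this reading, the transformer supplies the "contextual awareness" the abstract claims: before matching, each local descriptor has already mixed in information from the whole image, so an otherwise ambiguous local patch is matched differently depending on its surrounding visual narrative.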
Related papers
- Globality Strikes Back: Rethinking the Global Knowledge of CLIP in Training-Free Open-Vocabulary Semantic Segmentation
Recent works modify CLIP to perform training-free open-vocabulary semantic segmentation (TF-OVSS).
These modifications, however, largely weaken CLIP's ability to aggregate global context information.
We propose a new method named GCLIP, which mines the beneficial global knowledge of CLIP to facilitate the TF-OVSS task.
arXiv Detail & Related papers (2025-02-05T03:37:50Z)
- Grounding Descriptions in Images informs Zero-Shot Visual Recognition
We propose GRAIN, a new pretraining strategy aimed at aligning representations at both fine and coarse levels simultaneously.
We demonstrate the enhanced zero-shot performance of our model compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-12-05T18:52:00Z)
- GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection
We introduce glocal contrastive learning to improve the complementary learning of global and local prompts.
The generalization performance of GlocalCLIP in ZSAD was demonstrated on 15 real-world datasets.
arXiv Detail & Related papers (2024-11-09T05:22:13Z)
- DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation approaches typically rely on class activation maps (CAMs) for initial seed generation.
We introduce DALNet, which leverages text embeddings to enhance the comprehensive understanding and precise localization of objects across different levels of granularity.
In particular, our approach allows for a more efficient end-to-end process as a single-stage method.
arXiv Detail & Related papers (2024-09-24T06:51:49Z)
- Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
We introduce the DiffPNG framework, which capitalizes on the diffusion model's architecture for segmentation by decomposing the process into a sequence of localization, segmentation, and refinement steps.
Our experiments on the PNG dataset demonstrate that DiffPNG achieves strong performance in the zero-shot PNG task setting.
arXiv Detail & Related papers (2024-07-07T13:06:34Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the state of the art while being nearly 220 times faster in computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Region-level Active Learning for Cluttered Scenes
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- Fine-Grained Image Captioning with Global-Local Discriminative Objective
We propose a novel global-local discriminative objective to facilitate generating fine-grained descriptive captions.
We evaluate the proposed method on the widely used MS-COCO dataset.
arXiv Detail & Related papers (2020-07-21T08:46:02Z)
- Weakly-supervised Object Localization for Few-shot Learning and Fine-grained Few-shot Learning
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to perform weakly-supervised object localization.
We also produce activated masks for selecting discriminative deep descriptors for few-shot classification.
arXiv Detail & Related papers (2020-03-02T14:07:05Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.