DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations
- URL: http://arxiv.org/abs/2308.01890v2
- Date: Thu, 14 Dec 2023 02:19:42 GMT
- Title: DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations
- Authors: Ping Hu, Ximeng Sun, Stan Sclaroff, and Kate Saenko
- Abstract summary: Multi-label image recognition in the low-label regime is a challenging task of great practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++)
- Score: 79.433122872973
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-label image recognition in the low-label regime is a challenging
task of great practical significance. Previous works have focused on learning
the alignment between textual and visual spaces to compensate for limited image
labels, yet may suffer from reduced accuracy due to the scarcity of
high-quality multi-label annotations. In this research, we leverage the
powerful alignment between textual and visual features pretrained with millions
of auxiliary image-text pairs. We introduce an efficient and effective
framework called Evidence-guided Dual Context Optimization (DualCoOp++), which
serves as a unified approach for addressing partial-label and zero-shot
multi-label recognition. In DualCoOp++ we separately encode evidential,
positive, and negative contexts for target classes as parametric components of
the linguistic input (i.e., prompts). The evidential context aims to discover
all the related visual content for the target class, and serves as guidance to
aggregate positive and negative contexts from the spatial domain of the image,
enabling better discrimination between similar categories. Additionally, we
introduce a Winner-Take-All module that promotes inter-class interaction during
training, while avoiding the need for extra parameters and costs. As DualCoOp++
imposes minimal additional learnable overhead on the pretrained vision-language
framework, it enables rapid adaptation to multi-label recognition tasks with
limited annotations and even unseen classes. Experiments on standard
multi-label recognition benchmarks across two challenging low-label settings
demonstrate the superior performance of our approach compared to
state-of-the-art methods.
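The PyTorch sketch below illustrates the dual-context idea described in the abstract, assuming CLIP-style frozen text and image encoders that expose spatial visual features. The names (EvidenceGuidedDualPrompts, evidence_guided_scores, text_encoder), tensor shapes, and the two-way softmax readout are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of evidence-guided dual-context prompting (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidenceGuidedDualPrompts(nn.Module):
    def __init__(self, num_classes, ctx_len=16, dim=512):
        super().__init__()
        # Three learnable context banks per class: evidential, positive, negative.
        self.ctx_evi = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, dim))
        self.ctx_pos = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, dim))
        self.ctx_neg = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, dim))

    def forward(self, text_encoder, class_token_embs):
        # class_token_embs: (C, name_len, dim) frozen token embeddings of class names.
        # Each prompt = [learnable context ; class-name tokens], passed through the
        # frozen text encoder to obtain one text feature per class.
        def encode(ctx):
            return text_encoder(torch.cat([ctx, class_token_embs], dim=1))  # -> (C, dim)
        return encode(self.ctx_evi), encode(self.ctx_pos), encode(self.ctx_neg)

def evidence_guided_scores(visual_feats, t_evi, t_pos, t_neg, tau=0.07):
    """visual_feats: (B, HW, dim) spatial features from the frozen image encoder."""
    v = F.normalize(visual_feats, dim=-1)
    sim = lambda t: v @ F.normalize(t, dim=-1).t() / tau          # (B, HW, C)
    s_evi, s_pos, s_neg = sim(t_evi), sim(t_pos), sim(t_neg)
    # The evidential map acts as spatial attention that pools the positive and
    # negative similarity maps for every class.
    attn = F.softmax(s_evi, dim=1)                                # softmax over locations
    pos_logit = (attn * s_pos).sum(dim=1)                         # (B, C)
    neg_logit = (attn * s_neg).sum(dim=1)                         # (B, C)
    # Presence score per class: positive vs. negative context as a 2-way softmax,
    # trained against the observed (partial or seen-class) labels.
    return torch.softmax(torch.stack([pos_logit, neg_logit], dim=-1), dim=-1)[..., 0]
```

Because only the three context banks are trainable while both encoders stay frozen, the added learnable overhead is small, which is what makes the rapid adaptation to partial-label and zero-shot settings plausible.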
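The Winner-Take-All module is only described at a high level in the abstract. The snippet below is a hypothetical, parameter-free regularizer in the same spirit: per-class evidential responses compete at each spatial location, adding inter-class interaction during training without extra parameters. The paper's exact formulation likely differs.

```python
import torch.nn.functional as F

def winner_take_all_penalty(s_evi):
    """Hypothetical WTA-style term. s_evi: (B, HW, C) evidential similarity maps."""
    probs = F.softmax(s_evi, dim=-1)            # classes compete at each location
    winner_conf = probs.max(dim=-1).values      # (B, HW) confidence of the winning class
    return (1.0 - winner_conf).mean()           # lower when one class clearly wins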
Related papers
- Text-Region Matching for Multi-Label Image Recognition with Missing Labels [5.095488730708477]
TRM-ML is a novel method for enhancing meaningful cross-modal matching.
We propose a category prototype that leverages intra- and inter-category semantic relationships to estimate unknown labels.
Our proposed framework outperforms the state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2024-07-26T05:29:24Z)
- Text as Image: Learning Transferable Adapter for Multi-Label Classification [13.11583340598517]
We introduce an effective approach to employ large language models for multi-label instruction-following text generation.
In this way, a fully automated pipeline for visual label recognition is developed without relying on any manual data.
arXiv Detail & Related papers (2023-12-07T09:22:20Z)
- Semantic Contrastive Bootstrapping for Single-positive Multi-label Recognition [36.3636416735057]
We present a semantic contrastive bootstrapping (Scob) approach to gradually recover the cross-object relationships.
We then propose a recurrent semantic masked transformer to extract iconic object-level representations.
Extensive experimental results demonstrate that the proposed joint learning framework surpasses the state-of-the-art models.
arXiv Detail & Related papers (2023-07-15T01:59:53Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval [108.48540976175457]
We propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation.
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
Experiments conducted on two popular benchmarks, i.e., MSCOCO and Flickr30K, validate that CODER remarkably outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2022-08-21T08:37:50Z)
- DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [61.41339201200135]
We propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR.
Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks.
arXiv Detail & Related papers (2022-06-20T02:36:54Z)
- FILIP: Fine-grained Interactive Language-Image Pre-Training [106.19474076935363]
Fine-grained Interactive Language-Image Pre-training achieves finer-level alignment through a cross-modal late interaction mechanism.
We construct a new large-scale image-text pair dataset called FILIP300M for pre-training.
Experiments show that FILIP achieves state-of-the-art performance on multiple downstream vision-language tasks.
arXiv Detail & Related papers (2021-11-09T17:15:38Z)
- Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning hardly improves performance in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z)