DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited
Annotations
- URL: http://arxiv.org/abs/2206.09541v1
- Date: Mon, 20 Jun 2022 02:36:54 GMT
- Title: DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited
Annotations
- Authors: Ximeng Sun, Ping Hu and Kate Saenko
- Abstract summary: We propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR.
Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks.
- Score: 61.41339201200135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving multi-label recognition (MLR) for images in the low-label regime is a
challenging task with many real-world applications. Recent work learns an
alignment between textual and visual spaces to compensate for insufficient
image labels, but loses accuracy because of the limited amount of available MLR
annotations. In this work, we utilize the strong alignment of textual and
visual features pretrained with millions of auxiliary image-text pairs and
propose Dual Context Optimization (DualCoOp) as a unified framework for
partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative
contexts with class names as part of the linguistic input (i.e. prompts). Since
DualCoOp only introduces a very light learnable overhead upon the pretrained
vision-language framework, it can quickly adapt to multi-label recognition
tasks that have limited annotations and even unseen classes. Experiments on
standard multi-label recognition benchmarks across two challenging low-label
settings demonstrate the advantages of our approach over state-of-the-art
methods.
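Since the abstract only names the mechanism, here is a minimal PyTorch sketch of the dual-prompt idea: each class gets a learnable positive and a learnable negative context, and a softmax over the resulting (negative, positive) similarity pair gives the probability that the label is present. `text_encoder`, `name_emb`, the tensor shapes, and the temperature are hypothetical stand-ins for a frozen CLIP-style backbone, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DualPrompts(nn.Module):
    """Learnable positive/negative contexts per class; only these parameters train."""
    def __init__(self, num_classes: int, ctx_len: int = 16, dim: int = 512):
        super().__init__()
        self.pos_ctx = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, dim))
        self.neg_ctx = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, dim))

    def forward(self, name_emb, text_encoder):
        # name_emb: (num_classes, name_len, dim) token embeddings of the class names.
        # Prepend the learnable contexts, then run the frozen text encoder.
        pos = text_encoder(torch.cat([self.pos_ctx, name_emb], dim=1))
        neg = text_encoder(torch.cat([self.neg_ctx, name_emb], dim=1))
        return pos, neg  # each (num_classes, dim)

def presence_prob(image_feat, pos_emb, neg_emb, tau: float = 0.01):
    # Per class, softmax over the (negative, positive) similarity pair;
    # the positive share is read off as P(label present).
    pos_sim = image_feat @ pos_emb.t() / tau  # (batch, num_classes)
    neg_sim = image_feat @ neg_emb.t() / tau
    return torch.stack([neg_sim, pos_sim], dim=-1).softmax(dim=-1)[..., 1]
```

Training then reduces to a binary loss (the paper uses an asymmetric loss) on these per-class probabilities, computed only over the labels that are actually annotated, while the pretrained encoders stay frozen; this is the "very light learnable overhead" the abstract refers to.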
Related papers
- Text-Region Matching for Multi-Label Image Recognition with Missing Labels [5.095488730708477]
TRM-ML is a novel method for enhancing meaningful cross-modal matching.
We propose a category prototype that leverages intra- and inter-category semantic relationships to estimate unknown labels.
Our proposed framework outperforms the state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2024-07-26T05:29:24Z) - Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z) - DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition
with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++).
arXiv Detail & Related papers (2023-08-03T17:33:20Z) - Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z) - Texts as Images in Prompt Tuning for Multi-Label Image Recognition [70.9310322461598]
We advocate that image-text contrastive learning makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting.
Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning.
Our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks (a short sketch of the texts-as-images idea follows this list).
arXiv Detail & Related papers (2022-11-23T07:00:11Z) - CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for
Image-Text Retrieval [108.48540976175457]
We propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation.
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
Experiments conducted on two popular benchmarks, i.e. MSCOCO and Flickr30K, validate that CODER remarkably outperforms the state-of-the-art approaches (a sketch of its queue-and-weighting mechanism also follows this list).
arXiv Detail & Related papers (2022-08-21T08:37:50Z) - Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on
Aligned Visual-Textual Features [14.334304670606633]
We propose a novel algorithm, Aligned Dual moDality ClaSsifier (ADDS), which includes a Dual-Modal decoder (DM-decoder) with alignment between visual and textual features.
Extensive experiments conducted on several standard benchmarks, NUS-WIDE, ImageNet-1k, ImageNet-21k, and MS-COCO, demonstrate that our approach significantly outperforms previous methods.
arXiv Detail & Related papers (2022-08-19T22:45:07Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive low-rank representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
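For the TaI entry above, here is a rough sketch of the "texts as images" idea: frozen text embeddings of wild captions stand in for image features when tuning the prompts, with multi-hot labels mined from the captions themselves, so no image annotations are needed. The function name, the noun-tag labels, and the loss form are illustrative assumptions, not TaI-DPT's exact formulation.

```python
import torch
import torch.nn.functional as F

def tai_prompt_loss(sentence_feat, prompt_class_emb, caption_tags, tau=0.01):
    # sentence_feat: (B, dim) frozen caption embeddings "treated as images".
    # prompt_class_emb: (C, dim) class embeddings produced by the learnable prompts.
    # caption_tags: (B, C) multi-hot labels, e.g. nouns mined from each caption.
    sims = F.normalize(sentence_feat, dim=-1) @ F.normalize(prompt_class_emb, dim=-1).t()
    return F.binary_cross_entropy_with_logits(sims / tau, caption_tags.float())
```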
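For the CODER entry, the sketch below illustrates, under MoCo-style assumptions, the two mechanisms its summary names: a dynamic dictionary (a FIFO queue of momentum-encoded keys) per modality that enlarges the pool of image-text pairs beyond the batch, and a similarity-based up-weighting of hard negatives as a stand-in for its adaptive negative pair weighting. All names and the exact weighting form are assumptions, not CODER's published design.

```python
import torch
import torch.nn.functional as F

def weighted_infonce(query, pos_key, queue, tau=0.07, beta=5.0):
    # query: (B, dim) features from one modality (e.g. images);
    # pos_key: (B, dim) paired features from the other modality;
    # queue: (K, dim) momentum-encoded negatives from the dynamic dictionary.
    q = F.normalize(query, dim=-1)
    k = F.normalize(pos_key, dim=-1)
    n = F.normalize(queue, dim=-1)
    pos = (q * k).sum(-1, keepdim=True) / tau  # (B, 1) positive logits
    neg = q @ n.t() / tau                      # (B, K) negative logits
    # Illustrative "diversity-sensitive" weighting: emphasize harder
    # (more similar) negatives by adding log-weights to their logits.
    w = torch.softmax(beta * neg.detach(), dim=-1) * neg.size(1)
    logits = torch.cat([pos, neg + w.log()], dim=1)
    labels = torch.zeros(len(q), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

def update_queue(queue, new_keys):
    # FIFO dictionary update: enqueue new momentum keys, drop the oldest.
    return torch.cat([queue, new_keys.detach()], dim=0)[new_keys.size(0):]
```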