Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models
- URL: http://arxiv.org/abs/2403.12964v1
- Date: Tue, 19 Mar 2024 17:59:39 GMT
- Title: Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models
- Authors: Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie
- Abstract summary: We introduce the concept of dual learning into fine-tuning Vision-Language Models (VLMs).
We introduce a novel DualAdapter approach to enable dual-path adaptation of VLMs from both positive and negative perspectives.
Our experimental results validate that the proposed DualAdapter outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks.
- Score: 11.545127156146368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large-scale pre-trained Vision-Language Models (VLMs) have demonstrated great potential in learning open-world visual representations, and exhibit remarkable performance across a wide range of downstream tasks through efficient fine-tuning. In this work, we innovatively introduce the concept of dual learning into fine-tuning VLMs, i.e., we not only learn what an image is, but also what an image isn't. Building on this concept, we introduce a novel DualAdapter approach to enable dual-path adaptation of VLMs from both positive and negative perspectives with only limited annotated samples. In the inference stage, our DualAdapter performs unified predictions by simultaneously conducting complementary positive selection and negative exclusion across target classes, thereby enhancing the overall recognition accuracy of VLMs in downstream tasks. Our extensive experimental results across 15 datasets validate that the proposed DualAdapter outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks while achieving competitive computational efficiency. Code is available at https://github.com/zhangce01/DualAdapter.
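To make the dual-path idea concrete, here is a minimal NumPy sketch of unified inference over positive and negative text prompts. The random stand-in features, the temperature of 100, and the fusion rule `p_pos - alpha * p_neg` are illustrative assumptions, not the paper's exact formulation; see the linked repository for the actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_classes, dim = 10, 512

# Stand-ins for CLIP features: one image embedding, plus per-class
# "positive" text embeddings (e.g. "a photo of a {c}") and "negative"
# text embeddings (e.g. "a photo of no {c}"), all L2-normalized.
image = rng.normal(size=dim)
pos_text = rng.normal(size=(num_classes, dim))
neg_text = rng.normal(size=(num_classes, dim))
image /= np.linalg.norm(image)
pos_text /= np.linalg.norm(pos_text, axis=1, keepdims=True)
neg_text /= np.linalg.norm(neg_text, axis=1, keepdims=True)

# Positive path: how strongly the image matches each class.
p_pos = softmax(100.0 * pos_text @ image)
# Negative path: how strongly the image matches "not this class";
# a high score argues for excluding that class.
p_neg = softmax(100.0 * neg_text @ image)

# Unified prediction: positive selection minus negative exclusion.
alpha = 0.5  # illustrative fusion weight (an assumption)
scores = p_pos - alpha * p_neg
print("predicted class:", int(scores.argmax()))
```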
Related papers
- Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning [67.06363342414397]
Task-oriented semantic communication enhances transmission efficiency by conveying semantic information rather than exact messages.
Deep learning (DL)-based semantic communication can effectively cultivate the essential semantic knowledge for semantic extraction, transmission, and interpretation.
We propose a self-supervised learning-based semantic communication framework (SLSCom) to enhance task inference performance.
arXiv Detail & Related papers (2025-05-26T13:06:18Z)
- A Contrastive Learning Foundation Model Based on Perfectly Aligned Sample Pairs for Remote Sensing Images [18.191222010916405]
We present a novel self-supervised method called PerA, which produces all-purpose Remote Sensing features through semantically Perfectly Aligned sample pairs.
Our framework provides high-quality features by ensuring consistency between teacher and student.
We collect an unlabeled pre-training dataset, which contains about 5 million RS images.
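The abstract does not spell out PerA's architecture, so as a rough illustration of "consistency between teacher and student," here is a generic EMA teacher-student consistency step in NumPy. Every detail here (linear toy encoders, L2 consistency loss, momentum 0.99) is an assumption, not PerA's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, momentum = 128, 0.99

# Toy "encoders" as linear maps; in practice these are deep networks.
student_w = rng.normal(size=(dim, dim)) * 0.01
teacher_w = student_w.copy()  # teacher starts as a copy of the student

def encode(w, x):
    z = x @ w
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Two views of the same aligned sample pair.
x_view1 = rng.normal(size=(4, dim))
x_view2 = x_view1 + 0.05 * rng.normal(size=(4, dim))  # mild augmentation

z_student = encode(student_w, x_view1)
z_teacher = encode(teacher_w, x_view2)  # no gradient flows here in practice

# Consistency loss: pull student features toward teacher features.
loss = np.mean(np.sum((z_student - z_teacher) ** 2, axis=-1))
print(f"consistency loss: {loss:.4f}")

# EMA update keeps the teacher a slow-moving average of the student.
teacher_w = momentum * teacher_w + (1 - momentum) * student_w
```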
arXiv Detail & Related papers (2025-05-26T03:12:49Z)
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.
We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.
We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
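As one guess at what "dense sampling at time steps with high-noise levels" might look like, the sketch below biases the training timestep distribution toward steps with high cumulative noise. The linear beta schedule and the proportional weighting are assumptions, not the paper's actual ESS.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_draws = 1000, 8

# Weight timesteps by their noise level (here: a linear beta schedule's
# cumulative noise), so high-noise steps are sampled more densely.
betas = np.linspace(1e-4, 0.02, T)
alphas_cum = np.cumprod(1.0 - betas)
noise_level = 1.0 - alphas_cum          # grows toward t = T
p = noise_level / noise_level.sum()

t = rng.choice(T, size=n_draws, p=p)
print("sampled timesteps (biased toward high noise):", np.sort(t))
```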
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
- Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model [41.55165760439727]
Vision-language models (VLMs) have revolutionized machine learning by leveraging large pre-trained models to tackle various downstream tasks.
We propose a graph-based approach for label-efficient adaptation and inference.
Our method dynamically constructs a graph over text prompts, few-shot examples, and test samples, using label propagation for inference without task-specific tuning.
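The paper's graph construction is dynamic and context-aware; as a simplified NumPy sketch, here is the classic label-propagation update Y ← αWY + (1−α)Y₀ on a cosine-similarity graph over labeled nodes (prompts/few-shot examples) and unlabeled test nodes. The graph weights and random features are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labeled, n_test, dim, n_classes = 8, 4, 32, 2

# Node features: text prompts / few-shot examples (labeled) + test samples.
feats = rng.normal(size=(n_labeled + n_test, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Graph affinities from cosine similarity, row-normalized.
sim = np.clip(feats @ feats.T, 0, None)
np.fill_diagonal(sim, 0)
weights = sim / sim.sum(axis=1, keepdims=True)

# Initial label matrix: one-hot rows for labeled nodes, zeros for test nodes.
y0 = np.zeros((n_labeled + n_test, n_classes))
y0[np.arange(n_labeled), rng.integers(0, n_classes, n_labeled)] = 1.0

# Label propagation: Y <- a*W*Y + (1-a)*Y0, iterated toward convergence.
alpha, y = 0.9, y0.copy()
for _ in range(50):
    y = alpha * weights @ y + (1 - alpha) * y0

print("test predictions:", y[n_labeled:].argmax(axis=1))
```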
arXiv Detail & Related papers (2024-12-24T09:15:00Z)
- Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models.
We propose a budget-saving selective querying based on adaptive class-wise thresholds.
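A minimal sketch of budget-saving selective querying: query only samples whose confidence falls below a per-class threshold, lowest-confidence first, until the budget runs out. Using the per-class median confidence as the adaptive threshold is an assumption standing in for the paper's actual rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, budget = 200, 5, 12

# Zero-shot class probabilities from a VLM (random stand-ins here).
logits = rng.normal(size=(n_samples, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
conf = probs.max(axis=1)
pseudo = probs.argmax(axis=1)

# Adaptive class-wise thresholds: the median confidence per class, so
# "hard" classes get lower bars than "easy" ones (an assumption).
thresholds = np.array([
    np.median(conf[pseudo == c]) if np.any(pseudo == c) else 1.0
    for c in range(n_classes)
])

# Query only samples below their class threshold, lowest-confidence first,
# stopping at the labeling budget.
below = np.where(conf < thresholds[pseudo])[0]
query = below[np.argsort(conf[below])][:budget]
print("samples sent to the annotator:", query)
```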
arXiv Detail & Related papers (2024-11-23T02:34:33Z)
- Active Learning for Vision-Language Models [29.309503214127016]
We propose a novel active learning (AL) framework that enhances the zero-shot classification performance of vision-language models (VLMs).
Our approach first calibrates the predicted entropy of VLMs and then utilizes a combination of self-uncertainty and neighbor-aware uncertainty to calculate a reliable uncertainty measure for active sample selection.
Our experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets.
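The sketch below illustrates the selection recipe described above: temperature-scale the logits as a stand-in for calibration, then combine each sample's own predictive entropy with the mean entropy of its nearest neighbors. The fixed temperature, k = 5, and the equal 0.5/0.5 weighting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, k, budget = 200, 10, 5, 8

logits = rng.normal(size=(n_samples, n_classes))
feats = rng.normal(size=(n_samples, 64))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=1)

# Temperature-scaled ("calibrated") predictive entropy; the temperature
# would normally be fit on held-out data, here it is a fixed assumption.
tau = 2.0
probs = np.exp(logits / tau)
probs /= probs.sum(axis=1, keepdims=True)
self_unc = entropy(probs)

# Neighbor-aware uncertainty: average entropy of each sample's k nearest
# neighbors in feature space.
sim = feats @ feats.T
np.fill_diagonal(sim, -np.inf)
knn = np.argsort(-sim, axis=1)[:, :k]
neighbor_unc = self_unc[knn].mean(axis=1)

# Combine both signals and query the most uncertain samples.
score = 0.5 * self_unc + 0.5 * neighbor_unc  # illustrative weighting
print("queried samples:", np.argsort(-score)[:budget])
```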
arXiv Detail & Related papers (2024-10-29T16:25:50Z)
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
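A rough NumPy sketch of the traceback idea: score unlabeled pre-training images by zero-shot similarity to the downstream class-text embeddings, keep the top-scoring subset, and pseudo-label it for semi-supervised reuse. The scoring rule and subset size are assumptions; the paper's extraction method is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pretrain, n_classes, dim, keep = 1000, 10, 256, 100

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Stand-ins for embeddings of unlabeled pre-training images and for the
# downstream task's class-name text embeddings.
img_feats = l2norm(rng.normal(size=(n_pretrain, dim)))
txt_feats = l2norm(rng.normal(size=(n_classes, dim)))

# Zero-shot relevance: how strongly each pre-training image matches any
# downstream class; keep the top-scoring subset.
sims = img_feats @ txt_feats.T
relevance = sims.max(axis=1)
subset = np.argsort(-relevance)[:keep]

# Pseudo-label the retained images with their best-matching class so they
# can be reused in semi-supervised training.
pseudo_labels = sims[subset].argmax(axis=1)
print("subset size:", len(subset))
print("pseudo-label histogram:", np.bincount(pseudo_labels, minlength=n_classes))
```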
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
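The adaptive smoothing described above is easy to sketch: a cross-entropy loss where each sample's label-smoothing value scales with its uncertainty, so uncertain samples get softer targets. The random uncertainties and the 0.2 cap are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 6, 4

logits = rng.normal(size=(n_samples, n_classes))
labels = rng.integers(0, n_classes, n_samples)
# Per-sample uncertainty in [0, 1]; the paper estimates this per sample,
# here it is random purely for illustration.
uncertainty = rng.uniform(0.0, 1.0, n_samples)

def smoothed_cross_entropy(logits, labels, eps):
    """Cross-entropy where sample i uses its own smoothing value eps[i]."""
    n, c = logits.shape
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Soft targets: (1 - eps_i) on the true class, eps_i spread over the rest.
    targets = (eps / (c - 1))[:, None] * np.ones((n, c))
    targets[np.arange(n), labels] = 1.0 - eps
    return (-(targets * log_probs).sum(axis=1)).mean()

# More uncertain samples get softer targets; the 0.2 cap is an assumption.
eps = 0.2 * uncertainty
print(f"uncertainty-aware loss: {smoothed_cross_entropy(logits, labels, eps):.4f}")
```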
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning [45.25602203155762]
Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.
A major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression.
We propose a novel model-agnostic Multistage Contrastive Learning framework.
arXiv Detail & Related papers (2024-02-19T04:13:33Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
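A toy sketch of building a prompt from both correct and incorrect demonstrations, in the spirit of c-ICL. The task (entity extraction), the example sentences, and the prompt wording are all hypothetical and only illustrate the contrastive-demonstration structure.

```python
# Hypothetical demonstration pools; contents are made up for illustration.
correct = [
    ("Alan Turing was born in London.", "PERSON: Alan Turing; LOC: London"),
]
incorrect = [
    ("Apple released a new phone.", "PERSON: Apple"),  # wrong: Apple is an ORG
]

def build_prompt(test_sentence):
    parts = ["Extract entities from the sentence."]
    for text, ans in correct:
        parts.append(f"Sentence: {text}\nCorrect answer: {ans}")
    for text, ans in incorrect:
        parts.append(f"Sentence: {text}\nIncorrect answer (avoid this): {ans}")
    parts.append(f"Sentence: {test_sentence}\nCorrect answer:")
    return "\n\n".join(parts)

print(build_prompt("Marie Curie worked in Paris."))
```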
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
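The finding above suggests a simple selection rule: score candidate demonstrations by similarity to the test input minus their uncertainty, then take the top k. The equal weighting of the two terms and the random stand-in features are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pool, dim, k = 100, 64, 4

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

pool_feats = l2norm(rng.normal(size=(n_pool, dim)))   # candidate examples
test_feat = l2norm(rng.normal(size=(1, dim)))[0]      # one test input
uncertainty = rng.uniform(0.0, 1.0, n_pool)           # per-example uncertainty

# Favor low uncertainty and high similarity to the test input; the equal
# weighting is an illustrative assumption.
score = pool_feats @ test_feat - uncertainty
demos = np.argsort(-score)[:k]
print("selected in-context demonstrations:", demos)
```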
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- A Practical Contrastive Learning Framework for Single-Image Super-Resolution [51.422185656787285]
We investigate contrastive learning-based single image super-resolution from two perspectives.
We propose a practical contrastive learning framework for SISR, named PCL-SR.
We re-train existing benchmark methods with our proposed PCL-SR framework and achieve performance superior to the original models.
arXiv Detail & Related papers (2021-11-27T15:42:12Z)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in the presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
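A common way to enforce both properties at once, sketched below with rotations as the transformation set: an invariance term pulls embeddings of original and transformed views together, while an equivariance term trains a head to recover which transform was applied. The random stand-in embeddings, logits, and equal loss weighting are assumptions, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, n_rot = 8, 64, 4  # four rotations: 0, 90, 180, 270 degrees

# Stand-in embeddings of an image and of its rotated copy, plus the rotation
# classifier's logits; a real system would compute these with a CNN.
z_orig = rng.normal(size=(batch, dim))
rot_id = rng.integers(0, n_rot, batch)
z_rot = z_orig + 0.1 * rng.normal(size=(batch, dim))
rot_logits = rng.normal(size=(batch, n_rot))

def l2norm(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Invariance term: embeddings of original and transformed views should agree.
inv_loss = np.mean(np.sum((l2norm(z_orig) - l2norm(z_rot)) ** 2, axis=1))

# Equivariance term: a head must recover WHICH transform was applied,
# forcing the features to retain transformation information.
log_p = rot_logits - np.log(np.exp(rot_logits).sum(axis=1, keepdims=True))
eq_loss = -log_p[np.arange(batch), rot_id].mean()

total = inv_loss + eq_loss  # equal weighting is an assumption
print(f"invariance: {inv_loss:.3f}, equivariance: {eq_loss:.3f}")
```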
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
- Revisiting Unsupervised Meta-Learning: Amplifying or Compensating for the Characteristics of Few-Shot Tasks [30.893785366366078]
We develop a practical approach towards few-shot image classification, where a visual recognition system is constructed with limited data.
We find that base-class labels are not necessary and that discriminative embeddings can be meta-learned in an unsupervised manner.
Experiments on few-shot learning benchmarks verify that our approaches outperform previous methods by a 4-10% margin.
arXiv Detail & Related papers (2020-11-30T10:08:35Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
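A heavily simplified sketch of a block-diagonal structure regularizer: given a task-feature weight matrix and group assignments, penalize weights connecting a task to features outside its group. TFCL learns the grouping jointly, so fixing random group assignments here is a simplification for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_features, n_groups = 6, 12, 3

# Task-feature weight matrix W and a FIXED assignment of tasks and features
# to groups; TFCL learns this grouping, so fixing it is a simplification.
W = rng.normal(size=(n_tasks, n_features))
task_group = rng.integers(0, n_groups, n_tasks)
feat_group = rng.integers(0, n_groups, n_features)

# Block-diagonal structure regularizer: penalize weights that connect a task
# to a feature outside its own group, encouraging group-wise collaboration.
off_block = task_group[:, None] != feat_group[None, :]
penalty = np.abs(W[off_block]).sum()
print(f"off-block penalty: {penalty:.3f}")
```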
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences of its use.