Related papers: Active Prompt Learning in Vision Language Models

Active Prompt Learning in Vision Language Models

URL: http://arxiv.org/abs/2311.11178v3
Date: Thu, 21 Mar 2024 07:52:03 GMT
Title: Active Prompt Learning in Vision Language Models
Authors: Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee,
Abstract summary: We devise a novel active learning framework for pre-trained Vision Language Models, denoted as PCB. We conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods.
Score: 21.276006224504748
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite their performance, because improving performance on new tasks requires task-specific knowledge, their adaptation is essential. While labels are needed for the adaptation, acquiring them is typically expensive. To overcome this challenge, active learning, a method of achieving a high performance by obtaining labels for a small number of samples from experts, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "how can the pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs even may degrade performance compared to random selection because of the class imbalance in labeling candidates, and (2) the knowledge of VLMs can provide hints for achieving the balance before labeling. Based on these observations, we devise a novel active learning framework for VLMs, denoted as PCB. To assess the effectiveness of our approach, we conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods. Code will be available in https://github.com/kaist-dmlab/pcb .

Related papers

From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z)
Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models. We propose a budget-saving selective querying based on adaptive class-wise thresholds.
arXiv Detail & Related papers (2024-11-23T02:34:33Z)
Making Large Vision Language Models to be Good Few-shot Learners [11.204701216476815]
Few-shot classification (FSC) is a fundamental yet challenging task in computer vision. LVLMs risk learning specific response formats rather than effectively extracting useful information from support data. In this paper, we investigate LVLMs' performance in FSC and identify key issues such as insufficient learning and the presence of severe positional biases.
arXiv Detail & Related papers (2024-08-21T03:01:11Z)
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context. Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets. Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
Tuning Vision-Language Models with Candidate Labels by Prompt Alignment [8.013652039026264]
Vision-language models (VLMs) can learn high-quality representations from a large-scale training dataset of image-text pairs. Prompt learning is a popular approach to fine-tuning VLM to adapt them to downstream tasks. We propose a framework to guide the learning process with candidate labels.
arXiv Detail & Related papers (2024-07-10T13:19:31Z)
Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance. In this paper, we define the fairness-aware learning task with limited training samples as the emphfair few-shot learning problem. We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)
Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability. We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side. During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data. We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning. Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes [78.23907801786827]
We introduce generalizations that ensure that our approach is both cost-aware and allows for fine-grained selection of examples through partially labeled scenes. Our experiments on a real-world, large-scale self-driving dataset suggest that fine-grained selection can improve the performance across perception, prediction, and downstream planning tasks.
arXiv Detail & Related papers (2021-04-08T17:57:41Z)
Rebuilding Trust in Active Learning with Actionable Metrics [77.99796068970569]
Active Learning (AL) is an active domain of research, but is seldom used in the industry despite the pressing needs. This is in part due to a misalignment of objectives, while research strives at getting the best results on selected datasets. We present various actionable metrics to help rebuild trust of industrial practitioners in Active Learning.
arXiv Detail & Related papers (2020-12-18T09:34:59Z)
Active and Incremental Learning with Weak Supervision [7.2288756536476635]
In this work, we describe combinations of an incremental learning scheme and methods of active learning. An object detection task is evaluated in a continuous exploration context on the PASCAL VOC dataset. We also validate a weakly supervised system based on active and incremental learning in a real-world biodiversity application.
arXiv Detail & Related papers (2020-01-20T13:21:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.