Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
- URL: http://arxiv.org/abs/2505.09139v1
- Date: Wed, 14 May 2025 04:43:36 GMT
- Title: Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
- Authors: Lucas Choi, Ross Greer
- Abstract summary: We introduce a method for automated prompt refinement using a novel metric called the Contrastive Class Alignment Score (CCAS). Our method generates diverse prompt candidates via a large language model and filters them through CCAS, computed using prompt embeddings from a sentence transformer. We evaluate our approach on challenging object categories, demonstrating that our automatic selection of high-precision prompts improves object detection accuracy without the need for model training or labeled data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language models (VLMs) offer flexible object detection through natural language prompts but suffer from performance variability depending on prompt phrasing. In this paper, we introduce a method for automated prompt refinement using a novel metric called the Contrastive Class Alignment Score (CCAS), which ranks prompts based on their semantic alignment with a target object class while penalizing similarity to confounding classes. Our method generates diverse prompt candidates via a large language model and filters them through CCAS, computed using prompt embeddings from a sentence transformer. We evaluate our approach on challenging object categories, demonstrating that our automatic selection of high-precision prompts improves object detection accuracy without the need for additional model training or labeled data. This scalable and model-agnostic pipeline offers a principled alternative to manual prompt engineering for VLM-based detection systems.
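The ranking idea in the abstract (reward alignment with the target class, penalize similarity to confounding classes) can be illustrated with a minimal sketch. The abstract does not give the exact CCAS formula, so the score below is an assumed form: cosine similarity to the target minus the mean cosine similarity to the confounders. The function names `contrastive_score` and `rank_prompts` and the toy 3-dimensional vectors are hypothetical; in practice the vectors would come from a sentence transformer applied to the LLM-generated prompt candidates.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def contrastive_score(prompt_vec, target_vec, confounder_vecs):
    """Assumed scoring form (not taken from the paper):
    similarity to the target class minus mean similarity to confounders."""
    penalty = float(np.mean([cosine(prompt_vec, c) for c in confounder_vecs]))
    return cosine(prompt_vec, target_vec) - penalty


def rank_prompts(prompt_vecs, target_vec, confounder_vecs, top_k=1):
    """Score every candidate prompt and return the top_k (name, score) pairs."""
    scored = [
        (name, contrastive_score(vec, target_vec, confounder_vecs))
        for name, vec in prompt_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for sentence-transformer outputs (hypothetical).
target = np.array([1.0, 0.0, 0.0])
confounders = [np.array([0.0, 1.0, 0.0])]
prompts = {
    "specific prompt": np.array([0.9, 0.1, 0.0]),
    "ambiguous prompt": np.array([0.6, 0.6, 0.0]),
    "off-target prompt": np.array([0.1, 0.9, 0.0]),
}

ranking = rank_prompts(prompts, target, confounders, top_k=3)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the specific prompt ranks first, the ambiguous prompt scores zero because it is equally close to the target and the confounder, and the off-target prompt ranks last. No labels or detector training are involved, which matches the training-free setting the abstract describes.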
Related papers
- LLM-Guided Agentic Object Detection for Open-World Understanding [45.08126325125808]
Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. We propose an LLM-guided agentic object detection framework that enables fully label-free, zero-shot detection. Our method offers enhanced autonomy and adaptability for open-world understanding.
arXiv Detail & Related papers (2025-07-14T22:30:48Z)
- Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems [0.0]
We integrate two types of feedback-driven annotations: those that identify spelling and grammatical errors, and those that highlight argumentative components. To illustrate how this method could be applied in real-world scenarios, we employ two LLMs to generate annotations.
arXiv Detail & Related papers (2025-05-28T18:39:56Z)
- An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection [0.08974531206817744]
We present an approach for improving non-technical users' natural language text descriptions of their desired targets of interest. We quantify the improvement that our feedback mechanism provides by demonstrating performance with multiple publicly-available open-vocabulary object detection models.
arXiv Detail & Related papers (2025-03-21T16:34:04Z)
- QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries [2.306164598536725]
We present a novel framework for rapidly adapting a pre-trained VLM to respond to a natural language query. We use unlabelled data collected during previous deployments to align VLM features with semantic classes related to the query. We also explore how objects unrelated to the query should be dealt with when using real-world data for adaptation.
arXiv Detail & Related papers (2025-02-26T01:07:28Z)
- Enhancing LLM-Based Text Classification in Political Science: Automatic Prompt Optimization and Dynamic Exemplar Selection for Few-Shot Learning [1.6967824074619953]
Large language models (LLMs) offer substantial promise for text classification in political science. Our framework enhances LLM performance through automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. An open-source Python package (PoliPrompt) is available on GitHub.
arXiv Detail & Related papers (2024-09-02T21:05:31Z)
- OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer [63.141027246418]
We propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment-friendly open-vocabulary detector with strong performance and low latency.
We provide an end-to-end training recipe that transfers knowledge from a vision-language model (VLM) to an object detector with simple alignment.
Experimental results demonstrate that the proposed approach is superior to existing real-time open-vocabulary detectors on the standard Zero-Shot LVIS benchmark.
arXiv Detail & Related papers (2024-07-15T12:15:27Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting [68.19544657508509]
Large language models (LLMs) are adopted as a fundamental component of language technologies.
We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt format in few-shot settings.
We propose an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights.
arXiv Detail & Related papers (2023-10-17T15:03:30Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases the difficulty of verbalizer design by reformulating the few-shot text classification task as a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Automated Few-shot Classification with Instruction-Finetuned Language Models [76.69064714392165]
We show that AuT-Few outperforms state-of-the-art few-shot learning methods.
We also show that AuT-Few is the best ranking method across datasets on the RAFT few-shot benchmark.
arXiv Detail & Related papers (2023-05-21T21:50:27Z)
- Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection [87.39089806069707]
We propose a fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD).
During the adapting stage, we enable the VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task.
Experiments show that our method achieves state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
arXiv Detail & Related papers (2022-11-02T03:38:02Z)
- Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification [15.575483080819563]
We propose Automatic Multi-Label Prompting (AMuLaP) to automatically select label mappings for few-shot text classification with prompting.
Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template.
Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.
arXiv Detail & Related papers (2022-04-13T11:15:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.