Related papers: Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models

URL: http://arxiv.org/abs/2505.09139v1
Date: Wed, 14 May 2025 04:43:36 GMT
Title: Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
Authors: Lucas Choi, Ross Greer
Abstract summary: We introduce a method for automated prompt refinement using a novel metric called the Contrastive Class Alignment Score (CCAS). Our method generates diverse prompt candidates via a large language model and filters them through CCAS, computed using prompt embeddings from a sentence transformer. We evaluate our approach on challenging object categories, demonstrating that our automatic selection of high-precision prompts improves object detection accuracy without the need for model training or labeled data.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language models (VLMs) offer flexible object detection through natural language prompts but suffer from performance variability depending on prompt phrasing. In this paper, we introduce a method for automated prompt refinement using a novel metric called the Contrastive Class Alignment Score (CCAS), which ranks prompts based on their semantic alignment with a target object class while penalizing similarity to confounding classes. Our method generates diverse prompt candidates via a large language model and filters them through CCAS, computed using prompt embeddings from a sentence transformer. We evaluate our approach on challenging object categories, demonstrating that our automatic selection of high-precision prompts improves object detection accuracy without the need for additional model training or labeled data. This scalable and model-agnostic pipeline offers a principled alternative to manual prompt engineering for VLM-based detection systems.
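
The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the CCAS formula, so the score below is an assumption: cosine similarity to the target class minus the mean cosine similarity to the confounding classes. The function names (`cosine`, `contrastive_score`, `rank_prompts`) and the toy two-dimensional vectors are hypothetical; in practice the vectors would come from a sentence transformer, which is omitted here to keep the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_score(prompt_vec, target_vec, confounder_vecs):
    """Assumed scoring rule (not taken from the paper):
    alignment with the target minus mean similarity to the confounders."""
    alignment = cosine(prompt_vec, target_vec)
    penalty = sum(cosine(prompt_vec, c) for c in confounder_vecs) / len(confounder_vecs)
    return alignment - penalty


def rank_prompts(candidates, target_vec, confounder_vecs):
    """Return (score, prompt) pairs sorted from highest to lowest score.

    `candidates` maps each prompt string to its embedding vector.
    """
    scored = [
        (contrastive_score(vec, target_vec, confounder_vecs), text)
        for text, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy embeddings, invented purely for illustration.
target_class = [1.0, 0.0]          # stands in for the target class embedding
confounding_classes = [[0.0, 1.0]]  # stands in for one confounding class
candidate_prompts = {
    "specific prompt": [1.0, 0.0],   # aligned with the target only
    "ambiguous prompt": [1.0, 1.0],  # equally close to target and confounder
    "misleading prompt": [0.0, 1.0], # aligned with the confounder only
}

for score, prompt in rank_prompts(candidate_prompts, target_class, confounding_classes):
    print(f"{score:+.3f}  {prompt}")
```

Under this assumed rule, a prompt that is equally close to the target and to a confounder scores near zero, so it ranks below a prompt that matches only the target.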

Related papers

LLM-Guided Agentic Object Detection for Open-World Understanding [45.08126325125808]
Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. We propose an LLM-guided agentic object detection framework that enables fully label-free, zero-shot detection. Our method offers enhanced autonomy and adaptability for open-world understanding.
arXiv Detail & Related papers (2025-07-14T22:30:48Z)
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems [0.0]
We integrate two types of feedback-driven annotations: those that identify spelling and grammatical errors, and those that highlight argumentative components. To illustrate how this method could be applied in real-world scenarios, we employ two LLMs to generate annotations.
arXiv Detail & Related papers (2025-05-28T18:39:56Z)
An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection [0.08974531206817744]
We present an approach for improving non-technical users' natural language text descriptions of their desired targets of interest. We quantify the improvement that our feedback mechanism provides by demonstrating performance with multiple publicly-available open-vocabulary object detection models.
arXiv Detail & Related papers (2025-03-21T16:34:04Z)
QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries [2.306164598536725]
We present a novel framework for rapidly adapting a pre-trained VLM to respond to a natural language query. We use unlabelled data collected during previous deployments to align VLM features with semantic classes related to the query. We also explore how objects unrelated to the query should be dealt with when using real-world data for adaptation.
arXiv Detail & Related papers (2025-02-26T01:07:28Z)
Enhancing LLM-Based Text Classification in Political Science: Automatic Prompt Optimization and Dynamic Exemplar Selection for Few-Shot Learning [1.6967824074619953]
Large language models (LLMs) offer substantial promise for text classification in political science. Our framework enhances LLM performance through automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. An open-source Python package (PoliPrompt) is available on GitHub.
arXiv Detail & Related papers (2024-09-02T21:05:31Z)
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer [63.141027246418]
We propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment-friendly open-vocabulary detector with strong performance and low latency. We provide an end-to-end training recipe that transfers knowledge from a vision-language model (VLM) to the object detector with simple alignment. Experimental results demonstrate that the proposed approach is superior to existing real-time open-vocabulary detectors on the standard Zero-Shot LVIS benchmark.
arXiv Detail & Related papers (2024-07-15T12:15:27Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of these exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting [68.19544657508509]
Large language models (LLMs) are adopted as a fundamental component of language technologies. We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt format in few-shot settings. We propose an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights.
arXiv Detail & Related papers (2023-10-17T15:03:30Z)
MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task as a text-pair relevance estimation task. We conduct experiments on three widely used text classification datasets across four few-shot settings. Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
Automated Few-shot Classification with Instruction-Finetuned Language Models [76.69064714392165]
We show that AuT-Few outperforms state-of-the-art few-shot learning methods. We also show that AuT-Few is the best ranking method across datasets on the RAFT few-shot benchmark.
arXiv Detail & Related papers (2023-05-21T21:50:27Z)
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection [87.39089806069707]
We propose a fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD). During the adapting stage, we enable the VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task. Experiments show that our method achieves state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
arXiv Detail & Related papers (2022-11-02T03:38:02Z)
Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification [15.575483080819563]
We propose Automatic Multi-Label Prompting (AMuLaP) to automatically select label mappings for few-shot text classification with prompting. Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template. Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.
arXiv Detail & Related papers (2022-04-13T11:15:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.