Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting
- URL: http://arxiv.org/abs/2409.15953v1
- Date: Tue, 24 Sep 2024 10:35:42 GMT
- Title: Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting
- Authors: Luca Ciampi, Nicola Messina, Matteo Pierucci, Giuseppe Amato, Marco Avvenuti, Fabrizio Falchi,
- Abstract summary: Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training.
We introduce the Prompt-Aware Counting benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics.
- Score: 8.000723123087473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training. With the recent advancement of robust vision-and-language foundation models, there is a growing interest in prompt-based CAC, where object categories to be counted can be specified using natural language. However, we identify significant limitations in current benchmarks for evaluating this task, which hinder both accurate assessment and the development of more effective solutions. Specifically, we argue that the current evaluation protocols do not measure the ability of the model to understand which object has to be counted. This is due to two main factors: (i) the shortcomings of CAC datasets, which primarily consist of images containing objects from a single class, and (ii) the limitations of current counting performance evaluators, which are based on traditional class-specific counting and focus solely on counting errors. To fill this gap, we introduce the Prompt-Aware Counting (PrACo) benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics. We evaluate state-of-the-art methods and demonstrate that, although some achieve impressive results on standard class-specific counting metrics, they exhibit a significant deficiency in understanding the input prompt, indicating the need for more careful training procedures or revised designs. The code for reproducing our results is available at https://github.com/ciampluca/PrACo.
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - Enhancing Zero-shot Counting via Language-guided Exemplar Learning [17.479926342093677]
Class-Agnostic Counting (CAC) problem has garnered increasing attention owing to its intriguing generality and superior efficiency.
This paper proposes a novel ExpressCount to enhance zero-shot object counting by delving deeply into language-guided exemplar learning.
The ExpressCount is comprised of an innovative Language-oriented Exemplar Perceptron and a downstream visual Zero-shot Counting pipeline.
arXiv Detail & Related papers (2024-02-08T04:07:38Z) - Zero-Shot Object Counting with Language-Vision Models [50.1159882903028]
Class-agnostic object counting aims to count object instances of an arbitrary class at test time.
Current methods require human-annotated exemplars as inputs which are often unavailable for novel categories.
We propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time.
arXiv Detail & Related papers (2023-09-22T14:48:42Z) - FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets [69.91340332545094]
We introduce FLASK, a fine-grained evaluation protocol for both human-based and model-based evaluation.
We experimentally observe that the fine-graininess of evaluation is crucial for attaining a holistic view of model performance.
arXiv Detail & Related papers (2023-07-20T14:56:35Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Exemplar Free Class Agnostic Counting [28.41525571128706]
Class agnostic counting aims to count objects in a novel object category at test time without access to labeled training data for that category.
Our proposed approach first identifies exemplars from repeating objects in an image, and then counts the repeating objects.
We evaluate our proposed approach on FSC-147 dataset, and show that it achieves superior performance compared to the existing approaches.
arXiv Detail & Related papers (2022-05-27T19:44:39Z) - Learning to Count Anything: Reference-less Class-agnostic Counting with
Weak Supervision [11.037585450795357]
We show that counting is, at its core, a repetition-recognition task.
We demonstrate that self-supervised vision transformer features combined with a lightweight count regression head achieve competitive results.
arXiv Detail & Related papers (2022-05-20T14:26:38Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates a attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Incremental Few-Shot Object Detection [96.02543873402813]
OpeN-ended Centre nEt is a detector for incrementally learning to detect class objects with few examples.
ONCE fully respects the incremental learning paradigm, with novel class registration requiring only a single forward pass of few-shot training samples.
arXiv Detail & Related papers (2020-03-10T12:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.