A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
- URL: http://arxiv.org/abs/2501.19184v2
- Date: Mon, 10 Feb 2025 15:47:32 GMT
- Title: A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
- Authors: Luca Ciampi, Ali Azmoudeh, Elif Ecem Akbaba, Erdi Sarıtaş, Ziya Ata Yazıcı, Hazım Kemal Ekenel, Giuseppe Amato, Fabrizio Falchi
- Abstract summary: We present the first comprehensive review of class-agnostic counting (CAC) methodologies.
We propose a taxonomy to categorize CAC approaches into three paradigms: reference-based, reference-less, and open-world text-guided.
We present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities.
- Abstract: Visual object counting has recently shifted towards class-agnostic counting (CAC), which addresses the challenge of counting objects across arbitrary categories -- a crucial capability for flexible and generalizable counting systems. Unlike humans, who effortlessly identify and count objects from diverse categories without prior knowledge, most existing counting methods are restricted to enumerating instances of known classes, requiring extensive labeled datasets for training and struggling in open-vocabulary settings. In contrast, CAC aims to count objects belonging to classes never seen during training, operating in a few-shot setting. In this paper, we present the first comprehensive review of CAC methodologies. We propose a taxonomy to categorize CAC approaches into three paradigms based on how target object classes can be specified: reference-based, reference-less, and open-world text-guided. Reference-based approaches achieve state-of-the-art performance by relying on exemplar-guided mechanisms. Reference-less methods eliminate exemplar dependency by leveraging inherent image patterns. Finally, open-world text-guided methods use vision-language models, enabling object class descriptions via textual prompts, offering a flexible and promising solution. Based on this taxonomy, we provide an overview of the architectures of 29 CAC approaches and report their results on gold-standard benchmarks. We compare their performance and discuss their strengths and limitations. Specifically, we present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities. Finally, we offer a critical discussion of persistent challenges, such as annotation dependency and generalization, alongside future directions. We believe this survey will be a valuable resource, showcasing CAC advancements and guiding future research.
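As a rough illustration of the reference-based paradigm described in the abstract, the following is a minimal, hypothetical PyTorch sketch of an exemplar-guided counter: exemplar crops are encoded into prototype vectors, correlated with the image feature map, and the resulting similarity map is decoded into a density map whose integral is the predicted count. All names (e.g., ToyReferenceBasedCounter) are invented for illustration; surveyed methods differ substantially in backbone, matching, and decoding.

```python
# Minimal, hypothetical sketch of an exemplar-guided (reference-based) CAC
# forward pass. Not any surveyed method; architecture details are simplified.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyReferenceBasedCounter(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Shared convolutional encoder for the query image and exemplar crops.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Decoder that regresses a per-pixel density map from the similarity map.
        self.decoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1), nn.ReLU(),  # densities are non-negative
        )

    def forward(self, image: torch.Tensor, exemplars: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); exemplars: (B, K, 3, h, w) -- K exemplar crops.
        B, K = exemplars.shape[:2]
        img_feat = self.encoder(image)                       # (B, C, H, W)
        ex_feat = self.encoder(exemplars.flatten(0, 1))      # (B*K, C, h, w)
        # Pool each exemplar to a single prototype vector.
        proto = ex_feat.mean(dim=(2, 3)).view(B, K, -1)      # (B, K, C)
        # Cosine similarity between every image location and each prototype,
        # averaged over the K exemplars -> one similarity map per image.
        img_n = F.normalize(img_feat, dim=1)
        proto_n = F.normalize(proto, dim=2)
        sim = torch.einsum('bchw,bkc->bkhw', img_n, proto_n).mean(1, keepdim=True)
        return self.decoder(sim)                             # (B, 1, H, W)

# The predicted count is the integral (sum) of the density map:
model = ToyReferenceBasedCounter()
image = torch.rand(2, 3, 128, 128)
exemplars = torch.rand(2, 3, 3, 32, 32)  # three exemplar boxes per image
counts = model(image, exemplars).sum(dim=(1, 2, 3))
print(counts.shape)  # torch.Size([2])
```

On FSC-147, the gold-standard metrics referenced above are typically Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) between predicted and ground-truth counts.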
Related papers
- Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting
Class-agnostic counting (CAC) counts instances of arbitrary object classes never seen during model training.
We introduce the Prompt-Aware Counting benchmark to measure the robustness and trustworthiness of prompt-based CAC models.
We evaluate state-of-the-art methods and demonstrate that, although some achieve impressive results on standard class-specific counting metrics, they exhibit a significant deficiency in understanding the input prompt.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
- Contextuality Helps Representation Learning for Generalized Category Discovery
This paper introduces a novel approach to Generalized Category Discovery (GCD) by leveraging the concept of contextuality.
Our model integrates two levels of contextuality: instance-level, where nearest-neighbor contexts are leveraged for contrastive learning, and cluster-level, where contrastive learning is likewise applied.
Integrating this contextual information improves feature learning and, in turn, the classification accuracy across all categories.
arXiv Detail & Related papers (2024-07-29T07:30:41Z)
- Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
We propose a novel VQA benchmark based on well-known visual classification datasets.
We also suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category.
Our contributions aim to lay the foundation for more precise and meaningful assessments.
arXiv Detail & Related papers (2024-02-11T18:26:18Z)
- An Evaluation Framework for Mapping News Headlines to Event Classes in a Knowledge Graph
We present a methodology for creating a benchmark dataset of news headlines mapped to event classes in Wikidata.
We use the dataset to study two classes of unsupervised methods for this task.
We present the results of our evaluation, lessons learned, and directions for future work.
arXiv Detail & Related papers (2023-12-04T20:42:26Z)
- Leveraging Knowledge Graphs for Zero-Shot Object-agnostic State Classification
We propose the first Object-agnostic State Classification (OaSC) method, which infers the state of an object without relying on knowledge or estimation of the object class.
A series of experiments investigate the performance of the proposed method in various settings.
The proposed OaSC method outperforms existing methods on all datasets and benchmarks by a large margin.
arXiv Detail & Related papers (2023-07-22T22:19:11Z)
- Multi-Modal Classifiers for Open-Vocabulary Object Detection
The goal of this paper is open-vocabulary object detection (OVOD).
We adopt a standard two-stage object detector architecture.
We explore three ways of specifying novel classes: via language descriptions, image exemplars, or a combination of the two.
arXiv Detail & Related papers (2023-06-08T18:31:56Z)
- Class-incremental Novel Class Discovery
We study the new task of class-incremental Novel Class Discovery (class-iNCD).
We propose a novel approach for class-iNCD which prevents forgetting of past information about the base classes.
Our experiments, conducted on three common benchmarks, demonstrate that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-18T13:49:27Z)
- Learning Open-World Object Proposals without Learning to Classify
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
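A hedged sketch of the classification-free objectness idea this entry describes: a proposal is supervised by how well it overlaps any ground-truth box rather than by a class label. Here the best IoU with any ground-truth box stands in as a single simplified target; OLN itself combines localization cues such as centerness and IoU. Function names are invented for illustration.

```python
# Hypothetical sketch: localization-quality objectness targets (best IoU with
# any ground-truth box), in place of classification labels.
import torch

def pairwise_iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """IoU between two sets of axis-aligned boxes in (x1, y1, x2, y2) format."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])  # intersection top-left
    rb = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-6)

def objectness_targets(proposals: torch.Tensor, gt_boxes: torch.Tensor) -> torch.Tensor:
    """Each proposal's target is its best IoU with any ground-truth box."""
    return pairwise_iou(proposals, gt_boxes).max(dim=1).values

proposals = torch.tensor([[10., 10., 50., 50.], [60., 60., 90., 90.]])
gt_boxes = torch.tensor([[12., 12., 48., 52.]])
print(objectness_targets(proposals, gt_boxes))  # high for the first, 0 for the second
```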
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
We propose a new approach to binary classification from $m$ unlabeled sets (U-sets) for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- One-Class Classification: A Survey
One-Class Classification (OCC) is a special case of multi-class classification, where data observed during training is from a single positive class.
We provide a survey of classical statistical and recent deep learning-based OCC methods for visual recognition.
arXiv Detail & Related papers (2021-01-08T15:30:29Z)
- A Few-Shot Sequential Approach for Object Counting
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z)