Learning What NOT to Count
- URL: http://arxiv.org/abs/2504.11705v1
- Date: Wed, 16 Apr 2025 02:05:47 GMT
- Title: Learning What NOT to Count
- Authors: Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh,
- Abstract summary: Few/zero-shot object counting methods often struggle to distinguish between fine-grained categories.<n>We propose an annotation-free approach that enables the seamless integration of new fine-grained categories into existing few/zero-shot counting models.<n>Our approach introduces an attention prediction network that identifies fine-grained category boundaries trained using only synthetic pseudo-annotated data.
- Score: 17.581015609730017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few/zero-shot object counting methods reduce the need for extensive annotations but often struggle to distinguish between fine-grained categories, especially when multiple similar objects appear in the same scene. To address this limitation, we propose an annotation-free approach that enables the seamless integration of new fine-grained categories into existing few/zero-shot counting models. By leveraging latent generative models, we synthesize high-quality, category-specific crowded scenes, providing a rich training source for adapting to new categories without manual labeling. Our approach introduces an attention prediction network that identifies fine-grained category boundaries trained using only synthetic pseudo-annotated data. At inference, these fine-grained attention estimates refine the output of existing few/zero-shot counting networks. To benchmark our method, we further introduce the FGTC dataset, a taxonomy-specific fine-grained object counting dataset for natural images. Our method substantially enhances pre-trained state-of-the-art models on fine-grained taxon counting tasks, while using only synthetic data. Code and data to be released upon acceptance.
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z) - A Generic Method for Fine-grained Category Discovery in Natural Language Texts [38.297873969795546]
We introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function.<n>The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space.<n>We also propose a centroid inference mechanism to support real-time applications.
arXiv Detail & Related papers (2024-06-18T23:27:46Z) - Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing [38.84431954053434]
Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all.
We propose a simple and effective strategy for few-shot and zero-shot text classification.
arXiv Detail & Related papers (2024-05-06T15:38:32Z) - A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z) - Coarse2Fine: Fine-grained Text Classification on Coarsely-grained
Annotated Data [22.81068960545234]
We introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data.
Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance.
Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement.
arXiv Detail & Related papers (2021-09-22T17:29:01Z) - SCARF: Self-Supervised Contrastive Learning using Random Feature
Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z) - Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine
Pseudo-Labeling with Visual-Semantic Meta-Embedding [13.063136901934865]
Few-shot learning aims at rapidly adapting to novel categories with only a handful of samples at test time.
In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification.
We approximate the fine-grained data distribution by greedy clustering of each coarse-class into pseudo-fine-classes according to the similarity of image embeddings.
arXiv Detail & Related papers (2020-07-11T03:44:21Z) - Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z) - A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z) - Selecting Relevant Features from a Multi-domain Representation for
Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.