Just Say the Word: Annotation-Free Fine-Grained Object Counting
- URL: http://arxiv.org/abs/2504.11705v3
- Date: Thu, 11 Sep 2025 21:18:08 GMT
- Title: Just Say the Word: Annotation-Free Fine-Grained Object Counting
- Authors: Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh
- Abstract summary: Fine-grained object counting remains a major challenge for class-agnostic counting models. We propose an alternative paradigm: given a category name, tune a compact concept embedding derived from the prompt using synthetic images and pseudo-labels. This embedding conditions a specialization module that refines raw overcounts from any frozen counter into accurate, category-specific estimates.
- Score: 22.31750687552324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained object counting remains a major challenge for class-agnostic counting models, which overcount visually similar but incorrect instances (e.g., jalapeño vs. poblano). Addressing this by annotating new data and fully retraining the model is time-consuming and does not guarantee generalization to additional novel categories at test time. Instead, we propose an alternative paradigm: given a category name, tune a compact concept embedding derived from the prompt using synthetic images and pseudo-labels generated by a text-to-image diffusion model. This embedding conditions a specialization module that refines raw overcounts from any frozen counter into accurate, category-specific estimates, without requiring real images or human annotations. We validate our approach on Lookalikes, a challenging new benchmark containing 1,037 images across 27 fine-grained subcategories, and show substantial improvements over strong baselines. Code will be released upon acceptance. Dataset: https://dalessandro.dev/datasets/lookalikes/
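The specialization idea in the abstract can be reduced to a runnable toy. Below, the frozen counter is simulated as a systematic overcounter, the diffusion-generated pseudo-labels are simulated as known synthetic counts, and the specialization module is collapsed to a per-category affine correction fit on synthetic pairs only. The affine form and all names here are illustrative assumptions, not the paper's architecture (which conditions a learned module on a tuned concept embedding):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the paper's pipeline: in the real method, synthetic images
# come from a text-to-image diffusion model and raw counts from a frozen
# class-agnostic counter. Here we simulate a counter that overcounts a
# fine-grained category (e.g. counts poblanos as jalapenos too).
true_counts = rng.integers(1, 20, size=200).astype(float)       # pseudo-labels
raw_overcounts = 1.6 * true_counts + rng.normal(0, 0.5, 200)    # frozen counter

# "Specialization" reduced to its simplest possible form: fit an affine
# correction (raw overcount -> category-specific count) on synthetic data.
A = np.stack([raw_overcounts, np.ones_like(raw_overcounts)], axis=1)
w, b = np.linalg.lstsq(A, true_counts, rcond=None)[0]

def specialize(raw_count: float) -> float:
    """Refine a raw overcount into a category-specific estimate."""
    return w * raw_count + b

print(specialize(16.0))  # a raw overcount of ~16 maps back toward ~10
```

The key property carried over from the paper is that the correction is fit without any real images or human annotations: everything it sees is synthetic.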
Related papers
- Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings [65.31723739561151]
This work stems from an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within encoded semantics. We introduce a new evaluation dataset, CapRetrieval, in which passages are image captions and queries are phrases targeting entity or event concepts in diverse forms. We finetune encoders with our proposed data generation strategies, enabling a small 0.1B encoder to outperform the state-of-the-art 7B model.
arXiv Detail & Related papers (2025-06-10T09:00:33Z) - Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z) - A Generic Method for Fine-grained Category Discovery in Natural Language Texts [38.297873969795546]
We introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function. The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space. We also propose a centroid inference mechanism to support real-time applications.
arXiv Detail & Related papers (2024-06-18T23:27:46Z) - Understanding Visual Concepts Across Models [45.18188726287581]
We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification.
We find perturbations within an $\epsilon$-ball to any prior embedding that generate, detect, and classify an arbitrary concept.
When these new embeddings are spliced into new models, fine-tuning that targets the original model is lost.
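The $\epsilon$-ball search described above can be illustrated with a minimal projected-gradient sketch. The linear scoring model and the target vector below are toy assumptions (the paper works with real generative, detection, and classification models), but the mechanics of constraining the perturbation to an $\epsilon$-ball are the same:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: a fixed "model" scores an embedding against a target
# concept direction; we search for a perturbation inside an epsilon-ball
# around a prior embedding that maximizes the score.
target = rng.normal(size=8)
target /= np.linalg.norm(target)
prior = rng.normal(size=8)

def score(e):
    return float(e @ target)  # higher = "recognized as the concept"

eps, lr = 0.5, 0.1
delta = np.zeros(8)
for _ in range(50):
    delta += lr * target              # gradient of score(prior + delta) w.r.t. delta
    norm = np.linalg.norm(delta)
    if norm > eps:                    # project back onto the epsilon-ball
        delta *= eps / norm

print(score(prior + delta) > score(prior))  # prints True
```

Because the objective is linear, the search converges to the ball's boundary in the target direction; with a real nonlinear model the same loop applies, only the gradient changes.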
arXiv Detail & Related papers (2024-06-11T17:40:31Z) - Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing [38.84431954053434]
Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all.
We propose a simple and effective strategy for few-shot and zero-shot text classification.
arXiv Detail & Related papers (2024-05-06T15:38:32Z) - Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z) - A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z) - SynCDR : Training Cross Domain Retrieval Models with Synthetic Data [69.26882668598587]
In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.
We show how to generate synthetic data to fill in these missing category examples across domains.
Our best SynCDR model can outperform prior art by up to 15%.
arXiv Detail & Related papers (2023-12-31T08:06:53Z) - Semantic Generative Augmentations for Few-Shot Counting [0.0]
We investigate how synthetic data can benefit few-shot class-agnostic counting.
We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map.
Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent, well-performing few-shot counting models.
arXiv Detail & Related papers (2023-10-26T11:42:48Z) - Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation [13.001629605405954]
Zero-shot instance segmentation aims to detect and precisely segment objects of unseen categories without any training samples.
We propose D$^2$Zero with Semantic-Promoted Debiasing and Background Disambiguation.
Background disambiguation produces image-adaptive background representation to avoid mistaking novel objects for background.
arXiv Detail & Related papers (2023-05-22T16:00:01Z) - Incremental Generalized Category Discovery [26.028970894707204]
We explore the problem of Incremental Generalized Category Discovery (IGCD).
This is a challenging category-incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories while also discovering novel ones.
We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting.
arXiv Detail & Related papers (2023-04-27T16:27:11Z) - What does a platypus look like? Generating customized prompts for zero-shot image classification [52.92839995002636]
This work introduces a simple method to generate higher accuracy prompts without relying on any explicit knowledge of the task domain.
We leverage the knowledge contained in large language models (LLMs) to generate many descriptive sentences that contain important discriminating characteristics of the image categories.
This approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet.
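The prompt-ensembling idea above can be sketched in a few lines. In the real pipeline, an LLM writes the descriptive sentences and a model such as CLIP embeds them; here `embed()` is a deterministic toy stand-in, and the prompt lists are hypothetical examples, not the paper's generated output:

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy deterministic text embedding (stand-in for a real text encoder)."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# In the paper, these descriptive sentences would come from an LLM.
prompts = {
    "platypus": ["a photo of a platypus",
                 "a duck-billed, egg-laying mammal",
                 "a platypus swimming in a river"],
    "beaver":   ["a photo of a beaver",
                 "a flat-tailed rodent gnawing wood",
                 "a beaver building a dam"],
}

# One classifier weight per class: the normalized mean of its prompt embeddings.
weights = {}
for cls, ps in prompts.items():
    m = np.mean([embed(p) for p in ps], axis=0)
    weights[cls] = m / np.linalg.norm(m)

def classify(query_emb: np.ndarray) -> str:
    return max(weights, key=lambda c: float(query_emb @ weights[c]))

print(classify(embed("a photo of a platypus")))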
arXiv Detail & Related papers (2022-09-07T17:27:08Z) - Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z) - Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data [22.81068960545234]
We introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data.
Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance.
Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement.
arXiv Detail & Related papers (2021-09-22T17:29:01Z) - SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
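The view-generation step of SCARF is simple enough to sketch directly: corrupt a random subset of each row's features by resampling them from that feature's empirical marginal over the batch. The contrastive loss between the original and corrupted views (the paper's actual training objective) is omitted here:

```python
import numpy as np

def scarf_corrupt(x, corruption_rate=0.6, rng=None):
    """Return a SCARF-style corrupted view of a tabular batch x (n rows, d features)."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = x.shape
    mask = rng.random((n, d)) < corruption_rate   # which entries to corrupt
    rows = rng.integers(0, n, size=(n, d))        # random donor row per entry
    replacements = x[rows, np.arange(d)]          # i.e. draw from each column's marginal
    return np.where(mask, replacements, x)

rng = np.random.default_rng(0)
batch = rng.normal(size=(128, 10))
view = scarf_corrupt(batch, rng=np.random.default_rng(1))
```

Resampling from the per-column marginal (rather than adding noise) keeps each corrupted value realistic for its feature, which is the core design choice of SCARF.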
arXiv Detail & Related papers (2021-06-29T08:08:33Z) - Few Shot Learning With No Labels [28.91314299138311]
Few-shot learners aim to recognize new categories given only a small number of training samples.
The core challenge is to avoid overfitting to the limited data while ensuring good generalization to novel classes.
Existing literature makes use of vast amounts of annotated data by simply shifting the label requirement from novel classes to base classes.
arXiv Detail & Related papers (2020-12-26T14:40:12Z) - One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z) - Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding [13.063136901934865]
Few-shot learning aims at rapidly adapting to novel categories with only a handful of samples at test time.
In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification.
We approximate the fine-grained data distribution by greedy clustering of each coarse-class into pseudo-fine-classes according to the similarity of image embeddings.
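The pseudo-fine-class step above can be sketched as a greedy cosine-similarity clustering within one coarse class. The threshold `tau` and the running-mean centroid update are assumptions for illustration; the paper's exact clustering procedure may differ:

```python
import numpy as np

def greedy_pseudo_fine(embs, tau=0.9):
    """Greedily cluster (toy) image embeddings of one coarse class into pseudo-fine-classes."""
    centroids, labels = [], []
    for e in embs:
        e = e / np.linalg.norm(e)
        sims = [float(e @ c) for c in centroids]
        if sims and max(sims) >= tau:
            k = int(np.argmax(sims))
            labels.append(k)
            centroids[k] = centroids[k] + e                              # pull centroid
            centroids[k] = centroids[k] / np.linalg.norm(centroids[k])   # toward new member
        else:
            labels.append(len(centroids))    # open a new pseudo-fine-class
            centroids.append(e)
    return labels

# Two tight groups of embeddings yield two pseudo-fine-classes.
a = np.tile([1.0, 0.0], (3, 1)) + 0.01
b = np.tile([0.0, 1.0], (3, 1)) + 0.01
print(greedy_pseudo_fine(np.vstack([a, b])))  # [0, 0, 0, 1, 1, 1]
```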
arXiv Detail & Related papers (2020-07-11T03:44:21Z) - Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z) - A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z) - Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
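The combination described above, feature selection followed by a simple non-parametric classifier, can be sketched as follows. The variance-based selection criterion and the nearest-centroid rule are illustrative assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def select_features(train_x, k):
    """Indices of the k feature dimensions with highest training-set variance (assumed criterion)."""
    return np.argsort(train_x.var(axis=0))[-k:]

def nearest_centroid(train_x, train_y, query, keep):
    """Classify query by nearest class centroid in the selected feature subspace."""
    centroids = {c: train_x[train_y == c][:, keep].mean(axis=0)
                 for c in np.unique(train_y)}
    return min(centroids,
               key=lambda c: float(np.linalg.norm(query[keep] - centroids[c])))

# Toy data: classes differ only in the first (high-variance) dimension;
# the second dimension is near-constant noise and gets dropped.
rng = np.random.default_rng(0)
x0 = np.c_[rng.normal(0, 1, 50), rng.normal(0, 0.01, 50)]
x1 = np.c_[rng.normal(5, 1, 50), rng.normal(0, 0.01, 50)]
train_x = np.vstack([x0, x1])
train_y = np.array([0] * 50 + [1] * 50)

keep = select_features(train_x, k=1)
print(nearest_centroid(train_x, train_y, np.array([4.8, 0.0]), keep))  # prints 1
```

Because the classifier has no learned parameters beyond the centroids, it adapts to a new few-shot task with no fine-tuning, which is what makes the strategy simpler than feature-adaptation approaches.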
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.