CoDi -- an exemplar-conditioned diffusion model for low-shot counting
- URL: http://arxiv.org/abs/2512.20153v1
- Date: Tue, 23 Dec 2025 08:31:36 GMT
- Title: CoDi -- an exemplar-conditioned diffusion model for low-shot counting
- Authors: Grega Šuštar, Jer Pelhan, Alan Lukežič, Matej Kristan
- Abstract summary: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only few or no test-time exemplars. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maxima suppression.
- Score: 11.459105904251507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only few or no annotated test-time exemplars. A considerable challenge for modern low-shot counters is posed by dense regions with small objects. While total counts in such situations are typically well addressed by density-based counters, their usefulness is limited by poor localization capabilities. Localization is better addressed by point-detection-based counters, which build on query-based detectors. However, due to the limited number of pre-trained queries, they underperform on images with very large numbers of objects and resort to ad-hoc techniques like upsampling and tiling. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maxima suppression. Our core contribution is a new exemplar-based conditioning module that extracts the object prototypes and adjusts them to the intermediate layers of the denoising network, leading to accurate object location estimation. On the FSC benchmark, CoDi outperforms the state of the art by 15% MAE, 13% MAE and 10% MAE in the few-shot, one-shot, and reference-less scenarios, respectively, and sets a new state of the art on the MCAC benchmark by outperforming the top method by 44% MAE. The code is available at https://github.com/gsustar/CoDi.
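The abstract relies on three generic ideas: a density map whose sum gives the count, non-maxima suppression to turn that map into object locations, and MAE (mean absolute error of per-image counts) as the metric. The sketch below illustrates them on synthetic data. It is not CoDi's implementation; the function names (`density_peaks`, `gaussian_blob`, `count_mae`) and the `window` and `threshold` values are assumptions chosen for illustration only.

```python
import numpy as np


def density_peaks(density, window=3, threshold=0.01):
    """Simple non-maxima suppression on a 2D density map.

    A pixel is kept if it equals the maximum of its window x window
    neighbourhood and exceeds `threshold` (both values are illustrative).
    Returns an array of (row, col) peak coordinates.
    """
    density = np.asarray(density, dtype=float)
    pad = window // 2
    # Pad with -inf so border pixels are compared only against real values.
    padded = np.pad(density, pad, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_max = windows.max(axis=(-2, -1))
    keep = (density == local_max) & (density > threshold)
    return np.argwhere(keep)


def gaussian_blob(shape, center, sigma=1.5):
    """Unit-mass Gaussian centred on `center`, so each object adds 1 to the count."""
    rows, cols = np.indices(shape)
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    blob = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return blob / blob.sum()


def count_mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    predicted = np.asarray(predicted_counts, dtype=float)
    truth = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(predicted - truth)))


# Synthetic density map with three well-separated objects (hypothetical data).
centers = [(8, 8), (8, 24), (24, 16)]
density = sum(gaussian_blob((32, 32), c) for c in centers)

estimated_count = float(density.sum())  # density-based count
peaks = density_peaks(density)          # localization via NMS
mae = count_mae([estimated_count], [len(centers)])
```

On this toy map the density sum recovers the count and the peaks recover the centers. In dense scenes with overlapping objects the two can disagree, which is the localization problem the abstract describes.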
Related papers
- Generalized-Scale Object Counting with Gradual Query Aggregation [18.582729412306346]
GECO2 is an end-to-end few-shot counting and detection method that explicitly addresses the object scale issues. It surpasses state-of-the-art few-shot counters in counting as well as detection accuracy by 10% while running 3x faster with a smaller GPU memory footprint.
arXiv Detail & Related papers (2025-11-11T09:52:27Z) - A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation [10.461109095311546]
Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. The existing approaches often lead to overgeneralization and false positive detections. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation.
arXiv Detail & Related papers (2024-09-27T12:20:29Z) - DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting [10.461109095311546]
Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars in the image.
Current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes.
We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers.
arXiv Detail & Related papers (2024-04-25T14:07:52Z) - Point, Segment and Count: A Generalized Framework for Object Counting [40.192374437785155]
Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names.
We propose a generalized framework for both few-shot and zero-shot object counting based on detection.
PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection.
arXiv Detail & Related papers (2023-11-21T06:55:21Z) - Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - One-Shot General Object Localization [43.88712478006662]
OneLoc is a general one-shot object localization algorithm.
OneLoc efficiently finds the object center and bounding box size by a special voting scheme.
Experiments show that the proposed method achieves state-of-the-art overall performance on two datasets.
arXiv Detail & Related papers (2022-11-24T03:14:04Z) - A Low-Shot Object Counting Network With Iterative Prototype Adaptation [14.650207945870598]
We consider low-shot counting of arbitrary semantic categories in the image using only few annotated exemplars (few-shot) or no exemplars (no-shot).
Existing methods extract queries by feature pooling, which neglects the shape information (e.g., size and aspect) and reduces object localization accuracy and the quality of count estimates.
We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA).
arXiv Detail & Related papers (2022-11-15T15:39:23Z) - LDC-Net: A Unified Framework for Localization, Detection and Counting in Dense Crowds [103.8635206945196]
The rapid development in visual crowd analysis shows a trend to count people by positioning or even detecting, rather than simply summing a density map.
Some recent work on crowd localization and detection has two limitations: 1) typical detection methods cannot handle dense crowds and large variations in scale; 2) density map methods perform poorly in position and box prediction, especially in high-density or large-size crowds.
arXiv Detail & Related papers (2021-10-10T07:55:44Z) - A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained with only point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z) - Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.