Related papers: CoDi -- an exemplar-conditioned diffusion model for low-shot counting

URL: http://arxiv.org/abs/2512.20153v1
Date: Tue, 23 Dec 2025 08:31:36 GMT
Title: CoDi -- an exemplar-conditioned diffusion model for low-shot counting
Authors: Grega Šuštar, Jer Pelhan, Alan Lukežič, Matej Kristan
Abstract summary: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only few or no test-time exemplars. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maxima suppression.
Score: 11.459105904251507
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only few or no annotated test-time exemplars. Dense regions with small objects pose a considerable challenge for modern low-shot counters. While total counts in such situations are typically well addressed by density-based counters, their usefulness is limited by poor localization capabilities. Localization is better addressed by point-detection-based counters, which build on query-based detectors. However, due to the limited number of pre-trained queries, they underperform on images with very large numbers of objects and resort to ad-hoc techniques such as upsampling and tiling. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maxima suppression. Our core contribution is a new exemplar-based conditioning module that extracts the object prototypes and adjusts them to the intermediate layers of the denoising network, leading to accurate object location estimation. On the FSC benchmark, CoDi outperforms the state of the art by 15% MAE, 13% MAE and 10% MAE in the few-shot, one-shot, and reference-less scenarios, respectively, and sets a new state of the art on the MCAC benchmark by outperforming the top method by 44% MAE. The code is available at https://github.com/gsustar/CoDi.
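
The abstract refers to three generic operations: reading a total count from a density map, locating objects on it by non-maxima suppression, and scoring counts by MAE. The following is a minimal NumPy sketch of those ideas on a synthetic map. It is not the CoDi implementation; the function names, the 3x3 suppression window, and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def count_from_density(density):
    """Total count is the sum over the density map (illustrative helper)."""
    return float(np.asarray(density, dtype=float).sum())


def nms_peaks(density, threshold=0.05, radius=1):
    """Return (row, col) positions that are local maxima above a threshold.

    A pixel is kept if no value in its (2*radius+1)^2 window is larger.
    Flat plateaus would yield several peaks; this toy version ignores that.
    """
    density = np.asarray(density, dtype=float)
    h, w = density.shape
    padded = np.pad(density, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full((h, w), -np.inf)
    # Sliding-window maximum built from shifted views of the padded map.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    mask = (density >= local_max) & (density > threshold)
    return [(int(y), int(x)) for y, x in np.argwhere(mask)]


def mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    p = np.asarray(predicted_counts, dtype=float)
    t = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(p - t)))


def make_blob(shape, center, sigma=1.0):
    """Synthetic unit-mass Gaussian blob standing in for one object."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()


# Hypothetical 20x20 density map containing two objects.
density = make_blob((20, 20), (5, 5)) + make_blob((20, 20), (12, 15))
total = count_from_density(density)
peaks = nms_peaks(density)
```

Here `total` approximates the number of objects because each blob has unit mass, and `peaks` holds the two blob centers. A real counter would predict the density map from the image and exemplars rather than construct it.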

Related papers

Generalized-Scale Object Counting with Gradual Query Aggregation [18.582729412306346]
GECO2 is an end-to-end few-shot counting and detection method that explicitly addresses object scale issues. It surpasses state-of-the-art few-shot counters in counting as well as detection accuracy by 10% while running 3x faster with a smaller GPU memory footprint.
arXiv Detail & Related papers (2025-11-11T09:52:27Z)
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation [10.461109095311546]
Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. The existing approaches often lead to overgeneralization and false positive detections. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation.
arXiv Detail & Related papers (2024-09-27T12:20:29Z)
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting [10.461109095311546]
Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars in the image. Current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers.
arXiv Detail & Related papers (2024-04-25T14:07:52Z)
Point, Segment and Count: A Generalized Framework for Object Counting [40.192374437785155]
Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names. We propose a generalized framework for both few-shot and zero-shot object counting based on detection. PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection.
arXiv Detail & Related papers (2023-11-21T06:55:21Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
One-Shot General Object Localization [43.88712478006662]
OneLoc is a general one-shot object localization algorithm. OneLoc efficiently finds the object center and bounding box size by a special voting scheme. Experiments show that the proposed method achieves state-of-the-art overall performance on two datasets.
arXiv Detail & Related papers (2022-11-24T03:14:04Z)
A Low-Shot Object Counting Network With Iterative Prototype Adaptation [14.650207945870598]
We consider low-shot counting of arbitrary semantic categories in the image using only few annotated exemplars (few-shot) or no exemplars (no-shot). Existing methods extract queries by feature pooling, which neglects the shape information (e.g., size and aspect) and leads to reduced object localization accuracy and count estimates. We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA).
arXiv Detail & Related papers (2022-11-15T15:39:23Z)
LDC-Net: A Unified Framework for Localization, Detection and Counting in Dense Crowds [103.8635206945196]
The rapid development in visual crowd analysis shows a trend to count people by positioning or even detecting, rather than simply summing a density map. Some recent work on crowd localization and detection has two limitations: 1) the typical detection methods cannot handle dense crowds and a large variation in scale; 2) the density map methods suffer from performance deficiency in position and box prediction, especially in high-density or large-size crowds.
arXiv Detail & Related papers (2021-10-10T07:55:44Z)
A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained with only point-level annotations. During training, we utilize the available point annotations to supervise the estimation of the center points of objects. Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z)
Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences. We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline. Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision. In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)
Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.