Related papers: Generalized-Scale Object Counting with Gradual Query Aggregation

Generalized-Scale Object Counting with Gradual Query Aggregation

URL: http://arxiv.org/abs/2511.08048v1
Date: Wed, 12 Nov 2025 01:36:46 GMT
Title: Generalized-Scale Object Counting with Gradual Query Aggregation
Authors: Jer Pelhan, Alan Lukezic, Matej Kristan
Abstract summary: GECO2 is an end-to-end few-shot counting and detection method that explicitly addresses the object scale issues. It surpasses state-of-the-art few-shot counters in counting as well as detection accuracy by 10% while running 3x faster with a smaller GPU memory footprint.
Score: 18.582729412306346
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Few-shot detection-based counters estimate the number of instances in the image specified only by a few test-time exemplars. A common approach to localize objects across multiple sizes is to merge backbone features of different resolutions. Furthermore, to enable small object detection in densely populated regions, the input image is commonly upsampled and tiling is applied to cope with the increased computational and memory requirements. Because of these ad-hoc solutions, existing counters struggle with images containing diverse-sized objects and densely populated regions of small objects. We propose GECO2, an end-to-end few-shot counting and detection method that explicitly addresses the object scale issues. A new dense query representation gradually aggregates exemplar-specific feature information across scales, leading to high-resolution dense queries that enable detection of large as well as small objects. GECO2 surpasses state-of-the-art few-shot counters in counting as well as detection accuracy by 10% while running 3x faster with a smaller GPU memory footprint.
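
The upsample-and-tile workaround that the abstract attributes to existing counters can be illustrated with a minimal sketch. This is not GECO2 and not code from any of the listed papers; the function names (upsample_nearest, split_into_tiles), the nearest-neighbour upsampling, the zero padding, and the tile size of 512 are all assumptions chosen for illustration.

```python
import numpy as np


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of an (H, W, C) array by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def split_into_tiles(image, tile_size):
    """Split an (H, W, C) array into non-overlapping square tiles.

    The bottom and right borders are zero-padded so that every tile has
    the same shape. Returns a list of ((top, left), tile) pairs, where
    (top, left) is the tile's offset in the padded image.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile_size  # rows needed to reach a multiple of tile_size
    pad_w = (-w) % tile_size  # columns needed to reach a multiple of tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    tiles = []
    for top in range(0, padded.shape[0], tile_size):
        for left in range(0, padded.shape[1], tile_size):
            tile = padded[top:top + tile_size, left:left + tile_size]
            tiles.append(((top, left), tile))
    return tiles


# Hypothetical input: a 300x500 RGB image, upsampled 2x, then tiled.
image = np.zeros((300, 500, 3), dtype=np.uint8)
upsampled = upsample_nearest(image, 2)
tiles = split_into_tiles(upsampled, 512)
```

In such a pipeline each tile would be passed to the detector separately and the per-tile detections shifted by (top, left) and merged, which is the extra computation and bookkeeping the abstract describes as ad hoc.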

Related papers

SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer [62.11796778482088]
We present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap of object detection between close-up and HRW shots. The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects. Experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (up to 5.8%) and speed (up to 3x) over the state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-11T03:21:25Z)
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting [10.461109095311546]
Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars in the image. Current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers.
arXiv Detail & Related papers (2024-04-25T14:07:52Z)
YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images [33.80392696735718]
YOLC (You Only Look Clusters) is an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection. We perform extensive experiments on two aerial image datasets, including Visdrone 2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
arXiv Detail & Related papers (2024-04-09T10:03:44Z)
SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [67.97870844244187]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image. We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection. We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution. It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
A Coarse to Fine Framework for Object Detection in High Resolution Image [8.316322664637537]
Current approaches to object detection seldom consider detecting tiny objects or the large scale variance problem in high-resolution images. We introduce a simple yet efficient approach that improves the accuracy of object detection, especially for small objects and scenes with large scale variance. Our approach can make good use of the sparsity of the objects and the information in the high-resolution image, thereby making the detection more efficient.
arXiv Detail & Related papers (2023-03-02T13:04:33Z)
One-Shot General Object Localization [43.88712478006662]
OneLoc is a general one-shot object localization algorithm. OneLoc efficiently finds the object center and bounding box size by a special voting scheme. Experiments show that the proposed method achieves state-of-the-art overall performance on two datasets.
arXiv Detail & Related papers (2022-11-24T03:14:04Z)
Discovery-and-Selection: Towards Optimal Multiple Instance Learning for Weakly Supervised Object Detection [86.86602297364826]
We propose a discovery-and-selection approach fused with multiple instance learning (DS-MIL). Our proposed DS-MIL approach can consistently improve the baselines, reporting state-of-the-art performance.
arXiv Detail & Related papers (2021-10-18T07:06:57Z)
You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture. It reduces computations by separating objects from background using a very lite first-stage. Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
Localizing Grouped Instances for Efficient Detection in Low-Resource Scenarios [27.920304852537534]
We propose a novel flexible detection scheme that efficiently adapts to variable object sizes and densities. We rely on a sequence of detection stages, each of which has the ability to predict groups of objects as well as individuals. We report experimental results on two aerial image datasets, and show that the proposed method is as accurate yet computationally more efficient than standard single-shot detectors.
arXiv Detail & Related papers (2020-04-27T07:56:53Z)
Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects. Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity. We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.