CountZES: Counting via Zero-Shot Exemplar Selection
- URL: http://arxiv.org/abs/2512.16415v1
- Date: Thu, 18 Dec 2025 11:12:50 GMT
- Title: CountZES: Counting via Zero-Shot Exemplar Selection
- Authors: Muhammad Ibraheem Siddiqui, Muhammad Haris Khan
- Abstract summary: We propose CountZES, a training-free framework for object counting via zero-shot exemplar selection. CountZES discovers diverse exemplars through three synergistic stages: Detection-Anchored Exemplar (DAE), Density-Guided Exemplar (DGE), and Feature-Consensus Exemplar (FCE).
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Object counting in complex scenes remains challenging, particularly in the zero-shot setting, where the goal is to count instances of unseen categories specified only by a class name. Existing zero-shot object counting (ZOC) methods that infer exemplars from text either rely on open-vocabulary detectors, which often yield multi-instance candidates, or on random patch sampling, which fails to accurately delineate object instances. To address this, we propose CountZES, a training-free framework for object counting via zero-shot exemplar selection. CountZES progressively discovers diverse exemplars through three synergistic stages: Detection-Anchored Exemplar (DAE), Density-Guided Exemplar (DGE), and Feature-Consensus Exemplar (FCE). DAE refines open-vocabulary detections to isolate precise single-instance exemplars. DGE introduces a density-driven, self-supervised paradigm to identify statistically consistent and semantically compact exemplars, while FCE reinforces visual coherence through feature-space clustering. Together, these stages yield a diverse, complementary exemplar set that balances textual grounding, count consistency, and feature representativeness. Experiments on diverse datasets demonstrate CountZES's superior performance among ZOC methods while generalizing effectively across natural, aerial, and medical domains.
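The abstract's three-stage pipeline can be illustrated with a minimal, hypothetical sketch. Everything below is an assumption for illustration only, not the paper's actual implementation: DAE is stood in for by picking the highest-confidence detection, DGE by picking the candidate whose induced count is closest to the median (a simple statistical-consistency proxy), and FCE by a tiny k-means over patch features.

```python
import numpy as np

def select_exemplars(scores, feats, counts, k_clusters=2, iters=10, seed=0):
    """Hypothetical sketch of CountZES-style three-stage exemplar selection.

    scores: (N,)   detector confidences for candidate patches
    feats:  (N, D) visual features of each candidate patch
    counts: (N,)   count each candidate induces when used as an exemplar
    Returns indices of the DAE-, DGE- and FCE-style exemplars.
    """
    # Stage 1 (DAE proxy): anchor on the highest-confidence detection
    # as a stand-in for a precise single-instance exemplar.
    dae = int(np.argmax(scores))

    # Stage 2 (DGE proxy): prefer the candidate whose induced count is
    # closest to the median count, i.e. the most statistically
    # consistent one across candidates.
    dge = int(np.argmin(np.abs(counts - np.median(counts))))

    # Stage 3 (FCE proxy): cluster features with a tiny k-means and take
    # the candidate nearest its cluster centroid for visual coherence.
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k_clusters, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k_clusters):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(axis=0)
    fce = int(np.argmin(np.linalg.norm(feats - centers[assign], axis=-1)))

    return dae, dge, fce
```

The three stages deliberately use different signals (confidence, count statistics, feature geometry), so their selections can disagree; combining them yields the diverse, complementary exemplar set the abstract describes.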
Related papers
- Improving Generalized Visual Grounding with Instance-aware Joint Learning
Generalized visual grounding tasks are designed to accommodate multi-target and non-target scenarios. We propose InstanceVG, a framework equipped with instance-aware capabilities to tackle both GREC and GRES. To instantiate the framework, we assign each instance query a prior reference point, which also serves as an additional basis for target matching.
arXiv Detail & Related papers (2025-09-17T07:00:51Z)
- Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph
We first propose an expressive and flexible heterogeneous, temporal graph model for representing traffic scenarios. We then propose a self-supervised method to learn a universal embedding space for scenario graphs. In particular, we implement contrastive learning alongside a bootstrapping-based approach and evaluate their suitability for the scenario space.
arXiv Detail & Related papers (2025-07-07T15:10:03Z)
- ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation
Instance-to-Semantic Attention Control (ISAC) explicitly resolves incomplete instance formation and semantic entanglement. ISAC achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy.
arXiv Detail & Related papers (2025-05-27T09:23:10Z)
- One size doesn't fit all: Predicting the Number of Examples for In-Context Learning
In-context learning (ICL) refers to the process of adding a small number of localized examples from a training set of labelled data to an LLM's prompt. Our work alleviates the limitations of this 'one fits all' approach by dynamically predicting the number of examples for each data instance to be used in few-shot inference. Our experiments on a number of text classification benchmarks show that AICL substantially outperforms standard ICL by up to 17%.
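The adaptive-ICL idea above (choosing a per-query number of examples instead of a fixed one) can be sketched minimally. The difficulty heuristic here, nearest-neighbour distance with hand-picked thresholds, is an assumption for illustration, not the paper's actual predictor; `pick_k` and `build_prompt` are hypothetical names.

```python
import numpy as np

def pick_k(query_vec, train_vecs, k_options=(1, 3, 5)):
    """Choose how many in-context examples to use for one query."""
    # Distance to the nearest training instance as a crude difficulty
    # proxy: queries close to seen data are 'easy' and get fewer shots.
    dists = np.linalg.norm(np.asarray(train_vecs) - np.asarray(query_vec), axis=1)
    nearest = dists.min()
    if nearest < 0.5:
        return k_options[0]
    if nearest < 1.5:
        return k_options[1]
    return k_options[2]

def build_prompt(query, examples, k):
    """Assemble a k-shot prompt from (input, label) pairs."""
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples[:k])
    return f"{shots}\nInput: {query}\nLabel:"
```

Standard ICL would hard-code `k`; the point of the adaptive variant is that `k` varies per instance.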
arXiv Detail & Related papers (2024-03-11T03:28:13Z)
- EipFormer: Emphasizing Instance Positions in 3D Instance Segmentation
We present a novel Transformer-based architecture, EipFormer, which comprises progressive aggregation and dual position embedding.
EipFormer achieves superior or comparable performance compared to state-of-the-art approaches.
arXiv Detail & Related papers (2023-12-09T16:08:47Z)
- Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z)
- InsCon: Instance Consistency Feature Representation via Self-Supervised Learning
We propose a new end-to-end self-supervised framework called InsCon, which is devoted to capturing multi-instance information.
InsCon builds a targeted learning paradigm that applies multi-instance images as input, aligning the learned feature between corresponding instance views.
On the other hand, InsCon introduces the pull and push of cell-instance, which utilizes cell consistency to enhance fine-grained feature representation.
arXiv Detail & Related papers (2022-03-15T07:09:00Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic regularization term to start training the classifier with instances of high reliability.
An alternating optimization algorithm is developed to solve the proposed challenging non-convex regularization problem.
arXiv Detail & Related papers (2021-10-12T11:46:56Z)
- K-Shot Contrastive Learning of Visual Features with Multiple Instance Augmentations
$K$-Shot Contrastive Learning is proposed to investigate sample variations within individual instances.
It aims to combine the advantages of inter-instance discrimination by learning discriminative features to distinguish between different instances.
Experimental results demonstrate that the proposed $K$-shot contrastive learning achieves superior performance to state-of-the-art unsupervised methods.
arXiv Detail & Related papers (2020-07-27T04:56:41Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.