Point, Segment and Count: A Generalized Framework for Object Counting
- URL: http://arxiv.org/abs/2311.12386v3
- Date: Wed, 27 Mar 2024 15:01:44 GMT
- Title: Point, Segment and Count: A Generalized Framework for Object Counting
- Authors: Zhizhong Huang, Mingliang Dai, Yi Zhang, Junping Zhang, Hongming Shan
- Abstract summary: Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names.
We propose a generalized framework for both few-shot and zero-shot object counting based on detection.
PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection.
- Score: 40.192374437785155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names, \emph{a.k.a.} few-shot and zero-shot counting. In this paper, we propose a generalized framework for both few-shot and zero-shot object counting based on detection. Our framework combines the complementary strengths of two foundation models without compromising their zero-shot capability: (\textbf{i}) SAM to segment all possible objects as mask proposals, and (\textbf{ii}) CLIP to classify proposals to obtain accurate object counts. However, this strategy faces two obstacles: efficiency overhead, and small crowded objects that cannot be localized and distinguished. To address these issues, our framework, termed PseCo, follows three steps: point, segment, and count. Specifically, we first propose a class-agnostic object localization that provides accurate yet minimal point prompts for SAM, which not only reduces computation costs but also avoids missing small objects. Furthermore, we propose a generalized object classification that leverages CLIP image/text embeddings as the classifier, following a hierarchical knowledge distillation to obtain discriminative classifications among hierarchical mask proposals. Extensive experimental results on FSC-147, COCO, and LVIS demonstrate that PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection. Code: https://github.com/Hzzone/PseCo
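As a rough illustration of the final "count" step only (not the authors' implementation), the zero-shot variant of the pipeline reduces to thresholded cosine similarity: each SAM mask proposal gets a CLIP-style image embedding, the class name gets a text embedding, and proposals whose similarity exceeds a threshold are counted. The function name, embedding dimensionality, and threshold below are hypothetical.

```python
import numpy as np

def count_by_classification(proposal_embs: np.ndarray,
                            class_emb: np.ndarray,
                            threshold: float = 0.25) -> int:
    """Count mask proposals whose embedding matches the target class.

    proposal_embs: (N, D) image embeddings, one per mask proposal.
    class_emb:     (D,)   text embedding of the target class name.
    """
    # L2-normalize so a dot product equals cosine similarity.
    proposal_embs = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    class_emb = class_emb / np.linalg.norm(class_emb)
    sims = proposal_embs @ class_emb       # (N,) cosine similarities
    return int((sims > threshold).sum())   # proposals classified as the target

# Toy example: 3 of 4 synthetic proposals align with the class direction.
rng = np.random.default_rng(0)
class_emb = rng.normal(size=128)
pos = class_emb + 0.1 * rng.normal(size=(3, 128))  # close to the class embedding
neg = rng.normal(size=(1, 128))                    # unrelated proposal
print(count_by_classification(np.vstack([pos, neg]), class_emb))
```

In the actual paper the classification is learned via hierarchical knowledge distillation rather than a fixed threshold; this sketch only conveys why CLIP embeddings let the same machinery serve both few-shot (image exemplar embeddings) and zero-shot (text embeddings) counting.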
Related papers
- Bootstrapping MLLM for Weakly-Supervised Class-Agnostic Object Counting [59.37613121962146]
We propose WS-COC, the first MLLM-driven weakly-supervised framework for class-agnostic object counting. WS-COC matches or even surpasses many state-of-the-art fully-supervised methods while significantly reducing annotation costs.
arXiv Detail & Related papers (2026-02-13T09:58:35Z) - OCCAM: Class-Agnostic, Training-Free, Prior-Free and Multi-Class Object Counting [1.2196508752999795]
Class-Agnostic object Counting (CAC) involves counting instances of objects from arbitrary classes within an image. We present OCCAM, the first training-free approach to CAC that operates without the need for any supplementary information.
arXiv Detail & Related papers (2026-01-20T11:36:38Z) - Are We Done with Object-Centric Learning? [65.67948794110212]
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene.
With recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently.
We address the OOD generalization challenge caused by spurious background cues through the lens of OCL.
arXiv Detail & Related papers (2025-04-09T17:59:05Z) - What is Point Supervision Worth in Video Instance Segmentation? [119.71921319637748]
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.
We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models.
Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.
arXiv Detail & Related papers (2024-04-01T17:38:25Z) - Zero-Shot Object Counting with Language-Vision Models [50.1159882903028]
Class-agnostic object counting aims to count object instances of an arbitrary class at test time.
Current methods require human-annotated exemplars as inputs which are often unavailable for novel categories.
We propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time.
arXiv Detail & Related papers (2023-09-22T14:48:42Z) - Learning from Pseudo-labeled Segmentation for Multi-Class Object Counting [35.652092907690694]
Class-agnostic counting (CAC) has numerous potential applications across various domains.
The goal is to count objects of an arbitrary category during testing, based on only a few annotated exemplars.
We show that the segmentation model trained on these pseudo-labeled masks can effectively localize objects of interest for an arbitrary multi-class image.
arXiv Detail & Related papers (2023-07-15T01:33:19Z) - Disambiguation of One-Shot Visual Classification Tasks: A Simplex-Based Approach [8.436437583394998]
We present a strategy which aims at detecting the presence of multiple objects in a given shot.
This strategy is based on identifying the corners of a simplex in a high dimensional space.
We show the ability of the proposed method to slightly, yet statistically significantly, improve accuracy in extreme settings.
arXiv Detail & Related papers (2023-01-16T11:37:05Z) - Few-shot Object Counting and Detection [25.61294147822642]
We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class.
This task shares the same supervision as the few-shot object counting but additionally outputs the object bounding boxes along with the total object count.
We introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR.
arXiv Detail & Related papers (2022-07-22T10:09:18Z) - Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z) - Integrative Few-Shot Learning for Classification and Segmentation [37.50821005917126]
We introduce the integrative task of few-shot classification and segmentation (FS-CS)
FS-CS aims to classify and segment target objects in a query image when the target classes are given with a few examples.
We propose the integrative few-shot learning framework for FS-CS, which trains a learner to construct class-wise foreground maps.
arXiv Detail & Related papers (2022-03-29T16:14:40Z) - Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient unseen-temporal segmentation.
We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting [18.733301622920102]
Multi-class object counting expands the scope of application of object counting task.
The multi-target detection task can achieve multi-class object counting in some scenarios.
We propose a simple yet efficient counting network based on point-level annotations.
arXiv Detail & Related papers (2020-12-15T08:38:28Z) - Corner Proposal Network for Anchor-free, Two-stage Object Detection [174.59360147041673]
The goal of object detection is to determine the class and location of objects in an image.
This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals.
We demonstrate that these two stages are effective solutions for improving recall and precision.
arXiv Detail & Related papers (2020-07-27T19:04:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.