Related papers: A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting

URL: http://arxiv.org/abs/2404.09826v2
Date: Mon, 18 Nov 2024 14:52:09 GMT
Title: A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting
Authors: Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen
Abstract summary: Class-agnostic counting (CAC) is a vision task that counts the total number of occurrences of any given reference object in the query image. In a multi-class setting, existing models ignore the reference images and instead blindly match all dominant objects in the query image. We introduce a new evaluation protocol and metrics to resolve the problems with the existing CAC evaluation scheme.
Score: 27.439965991083177
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Class agnostic counting (CAC) is a vision task that can be used to count the total occurrence number of any given reference objects in the query image. The task is usually formulated as a density map estimation problem through similarity computation among a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: Given a multi-class setting, models don't consider reference images and instead blindly match all dominant objects in the query image. Moreover, the current evaluation metrics and dataset cannot be used to faithfully assess the model's generalization performance and robustness. To this end, we discover that the combination of mosaic augmentation with generalized loss is essential for addressing the aforementioned issue of CAC models to count objects of majority (i.e. dominant objects) regardless of the references. Furthermore, we introduce a new evaluation protocol and metrics for resolving the problem behind the existing CAC evaluation scheme and better benchmarking CAC models in a more fair manner. Besides, extensive evaluation results demonstrate that our proposed recipe can consistently improve the performance of different CAC models. The code is available at https://github.com/littlepenguin89106/MGCAC.
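As a rough illustration of the mosaic idea mentioned in the abstract, the sketch below stitches four images into a 2x2 mosaic and keeps only the annotations of the tile that holds the reference class, so the training target no longer equals the count of whatever object dominates the image. This is an assumption-laden sketch, not the authors' implementation: the function names `make_mosaic` and `density_map`, the 2x2 layout, and the one-hot point density map are all hypothetical, and the generalized loss from the paper is not reproduced here.

```python
import numpy as np


def make_mosaic(tiles, points, ref_index):
    """Stitch four equally sized tiles into a 2x2 mosaic (hypothetical helper).

    tiles: list of four arrays of shape (H, W, C)
    points: list of four arrays of shape (N_i, 2) with (x, y) object centres
    ref_index: index of the tile that contains the reference class
    Returns the mosaic and the shifted points of the reference tile only.
    """
    h, w = tiles[0].shape[:2]
    top = np.concatenate([tiles[0], tiles[1]], axis=1)
    bottom = np.concatenate([tiles[2], tiles[3]], axis=1)
    mosaic = np.concatenate([top, bottom], axis=0)
    # (x, y) offset of each tile's top-left corner inside the mosaic
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    ref_points = np.asarray(points[ref_index], dtype=float).reshape(-1, 2)
    target = ref_points + np.array(offsets[ref_index], dtype=float)
    return mosaic, target


def density_map(points, shape):
    """Place a unit mass at each point; the map sums to the object count."""
    dm = np.zeros(shape, dtype=float)
    for x, y in points:
        dm[int(y), int(x)] += 1.0
    return dm
```

With a target built this way, a model that counts every dominant object in the mosaic regardless of the reference would overshoot the target whenever the other tiles contain objects.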

Related papers

On Large Multimodal Models as Open-World Image Classifiers [71.78089106671581]
Large Multimodal Models (LMMs) can classify images directly using natural language. We evaluate 13 models across 10 benchmarks, encompassing prototypical, non-prototypical, fine-grained, and very fine-grained classes.
arXiv Detail & Related papers (2025-03-27T17:03:18Z)
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches [6.356364436395916]
We present the first comprehensive review of class-agnostic counting (CAC) methodologies. We propose a taxonomy to categorize CAC approaches into three paradigms: reference-based, reference-less, and open-world text-guided. We present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities.
arXiv Detail & Related papers (2025-01-31T14:47:09Z)
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation. Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation. We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting [8.000723123087473]
Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training. We introduce the Prompt-Aware Counting benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [71.38754976584009]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image. We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection [4.0208298639821525]
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modified loss function can improve performance. We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN.
arXiv Detail & Related papers (2023-11-01T04:04:34Z)
Scalable Incomplete Multi-View Clustering with Structure Alignment [71.62781659121092]
In this paper, we propose a novel incomplete anchor graph learning framework. We construct the view-specific anchor graph to capture the complementary information from different views. The time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples.
arXiv Detail & Related papers (2023-08-31T08:30:26Z)
Mitigating Catastrophic Forgetting in Task-Incremental Continual Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a Supervised Contrastive learning framework with adaptive classification criterion for Continual Learning. Experiments show that CFL achieves state-of-the-art performance and has a stronger ability to overcome catastrophic forgetting compared with the classification baselines.
arXiv Detail & Related papers (2023-05-20T19:22:40Z)
GCNet: Probing Self-Similarity Learning for Generalized Counting Network [24.09746233447471]
Generalized Counting Network (GCNet) is developed to recognize adaptive exemplars within the whole images. GCNet is capable of adaptively capturing them through a carefully-designed self-similarity learning strategy. It performs on par with existing exemplar-dependent methods and shows stunning cross-dataset generality on crowd-specific datasets.
arXiv Detail & Related papers (2023-02-10T09:31:37Z)
Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition [94.04041301504567]
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. We propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
arXiv Detail & Related papers (2022-09-07T10:00:18Z)
TISE: A Toolbox for Text-to-Image Synthesis Evaluation [9.092600296992925]
We conduct a study on state-of-the-art methods for single- and multi-object text-to-image synthesis. We propose a common framework for evaluating these methods.
arXiv Detail & Related papers (2021-12-02T16:39:35Z)
Unsupervised Person Re-identification via Softened Similarity Learning [122.70472387837542]
Person re-identification (re-ID) is an important topic in computer vision. This paper studies the unsupervised setting of re-ID, which does not require any labeled information. Experiments on two image-based and video-based datasets demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2020-04-07T17:16:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.