A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting
- URL: http://arxiv.org/abs/2404.09826v2
- Date: Mon, 18 Nov 2024 14:52:09 GMT
- Title: A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting
- Authors: Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen
- Abstract summary: Class-agnostic counting (CAC) is a vision task that counts the total number of occurrences of any given reference object in the query image.
Given a multi-class setting, models don't consider reference images and instead blindly match all dominant objects in the query image.
We introduce a new evaluation protocol and metrics for resolving the problem behind the existing CAC evaluation scheme.
- Score: 27.439965991083177
- Abstract: Class-agnostic counting (CAC) is a vision task that counts the total number of occurrences of any given reference object in a query image. The task is usually formulated as a density map estimation problem through similarity computation between a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: given a multi-class setting, models do not consider the reference images and instead blindly match all dominant objects in the query image. Moreover, the current evaluation metrics and dataset cannot faithfully assess a model's generalization performance and robustness. To this end, we discover that combining mosaic augmentation with a generalized loss is essential for addressing this tendency of CAC models to count the majority (i.e., dominant) objects regardless of the references. Furthermore, we introduce a new evaluation protocol and metrics that resolve the problem behind the existing CAC evaluation scheme and benchmark CAC models more fairly. Extensive evaluation results demonstrate that our proposed recipe consistently improves the performance of different CAC models. The code is available at https://github.com/littlepenguin89106/MGCAC.
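The abstract does not spell out how a mosaic training sample is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). It assumes four equally sized tiles, one per object class, arranged in a 2x2 grid, and assumes that only the tile matching the reference class keeps its density map, so the ground-truth count covers the reference objects alone. The names `point_density` and `make_mosaic` are hypothetical.

```python
import numpy as np


def point_density(h, w, n):
    """Toy density map (assumption): n unit point masses, so it sums to n."""
    dmap = np.zeros((h, w), dtype=np.float64)
    dmap.flat[:n] = 1.0
    return dmap


def make_mosaic(images, density_maps, target_idx):
    """Tile four equally sized images into a 2x2 mosaic.

    Only the density map of the tile at `target_idx` is kept; the other
    tiles contribute zero density, so the target count ignores every
    non-reference class in the mosaic.
    """
    assert len(images) == 4 and len(density_maps) == 4
    h, w = density_maps[0].shape
    channels = images[0].shape[2]
    mosaic = np.zeros((2 * h, 2 * w, channels), dtype=images[0].dtype)
    target = np.zeros((2 * h, 2 * w), dtype=np.float64)
    for i, (img, dmap) in enumerate(zip(images, density_maps)):
        r, c = divmod(i, 2)  # tile position in the 2x2 grid
        rows = slice(r * h, (r + 1) * h)
        cols = slice(c * w, (c + 1) * w)
        mosaic[rows, cols] = img
        if i == target_idx:
            target[rows, cols] = dmap
    return mosaic, target


# Four constant-valued tiles with 3, 5, 7 and 11 objects respectively.
images = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(4)]
density_maps = [point_density(8, 8, n) for n in (3, 5, 7, 11)]
mosaic, target = make_mosaic(images, density_maps, target_idx=2)
print(mosaic.shape, target.sum())
```

In this toy setup a model that ignores the reference and counts every object would predict 26 on the mosaic, while the target density map sums to 7, which is the kind of gap the multi-class setting in the abstract is meant to expose.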
Related papers
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
- Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting [8.000723123087473]
Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training.
We introduce the Prompt-Aware Counting benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
- SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [71.38754976584009]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image.
We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet).
It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
- Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection [4.0208298639821525]
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community.
Recent studies show that adapting a pre-trained model or modified loss function can improve performance.
We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN.
arXiv Detail & Related papers (2023-11-01T04:04:34Z)
- Scalable Incomplete Multi-View Clustering with Structure Alignment [71.62781659121092]
In this paper, we propose a novel incomplete anchor graph learning framework.
We construct the view-specific anchor graph to capture the complementary information from different views.
The time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples.
arXiv Detail & Related papers (2023-08-31T08:30:26Z)
- Mitigating Catastrophic Forgetting in Task-Incremental Continual Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a Supervised Contrastive learning framework with adaptive classification criterion for Continual Learning.
Experiments show that the proposed framework achieves state-of-the-art performance and has a stronger ability to overcome catastrophic forgetting than the classification baselines.
arXiv Detail & Related papers (2023-05-20T19:22:40Z)
- GCNet: Probing Self-Similarity Learning for Generalized Counting Network [24.09746233447471]
Generalized Counting Network (GCNet) is developed to recognize adaptive exemplars within the whole image.
GCNet adaptively captures these exemplars through a carefully designed self-similarity learning strategy.
It performs on par with existing exemplar-dependent methods and shows stunning cross-dataset generality on crowd-specific datasets.
arXiv Detail & Related papers (2023-02-10T09:31:37Z)
- Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition [94.04041301504567]
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances.
We propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
arXiv Detail & Related papers (2022-09-07T10:00:18Z)
- TISE: A Toolbox for Text-to-Image Synthesis Evaluation [9.092600296992925]
We conduct a study on state-of-the-art methods for single- and multi-object text-to-image synthesis.
We propose a common framework for evaluating these methods.
arXiv Detail & Related papers (2021-12-02T16:39:35Z)
- Unsupervised Person Re-identification via Softened Similarity Learning [122.70472387837542]
Person re-identification (re-ID) is an important topic in computer vision.
This paper studies the unsupervised setting of re-ID, which does not require any labeled information.
Experiments on two image-based and video-based datasets demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2020-04-07T17:16:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.