A Simple-but-effective Baseline for Training-free Class-Agnostic
Counting
- URL: http://arxiv.org/abs/2403.01418v1
- Date: Sun, 3 Mar 2024 07:19:50 GMT
- Title: A Simple-but-effective Baseline for Training-free Class-Agnostic
Counting
- Authors: Yuhao Lin, Haiming Xu, Lingqiao Liu, Javen Qinfeng Shi
- Abstract summary: Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples.
Recent efforts have shown that it's possible to accomplish this without training by utilizing pre-existing foundation models.
We present a training-free solution that effectively bridges this performance gap, serving as a strong baseline.
- Score: 30.792198686654075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-Agnostic Counting (CAC) seeks to accurately count objects in a given
image with only a few reference examples. While previous methods achieved this
through additional training, recent efforts have shown that it is possible to
accomplish it without training by utilizing pre-existing foundation models,
particularly the Segment Anything Model (SAM), for counting via instance-level
segmentation. Although promising, current training-free methods still lag
behind their training-based counterparts in terms of performance. In this
research, we present a straightforward training-free solution that effectively
bridges this performance gap, serving as a strong baseline. The primary
contribution of our work lies in the identification of four key techniques that
can enhance performance. Specifically, we suggest employing a superpixel
algorithm to generate more precise initial point prompts, utilizing an image
encoder with richer semantic knowledge to replace the SAM encoder for
representing candidate objects, and adopting a multiscale mechanism and a
transductive prototype scheme to update the representation of reference
examples. By combining these four techniques, our approach achieves
significant improvements over existing training-free methods and delivers
performance on par with training-based ones.
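As a concrete illustration of the first of these techniques, the sketch below oversegments an image with SLIC superpixels and uses the superpixel centroids as point prompts for SAM. It is a minimal sketch, assuming scikit-image for the superpixels and the `segment_anything` package's SamPredictor interface; the encoder replacement, multiscale mechanism, and transductive prototype update are omitted, and the parameter values are illustrative rather than taken from the paper.
```python
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops

def superpixel_point_prompts(image, n_segments=400):
    """Oversegment the image and return one (x, y) prompt per superpixel."""
    labels = slic(image, n_segments=n_segments, compactness=10)
    centroids = [p.centroid for p in regionprops(labels)]   # (row, col) pairs
    return np.array([(c, r) for r, c in centroids])         # SAM expects (x, y)

def candidate_masks(predictor, image, points):
    """Prompt SAM once per centroid to collect candidate object masks."""
    predictor.set_image(image)                               # RGB uint8 array
    masks = []
    for xy in points:
        m, _, _ = predictor.predict(
            point_coords=xy[None, :].astype(np.float32),
            point_labels=np.ones(1, dtype=np.int64),         # foreground point
            multimask_output=False,
        )
        masks.append(m[0])
    return masks   # later filtered by similarity to the reference prototypes
```
The intuition behind this step is that centroids of roughly homogeneous regions are more likely to land inside object interiors than points from a uniform grid, giving SAM cleaner prompts to segment from.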
Related papers
- Task-Oriented Pre-Training for Drivable Area Detection [5.57325257338134]
We propose a task-oriented pre-training method that begins with generating redundant segmentation proposals.
We then introduce a Specific Category Enhancement Fine-tuning (SCEF) strategy for fine-tuning the Contrastive Language-Image Pre-training (CLIP) model.
This approach can generate a large amount of coarse training data for pre-training models, which are further fine-tuned using manually annotated data.
arXiv Detail & Related papers (2024-09-30T10:25:47Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by recently successful prompting techniques, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches (a schematic sketch follows the entry).
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
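As a rough illustration of the patch-level interaction this entry describes, the sketch below computes attention in both directions between support and query patch embeddings. It is schematic only; the paper's actual architecture, normalization, and training details are not reproduced here.
```python
import torch
import torch.nn.functional as F

def mutual_attention(support, query):
    """Schematic bidirectional attention between patch embeddings.

    support: (Ns, d) support-image patch embeddings
    query:   (Nq, d) query-image patch embeddings
    """
    scale = support.size(-1) ** -0.5
    attn_q2s = F.softmax(query @ support.t() * scale, dim=-1)   # (Nq, Ns)
    attn_s2q = F.softmax(support @ query.t() * scale, dim=-1)   # (Ns, Nq)
    query_out = query + attn_q2s @ support      # query patches read support
    support_out = support + attn_s2q @ query    # support patches read query
    return support_out, query_out
```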
- Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [19.20874993309959]
Vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks.
We propose a baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP).
Our method enforces localization of patches in the self-attention of CLIP's vision transformer, which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature (a rough sketch of this idea follows the entry).
arXiv Detail & Related papers (2024-04-12T01:08:04Z)
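One plausible reading of the localization idea above is a fixed spatial bias added to the attention logits of CLIP's vision transformer, so each patch attends mostly to its spatial neighbours. The sketch below illustrates that reading only; the grid shape and sigma value are assumptions, not the paper's settings.
```python
import torch

def neighbourhood_bias(h, w, sigma=2.0):
    """Gaussian bias over an h x w patch grid, added to attention logits."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)       # (N, N)
    return -d2 / (2 * sigma ** 2)

# Inside a ViT attention layer (patch tokens only), one would use it as:
#   logits = q @ k.transpose(-2, -1) * scale
#   logits = logits + neighbourhood_bias(h, w).to(logits.device)
#   attn = logits.softmax(dim=-1)
```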
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods such as LoRA, and use in-batch negatives unless well-constructed hard negatives are available (a sketch of this objective follows the entry).
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
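For reference, the in-batch-negatives objective this recipe recommends is commonly implemented as a cross-entropy over the batch similarity matrix, with each query's own passage on the diagonal. A minimal PyTorch sketch, with an assumed temperature value:
```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q, p, temperature=0.05):
    """q: (B, d) query embeddings; p: (B, d) their positive passage embeddings.

    Every other passage in the batch acts as a negative for each query.
    """
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.t() / temperature                       # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)      # diagonal = positive
    return F.cross_entropy(logits, labels)
```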
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z)
- Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning [41.07029317930986]
We propose a variance-sensitive class of models that operates in a low-label regime.
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier (a simplified sketch follows this entry).
We further extend this approach to a transductive learning setting, proposing Transductive CNAPS.
arXiv Detail & Related papers (2022-01-13T18:59:02Z)
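The Mahalanobis-distance classifier that Simple CNAPS builds on can be sketched as follows: estimate a mean and a regularized covariance per class from the support embeddings, then assign each query to the nearest class. The ridge-style shrinkage below is a stand-in for the paper's hierarchical regularization, and all names are illustrative.
```python
import numpy as np

def mahalanobis_classify(support_feats, support_labels, query_feats, ridge=1.0):
    """support_feats: (n, d); support_labels: (n,) ints; query_feats: (m, d).

    Assumes at least two support examples per class so covariances exist.
    """
    classes = np.unique(support_labels)
    dists = []
    for k in classes:
        X = support_feats[support_labels == k]
        mu = X.mean(axis=0)
        # Shrink toward the identity so the covariance stays invertible
        # even with very few support examples per class.
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
        inv = np.linalg.inv(cov)
        diff = query_feats - mu
        dists.append(np.einsum("qd,de,qe->q", diff, inv, diff))
    return classes[np.argmin(np.stack(dists, axis=1), axis=1)]
```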
- Label, Verify, Correct: A Simple Few Shot Object Detection Method [93.84801062680786]
We introduce a simple pseudo-labelling method to source high-quality pseudo-annotations from a training set.
We present two novel methods to improve the precision of the pseudo-labelling process.
Our method achieves state-of-the-art or second-best performance compared to existing approaches.
arXiv Detail & Related papers (2021-12-10T18:59:06Z)
- SML: Semantic Meta-learning for Few-shot Semantic Segmentation [27.773396307292497]
We propose a novel meta-learning framework, Semantic Meta-Learning, which incorporates class-level semantic descriptions in the generated prototypes for this problem.
In addition, we propose to use the well-established technique of ridge regression, not only to bring in class-level semantic information but also to effectively utilise the information available from the multiple images in the training data for prototype computation (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-09-14T18:26:46Z)
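The ridge-regression step mentioned in the SML entry has a standard closed form. The sketch below maps support features to one-hot class targets and reads one row per class out of the solution as a prototype; mixing in the class-level semantic descriptions, as the paper does, is omitted here.
```python
import numpy as np

def ridge_prototypes(X, Y, lam=0.1):
    """X: (n, d) support features; Y: (n, c) one-hot class targets.

    Solves W = (X^T X + lam * I)^{-1} X^T Y and returns (c, d) prototypes.
    """
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)   # (d, c)
    return W.T   # one prototype per class
```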
- A Deeper Look at Salient Object Detection: Bi-stream Network with a Small Training Dataset [62.26677215668959]
We provide a feasible way to construct a novel small-scale training set, which only contains 4K images.
We propose a novel bi-stream network to take full advantage of our proposed small training set.
arXiv Detail & Related papers (2020-08-07T01:24:33Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.