Few-shot Object Localization
- URL: http://arxiv.org/abs/2403.12466v3
- Date: Wed, 5 Jun 2024 08:10:26 GMT
- Title: Few-shot Object Localization
- Authors: Yunhan Ren, Bo Li, Chengyang Zhang, Yong Zhang, Baocai Yin
- Abstract summary: This paper defines a novel task named Few-Shot Object Localization (FSOL).
It aims to achieve precise localization with limited samples.
This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images.
Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research.
- Score: 37.347898735345574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing object localization methods are tailored to locate specific classes of objects, relying heavily on abundant labeled data for model optimization. However, acquiring large amounts of labeled data is challenging in many real-world scenarios, significantly limiting the broader application of localization models. To bridge this research gap, this paper defines a novel task named Few-Shot Object Localization (FSOL), which aims to achieve precise localization with limited samples. This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images. To advance this field, we design an innovative high-performance baseline model. This model integrates a dual-path feature augmentation module to enhance shape association and gradient differences between supports and query images, alongside a self query module to explore the association between feature maps and query images. Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research. All codes and data are available at https://github.com/Ryh1218/FSOL.
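The abstract describes querying a target image with a small set of labeled support exemplars. As a rough illustration of that idea (not the paper's actual FSOL model, which uses a dual-path feature augmentation module and a self query module), the sketch below pools support exemplar features into a prototype and correlates it with a query feature map to produce a localization heatmap. The backbone is stubbed with random arrays; in practice a pretrained CNN or ViT would supply the features.

```python
# Hedged sketch: few-shot localization via support-query feature
# correlation. Illustrative only; not the authors' implementation.
import numpy as np

def support_prototype(support_feats: np.ndarray) -> np.ndarray:
    """Average-pool K support exemplar feature maps (K, C, h, w)
    into a single C-dimensional prototype vector."""
    return support_feats.mean(axis=(0, 2, 3))

def localization_heatmap(query_feats: np.ndarray,
                         prototype: np.ndarray) -> np.ndarray:
    """Cosine similarity between each query location (C, H, W) and
    the prototype, giving an (H, W) heatmap of likely positions."""
    c, h, w = query_feats.shape
    q = query_feats.reshape(c, -1)                       # (C, H*W)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return (p @ q).reshape(h, w)

# Toy example: 2 support exemplars, one query feature map.
rng = np.random.default_rng(0)
support = rng.standard_normal((2, 16, 4, 4))
query = rng.standard_normal((16, 8, 8))
heat = localization_heatmap(query, support_prototype(support))
peak = np.unravel_index(heat.argmax(), heat.shape)  # predicted location
```

Peaks in the heatmap correspond to locations whose features best match the support prototype; a threshold or non-maximum suppression step would turn them into point predictions.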
Related papers
- Efficient Feature Fusion for UAV Object Detection [9.632727117779178]
Small objects, in particular, occupy small portions of images, making their accurate detection difficult.
Existing multi-scale feature fusion methods address these challenges by aggregating features across different resolutions.
We propose a novel feature fusion framework specifically designed for UAV object detection tasks.
arXiv Detail & Related papers (2025-01-29T20:39:16Z)
- Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models [7.898092154590899]
Salient Object Detection aims to identify and segment prominent regions within a scene.
Traditional models rely on manually annotated pseudo labels with precise pixel-level accuracy.
We develop a low-cost, high-precision annotation method to address the challenges.
arXiv Detail & Related papers (2025-01-08T15:56:21Z)
- RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations [55.74675012171316]
RELOCATE is a training-free baseline designed to perform the challenging task of visual query localization in long videos.
To eliminate the need for task-specific training, RELOCATE leverages a region-based representation derived from pretrained vision models.
arXiv Detail & Related papers (2024-12-02T18:59:53Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding [42.10086029931937]
Visual grounding aims to localize the object referred to in an image based on a natural language query.
Existing methods demonstrate a significant performance drop when there are multiple distractions in an image.
We propose a novel approach, the Relation and Semantic-sensitive Visual Grounding (ResVG) model, to address this issue.
arXiv Detail & Related papers (2024-08-29T07:32:01Z)
- SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [71.38754976584009]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image.
We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet).
It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.