Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
- URL: http://arxiv.org/abs/2503.07038v1
- Date: Mon, 10 Mar 2025 08:27:02 GMT
- Title: Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
- Authors: Michael Green, Matan Levy, Issar Tzachor, Dvir Samuel, Nir Darshan, Rami Ben-Ari,
- Abstract summary: We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object in a cluttered scene. The key challenge is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image. We introduce Multi-object Attention Optimization (MaO), a novel retrieval framework that incorporates a dedicated multi-object pre-training phase.
- Score: 5.2337753974570616
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object in a cluttered scene. The key challenge in this setting is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image. In this paper, we first analyze the limitations of existing methods on this challenging task and then introduce new benchmarks to support SoIR evaluation. Next, we introduce Multi-object Attention Optimization (MaO), a novel retrieval framework that incorporates a dedicated multi-object pre-training phase. This is followed by a refinement process that leverages attention-based feature extraction with object masks, integrating them into a single unified image descriptor. Our MaO approach significantly outperforms existing retrieval methods and strong baselines, achieving notable improvements in both the zero-shot and the lightweight multi-object fine-tuning settings. We hope this work will lay the groundwork and inspire further research to enhance retrieval performance for this highly practical task.
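To make the "single descriptor for all objects" difficulty concrete, here is a toy sketch under strong simplifying assumptions: orthogonal toy patch features, object masks given in advance (e.g. from some segmentation model), and a plain masked-average fusion. This is not MaO, which relies on multi-object pre-training and attention-based optimization; the function names (`global_mean_descriptor`, `masked_multi_object_descriptor`, `rank`) are hypothetical and only contrast naive global pooling with mask-guided per-object pooling.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors to (approximately) unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def global_mean_descriptor(patch_feats):
    """Baseline: average all patches, so large objects dominate the descriptor."""
    return l2_normalize(patch_feats.mean(axis=0))


def masked_multi_object_descriptor(patch_feats, masks):
    """Hypothetical stand-in for mask-guided fusion into one image descriptor.

    patch_feats: (P, D) patch embeddings from some backbone (assumed).
    masks: (K, P) binary masks, one row per object (assumed given).
    Each object is pooled inside its own mask and normalized, then the K object
    vectors are averaged, so a 2-patch object weighs as much as a 98-patch one.
    """
    per_object = [l2_normalize(m @ patch_feats / max(m.sum(), 1.0)) for m in masks]
    return l2_normalize(np.mean(per_object, axis=0))


def rank(query, db):
    """Rank database descriptors (rows of db) by cosine similarity to one query."""
    scores = db @ l2_normalize(query)
    return np.argsort(-scores), scores


# Toy scene: 100 patches, a large object (e0) on 98 patches, a small "needle" (e1) on 2.
e = np.eye(3)
feats = np.vstack([np.tile(e[0], (98, 1)), np.tile(e[1], (2, 1))])
masks = np.zeros((2, 100))
masks[0, :98] = 1.0  # large object
masks[1, 98:] = 1.0  # small object (the needle)

needle = e[1]
cos_global = float(global_mean_descriptor(feats) @ needle)
cos_masked = float(masked_multi_object_descriptor(feats, masks) @ needle)

# Distractor image: the same large object plus a different small object (e2).
feats_d = np.vstack([np.tile(e[0], (98, 1)), np.tile(e[2], (2, 1))])
db = np.stack([
    masked_multi_object_descriptor(feats_d, masks),  # index 0: distractor
    masked_multi_object_descriptor(feats, masks),    # index 1: contains the needle
])
order, scores = rank(needle, db)
print(round(cos_global, 3), round(cos_masked, 3), order.tolist())
# prints: 0.02 0.707 [1, 0]
```

With global mean pooling the 2-patch needle barely registers (cosine about 0.02 with its query), while per-object pooling lifts it to about 0.71 and ranks the image that contains it first. Real features are neither orthogonal nor cleanly masked, which is part of why the paper argues for a learned, attention-based approach instead of this kind of fixed pooling.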
Related papers
- Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences. We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries. We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularity.
Date: 2025-02-18T12:00:47Z
- Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval [6.493562178111347]
We propose a cross-modal image-text retrieval framework based on "object-aware query perturbation."
In our proposed method, object-aware cross-modal image-text retrieval is possible while keeping the rich expressive power and retrieval performance of existing V&L models without additional fine-tuning.
Date: 2024-07-17T06:42:14Z
- Few-shot Object Localization [37.347898735345574]
This paper defines a novel task named Few-Shot Object Localization (FSOL).
It aims to achieve precise localization with limited samples.
This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images.
Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research.
Date: 2024-03-19T05:50:48Z
- Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features [11.112981323262337]
We present a simple yet effective approach to object-centric open-vocabulary image retrieval. Our approach aggregates dense embeddings extracted from CLIP into a compact representation. We show the effectiveness of our scheme for the task by achieving significantly better results than global feature approaches on three datasets.
Date: 2023-09-26T15:13:09Z
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
Date: 2021-05-07T03:49:26Z
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve object detection (and instance segmentation) accuracy on rare objects by 50% (and 33%) in relative terms.
Date: 2021-02-17T17:27:21Z
- Addressing Visual Search in Open and Closed Set Settings [8.928169373673777]
First, we present a method for predicting pixel-level objectness from a low-resolution gist image.
We then use it to select regions for performing object detection locally at high resolution.
Second, we propose a novel strategy for open-set visual search that seeks to find all instances of a target class which may be previously unseen.
Date: 2020-12-11T17:21:28Z
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
Date: 2020-09-03T03:57:50Z
- Toward unsupervised, multi-object discovery in large-scale image collections [26.39475298878971]
This paper builds on the optimization approach of Vo et al.
We propose a novel saliency-based region proposal algorithm.
We exploit the inherent hierarchical structure of proposals as an effective regularizer.
Date: 2020-07-06T11:43:47Z
- Compact Deep Aggregation for Set Retrieval [87.52470995031997]
We focus on retrieving images containing multiple faces from a large-scale dataset of images.
Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities.
We show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that.
Date: 2020-03-26T08:43:15Z
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
Date: 2020-03-16T21:40:09Z
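For reference on the SceneFID entry above: the standard Fréchet Inception Distance models real and generated feature distributions as Gaussians with means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$ and compares them as below. As we understand SceneFID, the same formula is applied to Inception features of per-object crops (cut from each layout's bounding boxes) rather than to whole images; treat that crop detail as our assumption, not something this listing states.

```latex
\mathrm{FID}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better; identical Gaussians give a distance of zero.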
This list is automatically generated from the titles and abstracts of the papers on this site.