Data Roaming and Quality Assessment for Composed Image Retrieval
- URL: http://arxiv.org/abs/2303.09429v2
- Date: Wed, 20 Dec 2023 11:07:57 GMT
- Title: Data Roaming and Quality Assessment for Composed Image Retrieval
- Authors: Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski
- Abstract summary: Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively.
We introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times larger than existing ones.
We also introduce a new CoIR baseline, the Cross-Attention driven Shift (CASE)
- Score: 25.452015862927766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The task of Composed Image Retrieval (CoIR) involves queries that combine
image and text modalities, allowing users to express their intent more
effectively. However, current CoIR datasets are orders of magnitude smaller
compared to other vision and language (V&L) datasets. Additionally, some of
these datasets have noticeable issues, such as queries containing redundant
modalities. To address these shortcomings, we introduce the Large Scale
Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times
larger than existing ones. Pre-training on our LaSCo, shows a noteworthy
improvement in performance, even in zero-shot. Furthermore, we propose a new
approach for analyzing CoIR datasets and methods, which detects modality
redundancy or necessity, in queries. We also introduce a new CoIR baseline, the
Cross-Attention driven Shift Encoder (CASE). This baseline allows for early
fusion of modalities using a cross-attention module and employs an additional
auxiliary task during training. Our experiments demonstrate that this new
baseline outperforms the current state-of-the-art methods on established
benchmarks like FashionIQ and CIRR.
Related papers
- iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval [26.101116761577796]
Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.
We introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset.
We present an open-domain benchmarking dataset named CIRCO, where each query is labeled with multiple ground truths and a semantic categorization.
arXiv Detail & Related papers (2024-05-05T14:39:06Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR)
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Zero-Shot Composed Image Retrieval with Textual Inversion [28.513594970580396]
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption.
We propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset.
arXiv Detail & Related papers (2023-03-27T14:31:25Z) - Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - The Little W-Net That Could: State-of-the-Art Retinal Vessel
Segmentation with Minimalistic Models [19.089445797922316]
We show that a minimalistic version of a standard U-Net with several orders of magnitude less parameters closely approximates the performance of current best techniques.
We also propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets.
We also test our approach on the Artery/Vein segmentation problem, where we again achieve results well-aligned with the state-of-the-art.
arXiv Detail & Related papers (2020-09-03T19:59:51Z) - On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews,
Guidances and Million-AID [57.71601467271486]
This article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
We first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations.
Following the presented guidances, we also provide an example on building RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset.
arXiv Detail & Related papers (2020-06-22T17:59:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.