Data Roaming and Quality Assessment for Composed Image Retrieval
- URL: http://arxiv.org/abs/2303.09429v2
- Date: Wed, 20 Dec 2023 11:07:57 GMT
- Title: Data Roaming and Quality Assessment for Composed Image Retrieval
- Authors: Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski
- Abstract summary: Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively.
We introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times larger than existing ones.
We also introduce a new CoIR baseline, the Cross-Attention driven Shift (CASE)
- Score: 25.452015862927766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The task of Composed Image Retrieval (CoIR) involves queries that combine
image and text modalities, allowing users to express their intent more
effectively. However, current CoIR datasets are orders of magnitude smaller
compared to other vision and language (V&L) datasets. Additionally, some of
these datasets have noticeable issues, such as queries containing redundant
modalities. To address these shortcomings, we introduce the Large Scale
Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times
larger than existing ones. Pre-training on our LaSCo, shows a noteworthy
improvement in performance, even in zero-shot. Furthermore, we propose a new
approach for analyzing CoIR datasets and methods, which detects modality
redundancy or necessity, in queries. We also introduce a new CoIR baseline, the
Cross-Attention driven Shift Encoder (CASE). This baseline allows for early
fusion of modalities using a cross-attention module and employs an additional
auxiliary task during training. Our experiments demonstrate that this new
baseline outperforms the current state-of-the-art methods on established
benchmarks like FashionIQ and CIRR.
Related papers
- Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity [2.724141845301679]
Composed image retrieval (CIR) formulates the query as a combination of a reference image and modified text.
We introduce a training-free approach for ZS-CIR.
Our approach is simple, easy to implement, and its effectiveness is validated through experiments on the FashionIQ and CIRR datasets.
arXiv Detail & Related papers (2024-09-07T21:52:58Z) - Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR)
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - CoVR-2: Automatic Data Construction for Composed Video Retrieval [59.854331104466254]
Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together.
We propose a scalable automatic dataset creation methodology that generates triplets given video-caption pairs.
We also expand the scope of the task to include composed video retrieval (CoVR)
arXiv Detail & Related papers (2023-08-28T17:55:33Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR)
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Zero-Shot Composed Image Retrieval with Textual Inversion [28.513594970580396]
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption.
We propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset.
arXiv Detail & Related papers (2023-03-27T14:31:25Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - The Little W-Net That Could: State-of-the-Art Retinal Vessel
Segmentation with Minimalistic Models [19.089445797922316]
We show that a minimalistic version of a standard U-Net with several orders of magnitude less parameters closely approximates the performance of current best techniques.
We also propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets.
We also test our approach on the Artery/Vein segmentation problem, where we again achieve results well-aligned with the state-of-the-art.
arXiv Detail & Related papers (2020-09-03T19:59:51Z) - On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews,
Guidances and Million-AID [57.71601467271486]
This article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
We first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations.
Following the presented guidances, we also provide an example on building RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset.
arXiv Detail & Related papers (2020-06-22T17:59:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.