MosAIc: Finding Artistic Connections across Culture with Conditional
Image Retrieval
- URL: http://arxiv.org/abs/2007.07177v3
- Date: Sun, 28 Feb 2021 01:08:22 GMT
- Title: MosAIc: Finding Artistic Connections across Culture with Conditional
Image Retrieval
- Authors: Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp,
Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris
Hoder, William T. Freeman
- Abstract summary: We introduce Conditional Image Retrieval (CIR), which combines visual similarity search with user-supplied filters or "conditions".
CIR allows one to find pairs of similar images that span distinct subsets of the image corpus.
We show that our CIR data-structures can identify "blind spots" in Generative Adversarial Networks (GANs) where they fail to properly model the true data distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MosAIc, an interactive web app that allows users to find pairs
of semantically related artworks that span different cultures, media, and
millennia. To create this application, we introduce Conditional Image Retrieval
(CIR), which combines visual similarity search with user-supplied filters or
"conditions". This technique allows one to find pairs of similar images that
span distinct subsets of the image corpus. We provide a generic way to adapt
existing image retrieval data-structures to this new domain and provide
theoretical bounds on our approach's efficiency. To quantify the performance of
CIR systems, we introduce new datasets for evaluating CIR methods and show that
CIR performs non-parametric style transfer. Finally, we demonstrate that our
CIR data-structures can identify "blind spots" in Generative Adversarial
Networks (GANs) where they fail to properly model the true data distribution.
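The abstract's core idea, retrieving the nearest neighbors of a query image restricted to a user-chosen subset of the corpus, can be illustrated with a minimal brute-force sketch. The paper's contribution is adapting k-NN data-structures to do this efficiently; the function name, toy features, and metadata below are purely illustrative, not the authors' implementation.

```python
import numpy as np

def conditional_retrieve(query_feat, feats, metadata, condition, k=3):
    """Return indices of the k nearest images whose metadata satisfies
    the user-supplied condition (a predicate over metadata records)."""
    # Restrict the search to the subset of the corpus matching the condition.
    idx = np.array([i for i, m in enumerate(metadata) if condition(m)])
    # Rank the remaining images by Euclidean distance in feature space.
    dists = np.linalg.norm(feats[idx] - query_feat, axis=1)
    return idx[np.argsort(dists)[:k]]

# Toy corpus: random stand-ins for deep features, tagged with a culture label.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
metadata = [{"culture": "Dutch" if i % 2 else "Japanese"} for i in range(100)]

query = feats[1]  # a "Dutch" artwork in this toy labeling
# Find its closest matches among "Japanese" works only.
matches = conditional_retrieve(query, feats, metadata,
                               lambda m: m["culture"] == "Japanese")
print(matches)
```

Filtering before ranking guarantees every result satisfies the condition; the paper's data-structure approach avoids scanning the whole corpus for each query.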
Related papers
- Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity
Composed image retrieval (CIR) formulates the query as a combination of a reference image and modified text.
We introduce a training-free approach for zero-shot CIR (ZS-CIR).
Our approach is simple, easy to implement, and its effectiveness is validated through experiments on the FashionIQ and CIRR datasets.
arXiv Detail & Related papers (2024-09-07T21:52:58Z)
- iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval
Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.
We introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset.
We present an open-domain benchmarking dataset named CIRCO, where each query is labeled with multiple ground truths and a semantic categorization.
arXiv Detail & Related papers (2024-05-05T14:39:06Z)
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models using labeled triplets of (reference image, text, target image).
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method outperforms existing image retrieval methods.
arXiv Detail & Related papers (2023-08-16T15:23:14Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Zero-shot Composed Text-Image Retrieval
We consider the problem of composed image retrieval (CIR).
It aims to train a model that fuses multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the expressiveness of user queries.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
We extend the task of composed image retrieval, where an input query consists of an image and a short textual description of how to modify the image.
We propose CIRPLANT, a transformer based model that leverages rich pre-trained vision-and-language (V&L) knowledge for modifying visual features conditioned on natural language.
We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion.
arXiv Detail & Related papers (2021-08-09T13:25:06Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
Recently proposed generative models can complete training from only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.