MosAIc: Finding Artistic Connections across Culture with Conditional
Image Retrieval
- URL: http://arxiv.org/abs/2007.07177v3
- Date: Sun, 28 Feb 2021 01:08:22 GMT
- Title: MosAIc: Finding Artistic Connections across Culture with Conditional
Image Retrieval
- Authors: Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp,
Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris
Hoder, William T. Freeman
- Abstract summary: We introduce Conditional Image Retrieval (CIR), which combines visual similarity search with user-supplied filters or "conditions".
CIR allows one to find pairs of similar images that span distinct subsets of the image corpus.
We show that our CIR data-structures can identify "blind spots" in Generative Adversarial Networks (GANs) where they fail to properly model the true data distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MosAIc, an interactive web app that allows users to find pairs
of semantically related artworks that span different cultures, media, and
millennia. To create this application, we introduce Conditional Image Retrieval
(CIR), which combines visual similarity search with user-supplied filters or
"conditions". This technique allows one to find pairs of similar images that
span distinct subsets of the image corpus. We provide a generic way to adapt
existing image retrieval data-structures to this new domain and provide
theoretical bounds on our approach's efficiency. To quantify the performance of
CIR systems, we introduce new datasets for evaluating CIR methods and show that
CIR performs non-parametric style transfer. Finally, we demonstrate that our
CIR data-structures can identify "blind spots" in Generative Adversarial
Networks (GANs) where they fail to properly model the true data distribution.
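The abstract's core idea, retrieving the nearest neighbors of a query image restricted to a user-chosen subset of the corpus, can be illustrated with a minimal brute-force sketch. The paper's contribution is adapting k-NN data-structures to do this efficiently; the function name, toy features, and metadata below are purely illustrative, not the authors' implementation.

```python
import numpy as np

def conditional_retrieve(query_feat, feats, metadata, condition, k=3):
    """Return indices of the k nearest images whose metadata satisfies
    the user-supplied condition (a predicate over metadata records)."""
    # Restrict the search to the subset of the corpus matching the condition.
    idx = np.array([i for i, m in enumerate(metadata) if condition(m)])
    # Rank the remaining images by Euclidean distance in feature space.
    dists = np.linalg.norm(feats[idx] - query_feat, axis=1)
    return idx[np.argsort(dists)[:k]]

# Toy corpus: random stand-ins for deep features, tagged with a culture label.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
metadata = [{"culture": "Dutch" if i % 2 else "Japanese"} for i in range(100)]

query = feats[1]  # a "Dutch" artwork in this toy labeling
# Find its closest matches among "Japanese" works only.
matches = conditional_retrieve(query, feats, metadata,
                               lambda m: m["culture"] == "Japanese")
print(matches)
```

Filtering before ranking guarantees every result satisfies the condition; the paper's data-structure approach avoids scanning the whole corpus for each query.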
Related papers
- Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity
Composed image retrieval (CIR) formulates the query as a combination of a reference image and modified text.
We introduce a training-free approach for zero-shot CIR (ZS-CIR).
Our approach is simple, easy to implement, and its effectiveness is validated through experiments on the FashionIQ and CIRR datasets.
arXiv Detail & Related papers (2024-09-07T21:52:58Z)
- iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval
Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.
We introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset.
We present an open-domain benchmarking dataset named CIRCO, where each query is labeled with multiple ground truths and a semantic categorization.
arXiv Detail & Related papers (2024-05-05T14:39:06Z)
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models using labeled triplets of (reference image, text, target image).
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method outperforms existing image retrieval methods.
arXiv Detail & Related papers (2023-08-16T15:23:14Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Zero-shot Composed Text-Image Retrieval
We consider the problem of composed image retrieval (CIR).
It aims to train a model that fuses multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the expressiveness of user queries.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
We extend the task of composed image retrieval, where an input query consists of an image and a short textual description of how to modify the image.
We propose CIRPLANT, a transformer based model that leverages rich pre-trained vision-and-language (V&L) knowledge for modifying visual features conditioned on natural language.
We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion.
arXiv Detail & Related papers (2021-08-09T13:25:06Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
Recently proposed generative models can complete training from only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.