ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot
Retrieval of Images from Textual Descriptions
- URL: http://arxiv.org/abs/2007.12212v3
- Date: Wed, 23 Sep 2020 11:41:12 GMT
- Title: ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot
Retrieval of Images from Textual Descriptions
- Authors: Anurag Roy, Vinay Kumar Verma, Kripabandhu Ghosh, Saptarshi Ghosh
- Abstract summary: We propose a novel GAN-based model for zero-shot text-to-image retrieval.
The proposed model is trained using an Expectation-Maximization framework.
Experiments on multiple benchmark datasets show that our proposed model comfortably outperforms several state-of-the-art zero-shot text-to-image retrieval models.
- Score: 13.15755441853131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing algorithms for cross-modal Information Retrieval are based on a
supervised train-test setup, where a model learns to align the mode of the
query (e.g., text) to the mode of the documents (e.g., images) from a given
training set. Such a setup assumes that the training set contains an exhaustive
representation of all possible classes of queries. In reality, a retrieval
model may need to be deployed on previously unseen classes, which implies a
zero-shot IR setup. In this paper, we propose a novel GAN-based model for
zero-shot text-to-image retrieval. When given a textual description as the
query, our model can retrieve relevant images in a zero-shot setup. The
proposed model is trained using an Expectation-Maximization framework.
Experiments on multiple benchmark datasets show that our proposed model
comfortably outperforms several state-of-the-art zero-shot text-to-image
retrieval models, as well as zero-shot classification and hashing models
suitably used for retrieval.
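The abstract describes the recipe only at a high level. As a rough illustration of how an EM loop around a conditional GAN could drive such retrieval, here is a toy PyTorch sketch; the architecture, dimensions, and losses are assumptions for exposition, not the authors' implementation.

```python
# Toy sketch (assumed details): a conditional GAN maps text embeddings into the
# image-embedding space; an EM-style loop alternates soft correspondence
# estimation (E) with a GAN update (M). Retrieval ranks images by similarity
# to the generated query embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

TXT_DIM, IMG_DIM, NOISE_DIM = 128, 64, 16

G = nn.Sequential(nn.Linear(TXT_DIM + NOISE_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM))
D = nn.Sequential(nn.Linear(TXT_DIM + IMG_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def em_step(txt, imgs):
    """txt: (B, TXT_DIM) text embeddings; imgs: (B, K, IMG_DIM) candidate
    image embeddings per text, with the true correspondence unknown."""
    with torch.no_grad():                       # E-step: soft correspondence
        z = torch.randn(txt.size(0), NOISE_DIM)
        fake = G(torch.cat([txt, z], dim=-1))   # (B, IMG_DIM)
        sim = F.cosine_similarity(fake.unsqueeze(1), imgs, dim=-1)
        resp = sim.softmax(dim=1)               # responsibilities over K candidates
    real = (resp.unsqueeze(-1) * imgs).sum(1)   # expected matching embedding

    # M-step: one non-saturating conditional-GAN update under the soft assignment.
    d_loss = (F.softplus(-D(torch.cat([txt, real], -1))) +
              F.softplus(D(torch.cat([txt, fake], -1)))).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    z = torch.randn(txt.size(0), NOISE_DIM)
    fake = G(torch.cat([txt, z], dim=-1))
    g_loss = F.softplus(-D(torch.cat([txt, fake], -1))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

def retrieve(query_txt, gallery):
    """Rank gallery image embeddings against the generated query embedding."""
    with torch.no_grad():
        z = torch.randn(1, NOISE_DIM)
        q = G(torch.cat([query_txt, z], dim=-1))
        return F.cosine_similarity(q, gallery).argsort(descending=True)

em_step(torch.randn(8, TXT_DIM), torch.randn(8, 5, IMG_DIM))
print(retrieve(torch.randn(1, TXT_DIM), torch.randn(100, IMG_DIM))[:5])
```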
Related papers
- Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity [2.724141845301679]
Composed image retrieval (CIR) formulates the query as a combination of a reference image and a modification text.
We introduce a training-free approach for zero-shot CIR (ZS-CIR).
Our approach is simple, easy to implement, and its effectiveness is validated through experiments on the FashionIQ and CIRR datasets.
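As a minimal sketch of the weighted-fusion idea (the fusion weight, embedding sizes, and random stand-ins for CLIP-style embeddings are assumptions; the paper's exact weighting scheme may differ):

```python
# Fuse the reference-image and modification-text embeddings with a weight
# alpha, then rank the gallery by cosine similarity to the fused query.
import numpy as np

def zs_cir_rank(ref_img_emb, text_emb, gallery_embs, alpha=0.5):
    query = alpha * ref_img_emb + (1 - alpha) * text_emb   # weighted fusion
    query /= np.linalg.norm(query)
    gallery = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(gallery @ query))                  # best match first

rng = np.random.default_rng(0)
order = zs_cir_rank(rng.normal(size=512), rng.normal(size=512),
                    rng.normal(size=(100, 512)))
print(order[:5])
```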
arXiv Detail & Related papers (2024-09-07T21:52:58Z)
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress with advanced large vision-language (VL) models on the CIR task; however, they generally suffer from two main issues: a lack of labeled triplets for model training and the difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and relies only on unlabeled images for composition learning.
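A bare-bones PyTorch sketch of the image-to-sentence idea: a light mapping network turns the image embedding into a few pseudo word embeddings that are prepended to the modification text before text encoding. Sizes and names are illustrative, not the paper's.

```python
# Map an image embedding to N_PSEUDO pseudo word embeddings and compose them
# with the text tokens; the result is what a text encoder would consume.
import torch
import torch.nn as nn

IMG_DIM, WORD_DIM, N_PSEUDO = 512, 256, 4

class ImageToSentence(nn.Module):
    def __init__(self):
        super().__init__()
        self.map = nn.Linear(IMG_DIM, N_PSEUDO * WORD_DIM)

    def forward(self, img_emb, text_word_embs):
        # img_emb: (B, IMG_DIM); text_word_embs: (B, T, WORD_DIM)
        pseudo = self.map(img_emb).view(-1, N_PSEUDO, WORD_DIM)
        return torch.cat([pseudo, text_word_embs], dim=1)  # pseudo "sentence"

composed = ImageToSentence()(torch.randn(2, IMG_DIM), torch.randn(2, 7, WORD_DIM))
print(composed.shape)  # (2, 11, 256)
```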
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- Self-Enhancement Improves Text-Image Retrieval in Foundation Visual-Language Models [33.008325765051865]
Cross-modal foundation models fail to focus on the key attributes required for domain-specific retrieval tasks.
We propose a self-enhancement framework, A3R, based on CLIP-ViT/G-14, one of the largest cross-modal models.
arXiv Detail & Related papers (2023-06-11T14:25:38Z)
- I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification [108.83932812826521]
Large Language Models (LLMs) trained on web-scale text show impressive abilities to repurpose their learned knowledge for a multitude of tasks.
Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification with these class views.
I2MVFormer establishes a new state-of-the-art on three public benchmark datasets for zero-shot image classification with unsupervised semantic embeddings.
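To make the multi-view idea concrete, here is a toy sketch where each class's LLM-generated documents are embedded and aggregated into a prototype; mean pooling is a stand-in here, whereas I2MVFormer learns the aggregation with attention.

```python
# Aggregate per-class document ("view") embeddings into prototypes, then label
# an image by its most cosine-similar prototype.
import numpy as np

def classify(img_emb, class_view_embs):
    # class_view_embs: dict class -> (n_views, D) document embeddings
    protos = {c: v.mean(axis=0) for c, v in class_view_embs.items()}
    sims = {c: p @ img_emb / (np.linalg.norm(p) * np.linalg.norm(img_emb))
            for c, p in protos.items()}
    return max(sims, key=sims.get)

rng = np.random.default_rng(1)
views = {"zebra": rng.normal(size=(3, 64)), "horse": rng.normal(size=(3, 64))}
print(classify(rng.normal(size=64), views))
```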
arXiv Detail & Related papers (2022-12-05T14:11:36Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval-based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image from a knowledge corpus, respectively.
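A minimal sketch of that dual-retrieval layout, assuming stand-in embedding functions and a shared knowledge corpus; MoRe's actual retrievers and fusion are more involved.

```python
# Retrieve knowledge passages for the text query and the image query
# separately, then append both contexts to the input for a downstream
# NER/RE model.
import numpy as np

def top_k(query_emb, corpus_embs, corpus_texts, k=2):
    scores = corpus_embs @ query_emb
    return [corpus_texts[i] for i in np.argsort(-scores)[:k]]

def more_augment(text, text_emb, img_emb, corpus_embs, corpus_texts):
    ctx = (top_k(text_emb, corpus_embs, corpus_texts) +
           top_k(img_emb, corpus_embs, corpus_texts))
    return text + " [KNOWLEDGE] " + " ".join(ctx)
```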
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can benefit a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
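One plausible reading of retrieval-then-optimization, sketched in PyTorch: initialize a pseudo text feature from retrieved neighbors in a bank of real text features, then optimize it toward the target image feature in the shared space. The loss and schedule here are assumptions, not Lafite2's exact procedure.

```python
# Retrieval: average the k nearest entries of a text-feature bank.
# Optimization: gradient steps pulling the feature toward the image feature.
import torch
import torch.nn.functional as F

def pseudo_text_feature(img_feat, text_bank, k=5, steps=50, lr=0.1):
    sims = F.cosine_similarity(text_bank, img_feat.unsqueeze(0), dim=1)
    feat = text_bank[sims.topk(k).indices].mean(0).clone().requires_grad_(True)
    opt = torch.optim.Adam([feat], lr=lr)
    for _ in range(steps):
        loss = 1 - F.cosine_similarity(feat, img_feat, dim=0)
        opt.zero_grad(); loss.backward(); opt.step()
    return feat.detach()

feat = pseudo_text_feature(torch.randn(512), torch.randn(1000, 512))
```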
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
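The closed-loop idea in miniature: a generator proposes candidates, an ensemble of scorers rates them, and the consensus winner seeds the next round. The toy functions below stand in for pre-trained models.

```python
# Iterative consensus: average the scorers' judgments per candidate and keep
# the best candidate as the seed for the next generation round.
import random

def iterative_consensus(generate, scorers, seed, rounds=5, n_candidates=8):
    best = seed
    for _ in range(rounds):
        candidates = [generate(best) for _ in range(n_candidates)]
        best = max(candidates,
                   key=lambda c: sum(s(c) for s in scorers) / len(scorers))
    return best

# Degenerate numeric example: the generator perturbs the seed and both
# scorers prefer values near zero.
result = iterative_consensus(lambda x: x + random.uniform(-1, 1),
                             [lambda c: -abs(c), lambda c: -c * c],
                             seed=10.0)
print(result)
```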
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Content-Based Search for Deep Generative Models [45.322081206025544]
We introduce the task of content-based model search: given a query and a large set of generative models, finding the models that best match the query.
As each generative model produces a distribution of images, we formulate the search task as an optimization problem to select the model with the highest probability of generating similar content as the query.
We demonstrate that our method outperforms several baselines on Generative Model Zoo, a new benchmark we create for the model retrieval task.
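A small Monte Carlo sketch of that search objective: approximate each model's probability of generating query-like content by averaging embedding similarity over its samples, then pick the argmax. The sampling budget and similarity measure are assumptions.

```python
# For each candidate model, draw samples, embed them, and score the model by
# mean cosine similarity to the query embedding; return the best model index.
import numpy as np

def search_models(query_emb, models, embed, n_samples=64):
    """models: callables that return one generated image per call;
    embed: maps an image to a feature vector comparable with query_emb."""
    q = query_emb / np.linalg.norm(query_emb)
    def score(model):
        feats = np.stack([embed(model()) for _ in range(n_samples)])
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)
        return (feats @ q).mean()
    return max(range(len(models)), key=lambda i: score(models[i]))

rng = np.random.default_rng(0)
models = [lambda m=m: rng.normal(loc=m, size=(8, 8)) for m in (0.0, 1.0)]
print(search_models(rng.normal(size=64), models, embed=lambda img: img.ravel()))
```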
arXiv Detail & Related papers (2022-10-06T17:59:51Z)
- Evaluating Contrastive Models for Instance-based Image Retrieval [6.393147386784114]
We evaluate contrastive models for the task of image retrieval.
We find that models trained using contrastive methods perform on par with (and can outperform) a baseline pre-trained on ImageNet labels.
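The evaluation protocol reduces to nearest-neighbor retrieval over embeddings; here is a minimal Recall@k sketch (the metric choice and L2 normalization are assumptions):

```python
# Rank the gallery by cosine similarity for each query and report the fraction
# of queries whose ground-truth item appears in the top k.
import numpy as np

def recall_at_k(query_embs, gallery_embs, gt_indices, k=5):
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    ranks = np.argsort(-(q @ g.T), axis=1)[:, :k]           # top-k per query
    hits = [gt in row for gt, row in zip(gt_indices, ranks)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
g = rng.normal(size=(50, 32))
queries = g[:10] + 0.05 * rng.normal(size=(10, 32))         # noisy duplicates
print(recall_at_k(queries, g, gt_indices=list(range(10))))
```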
arXiv Detail & Related papers (2021-04-30T12:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.