Ranking-aware Uncertainty for Text-guided Image Retrieval
- URL: http://arxiv.org/abs/2308.08131v1
- Date: Wed, 16 Aug 2023 03:48:19 GMT
- Title: Ranking-aware Uncertainty for Text-guided Image Retrieval
- Authors: Junyang Chen and Hanjiang Lai
- Abstract summary: We propose a novel ranking-aware uncertainty approach to model many-to-many correspondences.
Compared to existing state-of-the-art methods, our proposed method achieves significant improvements on two public datasets.
- Score: 17.70430913227593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image retrieval incorporates conditional text to better
capture users' intent. Traditionally, existing methods focus on minimizing the
embedding distances between the source inputs and the target image, using the
provided triplets $\langle$source image, source text, target
image$\rangle$. However, such triplet optimization may limit the learned
retrieval model's ability to capture more detailed ranking information; e.g., the
triplets are one-to-one correspondences and fail to account for
many-to-many correspondences arising from semantic diversity in feedback
languages and images. To capture more ranking information, we propose a novel
ranking-aware uncertainty approach to model many-to-many correspondences by
only using the provided triplets. We introduce uncertainty learning to learn
the stochastic ranking list of features. Specifically, our approach mainly
comprises three components: (1) In-sample uncertainty, which aims to capture
semantic diversity using a Gaussian distribution derived from both combined and
target features; (2) Cross-sample uncertainty, which further mines the ranking
information from other samples' distributions; and (3) Distribution
regularization, which aligns the distributional representations of the source
inputs and the target image. Compared to existing state-of-the-art methods,
our proposed method achieves significant improvements on two public datasets for
composed image retrieval.
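The abstract describes the three components only at a high level. As a rough illustration of the distribution-regularization idea, the numpy sketch below models the combined and target features as diagonal Gaussians, draws a stochastic (reparameterized) feature sample, and computes a closed-form KL term between the two distributions; the names and the specific formulation are our own assumptions, not the paper's code.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    # Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians.
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def sample_feature(mu, var, rng):
    # Reparameterized draw z = mu + sigma * eps, which would keep sampling
    # differentiable in an actual training setup.
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
mu_src, var_src = np.zeros(4), np.ones(4)      # combined (image + text) feature
mu_tgt, var_tgt = np.full(4, 0.5), np.ones(4)  # target image feature

reg = gaussian_kl(mu_src, var_src, mu_tgt, var_tgt)  # distribution regularization
z = sample_feature(mu_src, var_src, rng)             # stochastic feature for ranking
```

Aligning the two distributions with such a KL term, rather than matching point embeddings, is what lets one triplet stand in for a family of plausible matches.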
Related papers
- Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking [34.31345844296072]
Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text.
Most current composed image retrieval methods follow a supervised learning approach, training on a costly triplet dataset composed of a reference image, modified text, and a corresponding target image.
We present a new training-free zero-shot composed image retrieval method which translates the query into explicit human-understandable text.
arXiv Detail & Related papers (2023-12-14T13:31:01Z)
- Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
arXiv Detail & Related papers (2023-07-22T14:17:19Z)
- Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations [67.92679668612858]
We propose the Consensus Network (Css-Net), inspired by the psychological concept that groups outperform individuals.
Css-Net comprises two core components: (1) a consensus module with four diverse compositors, each generating distinct image-text embeddings; and (2) a Kullback-Leibler divergence loss that encourages learning of inter-compositor interactions.
On benchmark datasets, particularly FashionIQ, Css-Net demonstrates marked improvements, achieving significant recall gains: a 2.77% increase in R@10 and a 6.67% boost in R@50.
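R@10 and R@50 are instances of the Recall@K metric. A minimal sketch of how it is computed for a single query (the gallery IDs below are made up for illustration):

```python
def recall_at_k(ranked_gallery, target_id, k):
    # 1 if the target appears among the top-k retrieved items, else 0;
    # averaging this over all queries gives R@K.
    return int(target_id in ranked_gallery[:k])

# Hypothetical ranking a retrieval model produced for one query.
ranking = [7, 3, 42, 19, 5]
r_at_1 = recall_at_k(ranking, 42, 1)  # 0: target not ranked first
r_at_3 = recall_at_k(ranking, 42, 3)  # 1: target within the top 3
```

A percentage-point gain in R@K therefore means the target image lands in the top K for a correspondingly larger share of queries.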
arXiv Detail & Related papers (2023-06-03T11:50:44Z)
- Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z)
- Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences [118.6018141306409]
We propose Probabilistic Warp Consistency, a weakly-supervised learning objective for semantic matching.
We first construct an image triplet by applying a known warp to one of the images in a pair depicting different instances of the same object class.
Our objective also brings substantial improvements in the strongly-supervised regime, when combined with keypoint annotations.
arXiv Detail & Related papers (2022-03-08T18:55:11Z)
- A Novel Triplet Sampling Method for Multi-Label Remote Sensing Image Search and Retrieval [1.123376893295777]
A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images.
We propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems.
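The positive/negative triplets described above are typically consumed by a standard triplet margin loss. Here is a generic numpy sketch: the embeddings and margin are illustrative, and this is the textbook loss, not the paper's sampling method.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss: push the negative at least `margin` farther from the
    # anchor than the positive.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # embedding of a similar image
negative = np.array([-1.0, 0.0])  # embedding of a dissimilar image

loss = triplet_margin_loss(anchor, positive, negative)  # 0.0: margin already satisfied
```

Since well-separated triplets contribute zero loss, how the triplets are sampled largely determines how informative training is, which is what the paper's sampling method targets.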
arXiv Detail & Related papers (2021-05-08T09:16:09Z)
- Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the inputs contain a source image plus some text that describes certain modifications to this image toward the desired image.
Our method narrows the gap between the text modality and the image modality by maximizing mutual information between their representations, which are not exactly semantically identical.
arXiv Detail & Related papers (2021-03-10T13:08:09Z)
- Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search.
A new rank-consistency objective is applied to align the similarity orders from two spaces.
A powerful loss function is designed to penalize samples whose semantic similarity and Hamming distance are mismatched.
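As a toy illustration of rank consistency between the semantic and Hamming spaces, the sketch below penalizes a triple of hash codes whose Hamming ordering contradicts their semantic-similarity ordering; the penalty form is our own assumption, not the paper's loss.

```python
import numpy as np

def hamming_distance(h1, h2):
    # Number of differing bits between two binary hash codes.
    return int(np.sum(h1 != h2))

def rank_mismatch_penalty(sim_ab, sim_ac, h_a, h_b, h_c):
    # If b is semantically closer to a than c is, b's hash code should
    # also be closer in Hamming distance; penalize any violation.
    d_ab, d_ac = hamming_distance(h_a, h_b), hamming_distance(h_a, h_c)
    if sim_ab > sim_ac and d_ab > d_ac:
        return float(d_ab - d_ac)
    return 0.0

h_a = np.array([1, 0, 1, 1])
h_b = np.array([1, 0, 1, 0])  # 1 bit away from h_a
h_c = np.array([0, 1, 0, 1])  # 3 bits away from h_a

consistent = rank_mismatch_penalty(0.9, 0.2, h_a, h_b, h_c)  # 0.0: orders agree
violated = rank_mismatch_penalty(0.9, 0.2, h_a, h_c, h_b)    # 2.0: orders disagree
```

Minimizing such a penalty aligns the similarity order in the semantic space with the distance order in the binary hash space, which is the rank-consistency objective in a nutshell.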
arXiv Detail & Related papers (2021-02-02T13:46:58Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.