Cross Modal Retrieval with Querybank Normalisation
- URL: http://arxiv.org/abs/2112.12777v1
- Date: Thu, 23 Dec 2021 18:51:58 GMT
- Title: Cross Modal Retrieval with Querybank Normalisation
- Authors: Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie
- Abstract summary: We show that state-of-the-art joint embeddings suffer from the longstanding hubness problem.
We formulate a simple but effective framework that re-normalises query similarities to account for hubs in the embedding space.
We show that QB-Norm works effectively without concurrent access to any test set queries.
- Score: 41.877255953069074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Profiting from large-scale training datasets, advances in neural architecture
design and efficient inference, joint embeddings have become the dominant
approach for tackling cross-modal retrieval. In this work we first show that,
despite their effectiveness, state-of-the-art joint embeddings suffer
significantly from the longstanding hubness problem in which a small number of
gallery embeddings form the nearest neighbours of many queries. Drawing
inspiration from the NLP literature, we formulate a simple but effective
framework called Querybank Normalisation (QB-Norm) that re-normalises query
similarities to account for hubs in the embedding space. QB-Norm improves
retrieval performance without requiring retraining. Differently from prior
work, we show that QB-Norm works effectively without concurrent access to any
test set queries. Within the QB-Norm framework, we also propose a novel
similarity normalisation method, the Dynamic Inverted Softmax, that is
significantly more robust than existing approaches. We showcase QB-Norm across
a range of cross modal retrieval models and benchmarks where it consistently
enhances strong baselines beyond the state of the art. Code is available at
https://vladbogo.github.io/QB-Norm/.
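The abstract describes the mechanism only in words, so here is a minimal NumPy sketch of how querybank-based re-normalisation might look. It is not the authors' implementation (that is linked above). The function names, the temperature `beta`, the toy similarity matrices, and the exact inverted-softmax form (dividing each exponentiated similarity by the gallery item's summed exponentiated similarity to the querybank) are assumptions made for illustration. The "dynamic" variant is likewise an assumed reading: normalise a query only when its raw top-1 gallery item is also the top-1 of some querybank query.

```python
import numpy as np


def k_occurrence(sims, k=1):
    """Count how often each gallery item appears in the top-k of a query row."""
    topk = np.argsort(-sims, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=sims.shape[1])


def inverted_softmax(query_sims, bank_sims, beta=20.0):
    """Divide exp(beta * sim) by each gallery item's total exp-similarity to the bank."""
    norm = np.exp(beta * bank_sims).sum(axis=0, keepdims=True)  # shape (1, n_gallery)
    return np.exp(beta * query_sims) / norm


def dynamic_inverted_softmax(query_sims, bank_sims, beta=20.0):
    """Normalise only queries whose raw top-1 item is a top-1 for some bank query."""
    activation = np.unique(bank_sims.argmax(axis=1))  # likely hubs
    normalised = inverted_softmax(query_sims, bank_sims, beta)
    use_norm = np.isin(query_sims.argmax(axis=1), activation)
    return np.where(use_norm[:, None], normalised, query_sims)


# Hypothetical toy data: rows are queries, columns are 3 gallery items.
# Gallery item 0 behaves like a hub: it is the top-1 for every bank query.
bank_sims = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.2],
    [0.9, 0.1, 0.4],
    [0.7, 0.2, 0.3],
])
query_sims = np.array([
    [0.60, 0.55, 0.10],  # raw top-1 is the hub (item 0)
    [0.20, 0.10, 0.70],  # raw top-1 is item 2, which is not a hub
])

hub_counts = k_occurrence(bank_sims, k=1)
raw_top1 = query_sims.argmax(axis=1)
renormed = dynamic_inverted_softmax(query_sims, bank_sims)
new_top1 = renormed.argmax(axis=1)
```

In this toy setup the hub absorbs every querybank query, the first test query is moved off the hub after normalisation, and the second query is left untouched because its top-1 item is not in the activation set. Note that no model retraining and no access to other test queries is needed: the querybank can be built from any available set of queries, such as training captions.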
Related papers
- Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks [5.164924773752648]
Hubness is a phenomenon where a small number of gallery data points are frequently retrieved, resulting in a decline in retrieval performance.
We show the necessity of incorporating both the gallery and query data for addressing hubness as hubs always exhibit high similarity with gallery and query data.
We present extensive experimental results on diverse language-grounded benchmarks, including text-image, text-video, and text-audio.
(2023-10-17T22:10:17Z)
- Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks.
We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
(2023-04-25T04:33:27Z)
- Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
(2022-06-14T06:28:04Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
(2022-04-22T10:45:01Z)
- Learning Canonical Embedding for Non-rigid Shape Matching [36.85782408336389]
This paper provides a novel framework that learns canonical embeddings for non-rigid shape matching.
Our framework is trained end-to-end and thus avoids instabilities and constraints associated with the commonly-used Laplace-Beltrami basis.
(2021-10-06T18:09:13Z)
- Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
(2021-04-14T09:22:42Z)
- CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification [102.89434996930387]
VI-ReID aims to match cross-modality pedestrian images, overcoming the limitation of single-modality person ReID in dark environments.
Existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations.
We propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS).
(2021-01-21T07:07:00Z)
- Multi-task Retrieval for Knowledge-Intensive Tasks [21.725935960568027]
We propose a multi-task trained model for neural retrieval.
Our approach not only outperforms previous methods in the few-shot setting, but also rivals specialised neural retrievers.
With the help of our retriever, we improve existing models for downstream tasks and closely match or improve the state of the art on multiple benchmarks.
(2021-01-01T00:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.