Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
- URL: http://arxiv.org/abs/2402.17535v1
- Date: Tue, 27 Feb 2024 14:21:56 GMT
- Title: Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
- Authors: Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
- Abstract summary: Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
- Score: 66.78146440275093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learned sparse retrieval (LSR) is a family of neural methods that encode
queries and documents into sparse lexical vectors that can be indexed and
retrieved efficiently with an inverted index. We explore the application of LSR
to the multi-modal domain, with a focus on text-image retrieval. While LSR has
seen success in text retrieval, its application in multimodal retrieval remains
underexplored. Current approaches like LexLIP and STAIR require complex
multi-step training on massive datasets. Our proposed approach efficiently
transforms dense vectors from a frozen dense model into sparse lexical vectors.
We address issues of high dimension co-activation and semantic deviation
through a new training algorithm, using Bernoulli random variables to control
query expansion. Experiments with two dense models (BLIP, ALBEF) and two
datasets (MSCOCO, Flickr30k) show that our proposed algorithm effectively
reduces co-activation and semantic deviation. Our best-performing sparsified
model outperforms state-of-the-art text-image LSR models with a shorter
training time and lower GPU memory requirements. Our approach offers an
effective solution for training LSR retrieval models in multimodal settings.
Our code and model checkpoints are available at
github.com/thongnt99/lsr-multimodal
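The recipe in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch version of the two named ingredients: a projection that maps a frozen dense embedding to a vocabulary-sized sparse vector (SPLADE-style log-saturated activation), and a Bernoulli gate over expansion dimensions to control co-activation and semantic deviation. Layer names, dimensions, and the exact gating rule are assumptions, not the authors' released implementation (see the repository above for that).

```python
import torch
import torch.nn as nn

class DenseToSparse(nn.Module):
    """Map a frozen dense embedding to a |V|-dimensional sparse lexical
    vector via a log-saturated projection (hypothetical names/sizes)."""
    def __init__(self, dense_dim=256, vocab_size=30522):
        super().__init__()
        self.proj = nn.Linear(dense_dim, vocab_size)

    def forward(self, dense_vec):
        # log(1 + ReLU(.)) keeps term weights non-negative and sparse-friendly
        return torch.log1p(torch.relu(self.proj(dense_vec)))

def gate_expansion(sparse_vec, caption_mask, p_keep=0.5):
    """Bernoulli expansion control (assumption: each expansion dimension,
    i.e. a term absent from the ground-truth caption, survives with
    probability p_keep during training, damping co-activation and
    semantic deviation)."""
    gate = torch.bernoulli(torch.full_like(sparse_vec, p_keep))
    return torch.where(caption_mask.bool(), sparse_vec, sparse_vec * gate)

# usage: project a (batch, 256) BLIP/ALBEF embedding, then gate it
sparsifier = DenseToSparse()
sparse = sparsifier(torch.randn(2, 256))
mask = torch.zeros_like(sparse)
mask[:, :10] = 1  # toy caption-term mask
gated = gate_expansion(sparse, mask)
```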
Related papers
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show that Mixture-of-Logits (MoL) can be realized efficiently in practice and achieves superior performance across diverse retrieval scenarios.
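A schematic reading of MoL, under the assumption that it combines K component dot products with adaptive, learned gating weights; shapes and names are illustrative, not the paper's exact parameterization:

```python
import torch

def mol_similarity(q_comp, d_comp, gate_logits):
    """Mixture-of-Logits: K component dot products combined by adaptive
    gating weights. q_comp, d_comp: (K, dim); gate_logits: (K,)."""
    logits = (q_comp * d_comp).sum(-1)        # K component similarities
    pi = torch.softmax(gate_logits, dim=-1)   # learned mixture weights
    return (pi * logits).sum()

score = mol_similarity(torch.randn(4, 64), torch.randn(4, 64), torch.randn(4))
```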
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
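As a toy illustration of the target object (not the paper's amortized method), the intractable posterior over latent chains z given (x, y) can be approximated by self-normalized importance sampling from the prior; `prior_sample` and `loglik` are hypothetical callables:

```python
import torch

def snis_posterior(prior_sample, loglik, x, y, n=256):
    """Sample latent chains z ~ p(z | x) and reweight by p(y | x, z):
    the posterior that amortized fine-tuning learns to sample directly."""
    zs = [prior_sample(x) for _ in range(n)]
    logw = torch.stack([loglik(y, x, z) for z in zs])
    idx = torch.multinomial(torch.softmax(logw, dim=0), n, replacement=True)
    return [zs[i] for i in idx]
```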
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Abstractive Summarization as Augmentation for Document-Level Event Detection [0.0]
We bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method.
We use four decoding methods for text generation, namely beam search, top-k sampling, top-p sampling, and contrastive search.
Our results show that using the document title offers 2.04% and 3.19% absolute improvement in macro F1-score for linear SVM and RoBERTa, respectively.
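The four decoding strategies named above map directly onto standard Hugging Face `generate` arguments; `gpt2` below is a stand-in for whichever summarization model the paper actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("The committee announced", return_tensors="pt").input_ids

beam = model.generate(ids, num_beams=4, max_new_tokens=40)                  # beam search
topk = model.generate(ids, do_sample=True, top_k=50, max_new_tokens=40)     # top-k sampling
topp = model.generate(ids, do_sample=True, top_p=0.9, top_k=0,
                      max_new_tokens=40)                                    # top-p (nucleus)
cs = model.generate(ids, penalty_alpha=0.6, top_k=4, max_new_tokens=40)     # contrastive search
print(tok.decode(beam[0], skip_special_tokens=True))
```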
arXiv Detail & Related papers (2023-05-29T11:28:26Z)
- Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
Beam search seeks the transcript with the greatest likelihood under the model's predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
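A minimal sketch of the confidence-relaxation idea, assuming a simple temperature flattening of frame-level logits before a greedy CTC collapse; the paper's actual procedure may differ:

```python
import torch

def relax_and_decode(logits, temperature=1.5, blank=0):
    """Flatten an overconfident frame-wise distribution before decoding.
    logits: (T, vocab) frame-level scores from a fine-tuned SSL ASR model."""
    probs = torch.softmax(logits / temperature, dim=-1)
    path = probs.argmax(-1).tolist()
    out, prev = [], blank
    for p in path:                       # CTC collapse: drop repeats, blanks
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out
```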
arXiv Detail & Related papers (2022-12-27T06:42:26Z)
- CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
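A schematic of dynamic lexical routing, assuming each contextual token vector is routed to its top lexical key(s) and late interaction is restricted to query/document tokens sharing a key; all names and shapes are illustrative, not the released CITADEL code:

```python
import torch

def route(token_vecs, key_proj, topk=1):
    """Score tokens against the lexical key space, keep top key(s).
    token_vecs: (n, dim); key_proj: (dim, |V|)."""
    return (token_vecs @ key_proj).topk(topk, dim=-1).indices

def routed_score(q_vecs, d_vecs, q_keys, d_keys):
    """Late interaction only between token pairs routed to a shared key."""
    total = 0.0
    for qi in range(len(q_vecs)):
        sims = [float(q_vecs[qi] @ d_vecs[di]) for di in range(len(d_vecs))
                if set(q_keys[qi].tolist()) & set(d_keys[di].tolist())]
        if sims:
            total += max(sims)   # max-sim over eligible document tokens
    return total

q, d = torch.randn(5, 32), torch.randn(9, 32)
W = torch.randn(32, 1000)        # toy key space
print(routed_score(q, d, route(q, W), route(d, W)))
```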
arXiv Detail & Related papers (2022-11-18T18:27:35Z)
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
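A sketch of the lexicon-enhanced combination, under the assumption that the final score multiplies a BM25 lexical score with the dense similarity so a document must match on both signals:

```python
import numpy as np

def lexicon_enhanced_scores(bm25_scores, dense_scores):
    """Scale dense similarity by the BM25 score (schematic combination)."""
    return np.asarray(bm25_scores) * np.asarray(dense_scores)

bm25 = np.array([12.3, 0.0, 5.1])     # lexical scores for 3 candidates
dense = np.array([0.82, 0.91, 0.40])  # dense cosine similarities
print(lexicon_enhanced_scores(bm25, dense).argsort()[::-1])  # best first
```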
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- Unsupervised Multimodal Language Representations using Convolutional Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
It is also shown that our method is extremely lightweight and can be easily generalized to other tasks and unseen data with a small performance drop and almost the same number of parameters.
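A toy version of the pipeline, assuming the word-aligned modality features are stacked into a (seq_len, feat_dim) matrix and compressed by a small convolutional autoencoder; channel counts and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Convolutional autoencoder over a (seq_len, feat_dim) 'image' built
    from word-aligned multimodal features; the bottleneck is the embedding."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):            # x: (batch, 1, seq_len, feat_dim)
        z = self.enc(x)
        return self.dec(z), z

x = torch.randn(2, 1, 20, 64)        # 20 aligned steps, 64 stacked features
recon, emb = ConvAE()(x)
```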
arXiv Detail & Related papers (2021-10-06T18:28:07Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea, sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
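For contrast with the memory-based proposal, here is a minimal sketch of the classic episode-style MAML meta-update the abstract refers to; `loss_fn` and the task iterator are hypothetical:

```python
import torch

def maml_step(meta_params, tasks, loss_fn, inner_lr=0.01):
    """One episode: adapt a copy of the meta-parameters on each task's
    support set, then accumulate the query-set gradient for the meta-update."""
    meta_grads = [torch.zeros_like(p) for p in meta_params]
    for support, query in tasks:
        # one inner gradient step (create_graph=True keeps 2nd-order terms)
        inner_loss = loss_fn(meta_params, support)
        grads = torch.autograd.grad(inner_loss, meta_params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(meta_params, grads)]
        outer_loss = loss_fn(adapted, query)
        for mg, g in zip(meta_grads,
                         torch.autograd.grad(outer_loss, meta_params)):
            mg += g
    return meta_grads
```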
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- Lightweight Image Super-Resolution with Hierarchical and Differentiable Neural Architecture Search [38.83764580480486]
Deep neural networks have achieved significant performance on Single Image Super-Resolution (SISR) tasks.
We propose a novel differentiable Neural Architecture Search (NAS) approach on both the cell-level and network-level to search for lightweight SISR models.
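A generic DARTS-style mixed operation illustrating what "differentiable NAS" means at the cell level; the paper's hierarchical, network-level search space is richer than this sketch:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One cell edge: a softmax-weighted sum of candidate ops, where the
    architecture weights alpha are learned jointly by gradient descent."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=-1)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```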
arXiv Detail & Related papers (2021-05-09T13:30:16Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
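A plain (uncoded, full-batch) consensus-ADMM sketch for distributed least squares, the algorithm family the paper's coded mini-batch variants build on:

```python
import numpy as np

def consensus_admm(A_list, b_list, rho=1.0, iters=50):
    """Each edge node i solves its local x-update
    argmin (1/2)||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2,
    then z averages and the dual u tracks disagreement."""
    n, N = A_list[0].shape[1], len(A_list)
    x = [np.zeros(n) for _ in range(N)]
    u = [np.zeros(n) for _ in range(N)]
    z = np.zeros(n)
    for _ in range(iters):
        for i in range(N):
            x[i] = np.linalg.solve(
                A_list[i].T @ A_list[i] + rho * np.eye(n),
                A_list[i].T @ b_list[i] + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(N)], axis=0)   # consensus
        u = [u[i] + x[i] - z for i in range(N)]                # dual ascent
    return z
```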
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.