ColBERT: Efficient and Effective Passage Search via Contextualized Late
Interaction over BERT
- URL: http://arxiv.org/abs/2004.12832v2
- Date: Thu, 4 Jun 2020 05:28:21 GMT
- Title: ColBERT: Efficient and Effective Passage Search via Contextualized Late
Interaction over BERT
- Authors: Omar Khattab and Matei Zaharia
- Abstract summary: ColBERT is a novel ranking model that adapts deep LMs for efficient retrieval.
We extensively evaluate ColBERT using two recent passage search datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in Natural Language Understanding (NLU) is driving fast-paced
advances in Information Retrieval (IR), largely owed to fine-tuning deep
language models (LMs) for document ranking. While remarkably effective, the
ranking models based on these LMs increase computational cost by orders of
magnitude over prior approaches, particularly as they must feed each
query-document pair through a massive neural network to compute a single
relevance score. To tackle this, we present ColBERT, a novel ranking model that
adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT
introduces a late interaction architecture that independently encodes the query
and the document using BERT and then employs a cheap yet powerful interaction
step that models their fine-grained similarity. By delaying and yet retaining
this fine-granular interaction, ColBERT can leverage the expressiveness of deep
LMs while simultaneously gaining the ability to pre-compute document
representations offline, considerably speeding up query processing. Beyond
reducing the cost of re-ranking the documents retrieved by a traditional model,
ColBERT's pruning-friendly interaction mechanism enables leveraging
vector-similarity indexes for end-to-end retrieval directly from a large
document collection. We extensively evaluate ColBERT using two recent passage
search datasets. Results show that ColBERT's effectiveness is competitive with
existing BERT-based models (and outperforms every non-BERT baseline), while
executing two orders-of-magnitude faster and requiring four orders-of-magnitude
fewer FLOPs per query.
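To make the late interaction step concrete, below is a minimal NumPy sketch of ColBERT-style MaxSim scoring: each query token embedding is matched against its most similar document token embedding, and those maxima are summed into a single relevance score. The random embeddings, dimensions, and document lengths are placeholder assumptions standing in for the per-token BERT outputs the paper describes.

```python
# Minimal sketch of ColBERT-style late interaction (MaxSim) scoring.
# The random vectors below are placeholders for the L2-normalized per-token
# BERT embeddings that ColBERT produces for queries and documents.
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum, over query tokens, of the max similarity against any document token.

    query_emb: (num_query_tokens, dim), rows L2-normalized
    doc_emb:   (num_doc_tokens, dim),   rows L2-normalized
    """
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens) cosine similarities
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
query = normalize(rng.normal(size=(32, 128)))  # e.g. 32 query tokens, dim 128
doc_index = [normalize(rng.normal(size=(n, 128))) for n in (180, 220, 90)]  # pre-computed offline

scores = [maxsim_score(query, d) for d in doc_index]
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print("ranking:", ranking, "scores:", [round(s, 2) for s in scores])
```

Because the document matrices depend only on the documents, they can be computed and indexed offline; at query time only the query encoding and these cheap matrix operations remain, which is the source of the speedups reported above.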
Related papers
- DocMamba: Efficient Document Pre-training with State Space Model [56.84200017560988]
We present DocMamba, a novel framework based on the state space model.
It is designed to reduce computational complexity to linear while preserving global modeling capabilities.
Experiments on the HRDoc dataset confirm DocMamba's potential for length extrapolation.
arXiv Detail & Related papers (2024-09-18T11:34:28Z)
- Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever [6.221757399678299]
ColBERT's late interaction scoring approximates the joint query-document attention seen in cross-encoders.
Our new model, Jina-ColBERT-v2, demonstrates strong performance across a range of English and multilingual retrieval tasks.
arXiv Detail & Related papers (2024-08-29T16:21:00Z)
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches to modeling query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs by using a DE structure followed by a lightweight scorer; a toy sketch contrasting CE and DE scoring appears after this list.
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- SPLATE: Sparse Late Interaction Retrieval [13.607085390630647]
SPLATE is a lightweight adaptation of the ColBERTv2 model that learns an "MLM adapter".
Our pipeline achieves the same effectiveness as the PLAID ColBERTv2 engine by re-ranking 50 documents that can be retrieved in under 10 ms.
arXiv Detail & Related papers (2024-04-22T07:51:13Z)
- Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction [10.749746283569847]
ColBERTer is a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction.
For its multi-vector component, ColBERTer reduces the number of vectors stored per document by learning unique whole-word representations for the terms in each document.
Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x while maintaining effectiveness.
arXiv Detail & Related papers (2022-03-24T14:28:07Z)
- Hierarchical Neural Network Approaches for Long Document Classification [3.6700088931938835]
We employ pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple: we divide the input data into chunks and then pass them through the BERT and USE base models.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
- TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [54.791572981834435]
Existing pre-trained language models (PLMs) are often computationally expensive at inference time.
We propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT.
TR-BERT formulates the token reduction process as a multi-step token selection problem and automatically learns the selection strategy via reinforcement learning.
arXiv Detail & Related papers (2021-05-25T02:28:51Z)
- A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection [112.0514737686492]
In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results.
Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the ~18 minutes required by a standard BERT-base fine-tuning.
arXiv Detail & Related papers (2020-03-04T22:12:18Z)
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [90.85913515409275]
Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT.
We propose DC-BERT, a contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings.
On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance.
arXiv Detail & Related papers (2020-02-28T08:18:37Z)
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It has twin-structured BERT-like encoders to represent the query and the document respectively.
It allows document embeddings to be pre-computed offline and cached in memory.
arXiv Detail & Related papers (2020-02-14T22:44:36Z)
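As a companion to the MaxSim sketch above, here is the toy contrast of the Cross-Encoder (CE) and Dual-Encoder (DE) structures referenced in the "Efficient Document Ranking with Learnable Late Interactions" entry. The linear "encoders" and relevance head are illustrative assumptions, not the models from that paper; only the shape of the computation is the point.

```python
# Toy contrast of Cross-Encoder (CE) vs Dual-Encoder (DE) scoring structure.
# The "encoders" are random linear maps standing in for fine-tuned transformers.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64
w_joint = rng.normal(size=2 * DIM) / np.sqrt(2 * DIM)  # toy CE relevance head
W_q = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)        # toy DE query encoder
W_d = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)        # toy DE document encoder

def ce_score(query_feats: np.ndarray, doc_feats: np.ndarray) -> float:
    # CE: query and document are scored *jointly*, so nothing about the
    # document can be pre-computed ahead of the query.
    return float(np.concatenate([query_feats, doc_feats]) @ w_joint)

def de_encode_doc(doc_feats: np.ndarray) -> np.ndarray:
    # DE: documents are encoded independently, so this can run offline
    # and the resulting embeddings can be indexed or cached.
    return doc_feats @ W_d

def de_score(query_feats: np.ndarray, doc_emb: np.ndarray) -> float:
    # DE: relevance is a cheap similarity between factorized embeddings.
    return float((query_feats @ W_q) @ doc_emb)

query = rng.normal(size=DIM)
docs = [rng.normal(size=DIM) for _ in range(3)]
doc_index = [de_encode_doc(d) for d in docs]  # built once, offline

print("CE scores:", [round(ce_score(query, d), 3) for d in docs])
print("DE scores:", [round(de_score(query, e), 3) for e in doc_index])
```

The practical difference is in the last lines: the DE document index is built once offline, while the CE score must run the joint model for every query-document pair, which is the cost that late-interaction models (including ColBERT) are designed to avoid.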