A Study on Token Pruning for ColBERT
- URL: http://arxiv.org/abs/2112.06540v1
- Date: Mon, 13 Dec 2021 10:24:54 GMT
- Title: A Study on Token Pruning for ColBERT
- Authors: Carlos Lassance, Maroua Maachou, Joohee Park, Stéphane Clinchant
- Abstract summary: The ColBERT model has recently been proposed as an effective BERT-based ranker.
The big downside of the model is the index size, which scales linearly with the number of tokens in the collection.
In this paper, we study various designs for ColBERT models in order to attack this problem.
- Score: 0.7646713951724011
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ColBERT model has recently been proposed as an effective
BERT-based ranker. Because it adopts a late interaction mechanism, a major advantage of ColBERT
is that document representations can be precomputed in advance. However, the
big downside of the model is the index size, which scales linearly with the
number of tokens in the collection. In this paper, we study various designs for
ColBERT models in order to attack this problem. While compression techniques
have been explored to reduce the index size, in this paper we study token
pruning techniques for ColBERT. We compare simple heuristics, as well as a
single attention layer, for selecting the tokens to keep at indexing time. Our
experiments show that ColBERT indexes can be pruned by up to 30% on the MS MARCO
passage collection without a significant drop in performance. Finally, we
experiment on MS MARCO documents, which reveals several challenges for such
mechanisms.
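To make the late-interaction and indexing-time pruning ideas concrete, here is a minimal NumPy sketch. The MaxSim scorer follows ColBERT's published formulation; the "keep the first k tokens" rule is only one of the simple heuristics the paper compares, and the shapes and normalization are assumptions for illustration.

```python
# Minimal sketch of ColBERT-style late interaction (MaxSim) with a simple
# token-pruning heuristic applied at indexing time.
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late interaction: for each query token take the max similarity over
    document tokens, then sum over query tokens. Both inputs are assumed to be
    L2-normalized, so the dot product acts as cosine similarity."""
    sim = query_emb @ doc_emb.T            # (query_tokens, doc_tokens)
    return float(sim.max(axis=1).sum())

def prune_first_k(doc_emb: np.ndarray, k: int) -> np.ndarray:
    """Indexing-time pruning heuristic: keep only the first k token embeddings."""
    return doc_emb[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def normed(shape):
        x = rng.normal(size=shape)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    query = normed((8, 128))               # 8 query tokens, 128-dim embeddings
    doc = normed((180, 128))               # 180 document tokens before pruning
    pruned = prune_first_k(doc, k=int(0.7 * len(doc)))   # drop ~30% of tokens

    print("full index score  :", maxsim_score(query, doc))
    print("pruned index score:", maxsim_score(query, pruned))
```

Because pruning happens once at indexing time, query-time scoring is unchanged; only the stored document matrices, and hence the index, shrink.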
Related papers
- ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length [3.192109204993465]
We show that [MASK] tokens act to weight the non-[MASK] query terms, emphasizing certain tokens over others.
We then examine the effect of varying the number of [MASK] tokens from zero up to four times the query input length used in training.
arXiv Detail & Related papers (2024-08-24T21:22:15Z)
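As background for the entry above, the snippet below sketches ColBERT's [MASK]-based query augmentation: queries are padded with [MASK] token ids up to a fixed length before encoding, so the model can re-weight and implicitly expand the query. The default length of 32 and the token-id interface are assumptions for illustration; varying this length is what the paper above studies.

```python
# Sketch of ColBERT-style query augmentation with [MASK] padding.
from typing import List

def augment_query(query_token_ids: List[int], mask_id: int, query_maxlen: int = 32) -> List[int]:
    """Truncate the query to query_maxlen tokens and pad the rest with [MASK] ids."""
    truncated = query_token_ids[:query_maxlen]
    return truncated + [mask_id] * (query_maxlen - len(truncated))

if __name__ == "__main__":
    # Hypothetical token ids for a short query; 103 is BERT's usual [MASK] id.
    print(augment_query([101, 2054, 2003, 1996, 102], mask_id=103))
```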
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
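The entry above contrasts three scoring interfaces; the toy sketch below illustrates the difference in where query and document interact. The encoders are random stand-ins, so all shapes and the joint scorer are assumptions, not any model's actual architecture.

```python
# Toy contrast of dual-encoder, late-interaction, and cross-encoder scoring.
import numpy as np

rng = np.random.default_rng(0)
dim = 128
q_tokens = rng.normal(size=(8, dim))     # per-token query embeddings (stand-in)
d_tokens = rng.normal(size=(120, dim))   # per-token document embeddings (stand-in)
w_joint = rng.normal(size=(dim,))        # toy classifier head for the CE stand-in

def dual_encoder_score(q_toks, d_toks):
    # DE: pool each side into a single vector, score with one dot product.
    return float(q_toks.mean(axis=0) @ d_toks.mean(axis=0))

def late_interaction_score(q_toks, d_toks):
    # Late interaction: keep the factorized per-token embeddings and apply a
    # lightweight scorer on top (here MaxSim, as in ColBERT).
    return float((q_toks @ d_toks.T).max(axis=1).sum())

def cross_encoder_score(q_toks, d_toks):
    # CE: query and document are processed jointly; this toy version just
    # pools the concatenated tokens and applies a linear head.
    joint = np.concatenate([q_toks, d_toks], axis=0).mean(axis=0)
    return float(joint @ w_joint)

print(dual_encoder_score(q_tokens, d_tokens),
      late_interaction_score(q_tokens, d_tokens),
      cross_encoder_score(q_tokens, d_tokens))
```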
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
However, BERT models are limited to a maximum input length of 512 tokens, which makes them non-trivial to apply to long inputs in practical settings.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows finetuning of any pretrained BERT model to perform inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
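The chunking step from the entry above can be sketched as follows: split a long token sequence into overlapping windows that each fit BERT's 512-token limit, encode every window with a frozen encoder, and pool the per-chunk representations. The stride value and the mean pooling are assumptions; ChunkBERT itself applies a small convolution over the chunk representations.

```python
# Minimal sketch of chunked inference for texts longer than 512 tokens.
# encode_chunk is any callable mapping a chunk of token ids to a fixed-size
# vector (e.g., a frozen BERT [CLS] embedding); it is left abstract here.
from typing import Callable, List
import numpy as np

def chunk_tokens(token_ids: List[int], max_len: int = 512, stride: int = 384) -> List[List[int]]:
    """Split token_ids into overlapping windows of at most max_len tokens."""
    chunks, start = [], 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            return chunks
        start += stride

def pool_long_text(token_ids: List[int],
                   encode_chunk: Callable[[List[int]], np.ndarray]) -> np.ndarray:
    """Encode each chunk and mean-pool (a simplification of ChunkBERT's CNN head)."""
    reps = np.stack([encode_chunk(c) for c in chunk_tokens(token_ids)])
    return reps.mean(axis=0)

if __name__ == "__main__":
    fake_encoder = lambda chunk: np.full(4, float(len(chunk)))  # stand-in encoder
    print(pool_long_text(list(range(1300)), fake_encoder))
```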
- Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction [10.749746283569847]
ColBERTer is a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction.
For its multi-vector component, ColBERTer reduces the number of vectors stored per document by learning unique whole-word representations for the terms in each document.
Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x while maintaining effectiveness.
arXiv Detail & Related papers (2022-03-24T14:28:07Z)
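A minimal sketch of the whole-word reduction idea from the ColBERTer entry above: collapse the subword embeddings of each document into one vector per whole word, which shrinks the number of stored vectors. The mean aggregation and the word_ids interface (as produced by a word-piece tokenizer) are assumptions; ColBERTer learns its aggregation end to end.

```python
# Sketch: reduce per-subword embeddings to one embedding per whole word.
from typing import List
import numpy as np

def reduce_to_whole_words(subword_emb: np.ndarray, word_ids: List[int]) -> np.ndarray:
    """subword_emb: (num_subwords, dim); word_ids[i] is the index of the whole
    word that subword i belongs to. Returns (num_words, dim) mean-pooled vectors."""
    num_words = max(word_ids) + 1
    out = np.zeros((num_words, subword_emb.shape[1]))
    counts = np.zeros((num_words, 1))
    for emb, w in zip(subword_emb, word_ids):
        out[w] += emb
        counts[w] += 1
    return out / counts

if __name__ == "__main__":
    emb = np.arange(12, dtype=float).reshape(6, 2)     # 6 subwords, 2-dim
    word_ids = [0, 0, 1, 2, 2, 2]                      # e.g. "play ##ing", "the", "gui ##ta ##r"
    print(reduce_to_whole_words(emb, word_ids).shape)  # -> (3, 2)
```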
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require large numbers of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
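The snippet-selection idea from the SnipBERT entry above can be sketched as follows. The keyword-count scorer, window size, and top-k selection are purely illustrative placeholders for SnipBERT's learned snippet identification.

```python
# Sketch: pick a few small, relevant snippets from a long note; only those
# would then be fed to a (truncated) BERT encoder. The scoring rule is a stand-in.
from typing import List, Set

def select_snippets(note: str, keywords: Set[str], window: int = 64, top_k: int = 4) -> List[str]:
    words = note.split()
    snippets = [words[i:i + window] for i in range(0, len(words), window)]
    scored = [(sum(w.lower().strip(".,") in keywords for w in s), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [" ".join(s) for _, s in scored[:top_k]]

if __name__ == "__main__":
    note = "Patient denies chest pain. " * 40 + "History of atrial fibrillation on warfarin. " * 3
    print(select_snippets(note, {"atrial", "fibrillation", "warfarin"}, window=20, top_k=2))
```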
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers [52.85472936277762]
We apply knowledge distillation to improve the recently proposed late-interaction ColBERT model.
We distill the knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product.
We empirically show that our approach improves query latency and greatly reduces the onerous storage requirements of ColBERT.
arXiv Detail & Related papers (2020-10-22T02:26:01Z)
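A minimal sketch of the distillation setup from the entry above: a teacher scores candidates with ColBERT's MaxSim operator, a student scores them with a single dot product of pooled embeddings, and the student is trained to match the teacher. The KL-divergence loss and the mean pooling are assumptions for illustration, not the exact recipe.

```python
# Sketch: distill MaxSim (teacher) scores into dot-product (student) scores.
import numpy as np

def maxsim(q_toks: np.ndarray, d_toks: np.ndarray) -> float:
    return float((q_toks @ d_toks.T).max(axis=1).sum())

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(q_toks, cand_token_embs, q_pooled, cand_pooled):
    """cand_token_embs: list of (tokens, dim) arrays for candidate documents;
    cand_pooled: (n_candidates, dim) single-vector student representations."""
    teacher = softmax(np.array([maxsim(q_toks, d) for d in cand_token_embs]))
    student = softmax(cand_pooled @ q_pooled)
    # KL(teacher || student): the student's dot products learn to mimic MaxSim.
    return float((teacher * (np.log(teacher) - np.log(student))).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 64))
    docs = [rng.normal(size=(rng.integers(50, 120), 64)) for _ in range(4)]
    loss = distillation_loss(q, docs, q.mean(axis=0), np.stack([d.mean(axis=0) for d in docs]))
    print(round(loss, 4))
```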
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
We (i) introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) distill LESA-BERT to smaller variants, aiming to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
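The early-exit mechanism from the DeeBERT entry above can be sketched as follows: each transformer layer gets its own small classifier ("off-ramp"), and inference stops at the first layer whose prediction entropy falls below a threshold. The threshold value and the stand-in classifiers are assumptions for illustration.

```python
# Sketch of entropy-based early exiting over per-layer classifiers.
from typing import Callable, List, Tuple
import numpy as np

def entropy(probs: np.ndarray) -> float:
    return float(-(probs * np.log(probs + 1e-12)).sum())

def early_exit_predict(hidden_states: List[np.ndarray],
                       off_ramps: List[Callable[[np.ndarray], np.ndarray]],
                       threshold: float = 0.3) -> Tuple[int, int]:
    """Return (predicted_class, exit_layer). hidden_states[i] is the hidden
    vector after layer i; off_ramps[i] maps it to class probabilities."""
    for layer, (h, ramp) in enumerate(zip(hidden_states, off_ramps)):
        probs = ramp(h)
        if entropy(probs) < threshold:        # confident enough: exit early
            return int(probs.argmax()), layer
    return int(probs.argmax()), layer         # otherwise use the final layer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = [rng.normal(size=16) for _ in range(12)]
    softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    ramps = [(lambda h, W=rng.normal(size=(16, 3)): softmax(h @ W)) for _ in range(12)]
    print(early_exit_predict(states, ramps))
```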
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT [24.288824715337483]
ColBERT is a novel ranking model that adapts deep LMs for efficient retrieval.
We extensively evaluate ColBERT using two recent passage search datasets.
arXiv Detail & Related papers (2020-04-27T14:21:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.