A Study on Token Pruning for ColBERT
- URL: http://arxiv.org/abs/2112.06540v1
- Date: Mon, 13 Dec 2021 10:24:54 GMT
- Title: A Study on Token Pruning for ColBERT
- Authors: Carlos Lassance, Maroua Maachou, Joohee Park, Stéphane Clinchant
- Abstract summary: The ColBERT model has recently been proposed as an effective BERT-based ranker.
The big downside of the model is the index size, which scales linearly with the number of tokens in the collection.
In this paper, we study various designs for ColBERT models in order to attack this problem.
- Score: 0.7646713951724011
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ColBERT model has recently been proposed as an effective
BERT-based ranker. Because it adopts a late interaction mechanism, a major advantage of ColBERT
is that document representations can be precomputed in advance. However, the
big downside of the model is the index size, which scales linearly with the
number of tokens in the collection. In this paper, we study various designs for
ColBERT models in order to attack this problem. While compression techniques
have been explored to reduce the index size, in this paper we study token
pruning techniques for ColBERT. We compare simple heuristics, as well as a
single attention layer, for selecting the tokens to keep at indexing time. Our
experiments show that ColBERT indexes can be pruned by up to 30% on the MS MARCO
passage collection without a significant drop in performance. Finally, we
experiment on MS MARCO documents, which reveals several challenges for such
mechanisms.
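To make the late-interaction and indexing-time pruning ideas concrete, here is a minimal NumPy sketch. The MaxSim scorer follows ColBERT's published formulation; the "keep the first k tokens" rule is only one of the simple heuristics the paper compares, and the shapes and normalization are assumptions for illustration.

```python
# Minimal sketch of ColBERT-style late interaction (MaxSim) with a simple
# token-pruning heuristic applied at indexing time.
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late interaction: for each query token take the max similarity over
    document tokens, then sum over query tokens. Both inputs are assumed to be
    L2-normalized, so the dot product acts as cosine similarity."""
    sim = query_emb @ doc_emb.T            # (query_tokens, doc_tokens)
    return float(sim.max(axis=1).sum())

def prune_first_k(doc_emb: np.ndarray, k: int) -> np.ndarray:
    """Indexing-time pruning heuristic: keep only the first k token embeddings."""
    return doc_emb[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def normed(shape):
        x = rng.normal(size=shape)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    query = normed((8, 128))               # 8 query tokens, 128-dim embeddings
    doc = normed((180, 128))               # 180 document tokens before pruning
    pruned = prune_first_k(doc, k=int(0.7 * len(doc)))   # drop ~30% of tokens

    print("full index score  :", maxsim_score(query, doc))
    print("pruned index score:", maxsim_score(query, pruned))
```

Because pruning happens once at indexing time, query-time scoring is unchanged; only the stored document matrices, and hence the index, shrink.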
Related papers
- ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length [3.192109204993465]
We show that [MASK] tokens act to weight the non-[MASK] query terms, emphasizing certain tokens over others.
We then examine the effect of varying the number of [MASK] tokens from zero up to four times the query input length used in training.
arXiv Detail & Related papers (2024-08-24T21:22:15Z)
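As background for the entry above, the snippet below sketches ColBERT's [MASK]-based query augmentation: queries are padded with [MASK] token ids up to a fixed length before encoding, so the model can re-weight and implicitly expand the query. The default length of 32 and the token-id interface are assumptions for illustration; varying this length is what the paper above studies.

```python
# Sketch of ColBERT-style query augmentation with [MASK] padding.
from typing import List

def augment_query(query_token_ids: List[int], mask_id: int, query_maxlen: int = 32) -> List[int]:
    """Truncate the query to query_maxlen tokens and pad the rest with [MASK] ids."""
    truncated = query_token_ids[:query_maxlen]
    return truncated + [mask_id] * (query_maxlen - len(truncated))

if __name__ == "__main__":
    # Hypothetical token ids for a short query; 103 is BERT's usual [MASK] id.
    print(augment_query([101, 2054, 2003, 1996, 102], mask_id=103))
```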
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
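The entry above contrasts three scoring interfaces; the toy sketch below illustrates the difference in where query and document interact. The encoders are random stand-ins, so all shapes and the joint scorer are assumptions, not any model's actual architecture.

```python
# Toy contrast of dual-encoder, late-interaction, and cross-encoder scoring.
import numpy as np

rng = np.random.default_rng(0)
dim = 128
q_tokens = rng.normal(size=(8, dim))     # per-token query embeddings (stand-in)
d_tokens = rng.normal(size=(120, dim))   # per-token document embeddings (stand-in)
w_joint = rng.normal(size=(dim,))        # toy classifier head for the CE stand-in

def dual_encoder_score(q_toks, d_toks):
    # DE: pool each side into a single vector, score with one dot product.
    return float(q_toks.mean(axis=0) @ d_toks.mean(axis=0))

def late_interaction_score(q_toks, d_toks):
    # Late interaction: keep the factorized per-token embeddings and apply a
    # lightweight scorer on top (here MaxSim, as in ColBERT).
    return float((q_toks @ d_toks.T).max(axis=1).sum())

def cross_encoder_score(q_toks, d_toks):
    # CE: query and document are processed jointly; this toy version just
    # pools the concatenated tokens and applies a linear head.
    joint = np.concatenate([q_toks, d_toks], axis=0).mean(axis=0)
    return float(joint @ w_joint)

print(dual_encoder_score(q_tokens, d_tokens),
      late_interaction_score(q_tokens, d_tokens),
      cross_encoder_score(q_tokens, d_tokens))
```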
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
However, BERT models are limited to a maximum input length of 512 tokens, which makes them non-trivial to apply to long inputs in practical settings.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows finetuning of any pretrained BERT model to perform inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
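The chunking step from the entry above can be sketched as follows: split a long token sequence into overlapping windows that each fit BERT's 512-token limit, encode every window with a frozen encoder, and pool the per-chunk representations. The stride value and the mean pooling are assumptions; ChunkBERT itself applies a small convolution over the chunk representations.

```python
# Minimal sketch of chunked inference for texts longer than 512 tokens.
# encode_chunk is any callable mapping a chunk of token ids to a fixed-size
# vector (e.g., a frozen BERT [CLS] embedding); it is left abstract here.
from typing import Callable, List
import numpy as np

def chunk_tokens(token_ids: List[int], max_len: int = 512, stride: int = 384) -> List[List[int]]:
    """Split token_ids into overlapping windows of at most max_len tokens."""
    chunks, start = [], 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            return chunks
        start += stride

def pool_long_text(token_ids: List[int],
                   encode_chunk: Callable[[List[int]], np.ndarray]) -> np.ndarray:
    """Encode each chunk and mean-pool (a simplification of ChunkBERT's CNN head)."""
    reps = np.stack([encode_chunk(c) for c in chunk_tokens(token_ids)])
    return reps.mean(axis=0)

if __name__ == "__main__":
    fake_encoder = lambda chunk: np.full(4, float(len(chunk)))  # stand-in encoder
    print(pool_long_text(list(range(1300)), fake_encoder))
```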
- Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction [10.749746283569847]
ColBERTer is a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction.
For its multi-vector component, ColBERTer reduces the number of vectors stored per document by learning unique whole-word representations for the terms in each document.
Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x while maintaining effectiveness.
arXiv Detail & Related papers (2022-03-24T14:28:07Z)
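A minimal sketch of the whole-word reduction idea from the ColBERTer entry above: collapse the subword embeddings of each document into one vector per whole word, which shrinks the number of stored vectors. The mean aggregation and the word_ids interface (as produced by a word-piece tokenizer) are assumptions; ColBERTer learns its aggregation end to end.

```python
# Sketch: reduce per-subword embeddings to one embedding per whole word.
from typing import List
import numpy as np

def reduce_to_whole_words(subword_emb: np.ndarray, word_ids: List[int]) -> np.ndarray:
    """subword_emb: (num_subwords, dim); word_ids[i] is the index of the whole
    word that subword i belongs to. Returns (num_words, dim) mean-pooled vectors."""
    num_words = max(word_ids) + 1
    out = np.zeros((num_words, subword_emb.shape[1]))
    counts = np.zeros((num_words, 1))
    for emb, w in zip(subword_emb, word_ids):
        out[w] += emb
        counts[w] += 1
    return out / counts

if __name__ == "__main__":
    emb = np.arange(12, dtype=float).reshape(6, 2)     # 6 subwords, 2-dim
    word_ids = [0, 0, 1, 2, 2, 2]                      # e.g. "play ##ing", "the", "gui ##ta ##r"
    print(reduce_to_whole_words(emb, word_ids).shape)  # -> (3, 2)
```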
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require large numbers of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
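The snippet-selection idea from the SnipBERT entry above can be sketched as follows. The keyword-count scorer, window size, and top-k selection are purely illustrative placeholders for SnipBERT's learned snippet identification.

```python
# Sketch: pick a few small, relevant snippets from a long note; only those
# would then be fed to a (truncated) BERT encoder. The scoring rule is a stand-in.
from typing import List, Set

def select_snippets(note: str, keywords: Set[str], window: int = 64, top_k: int = 4) -> List[str]:
    words = note.split()
    snippets = [words[i:i + window] for i in range(0, len(words), window)]
    scored = [(sum(w.lower().strip(".,") in keywords for w in s), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [" ".join(s) for _, s in scored[:top_k]]

if __name__ == "__main__":
    note = "Patient denies chest pain. " * 40 + "History of atrial fibrillation on warfarin. " * 3
    print(select_snippets(note, {"atrial", "fibrillation", "warfarin"}, window=20, top_k=2))
```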
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers [52.85472936277762]
We apply knowledge distillation to improve the recently proposed late-interaction ColBERT model.
We distill the knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product.
We empirically show that our approach improves query latency and greatly reduces the onerous storage requirements of ColBERT.
arXiv Detail & Related papers (2020-10-22T02:26:01Z)
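A minimal sketch of the distillation setup from the entry above: a teacher scores candidates with ColBERT's MaxSim operator, a student scores them with a single dot product of pooled embeddings, and the student is trained to match the teacher. The KL-divergence loss and the mean pooling are assumptions for illustration, not the exact recipe.

```python
# Sketch: distill MaxSim (teacher) scores into dot-product (student) scores.
import numpy as np

def maxsim(q_toks: np.ndarray, d_toks: np.ndarray) -> float:
    return float((q_toks @ d_toks.T).max(axis=1).sum())

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(q_toks, cand_token_embs, q_pooled, cand_pooled):
    """cand_token_embs: list of (tokens, dim) arrays for candidate documents;
    cand_pooled: (n_candidates, dim) single-vector student representations."""
    teacher = softmax(np.array([maxsim(q_toks, d) for d in cand_token_embs]))
    student = softmax(cand_pooled @ q_pooled)
    # KL(teacher || student): the student's dot products learn to mimic MaxSim.
    return float((teacher * (np.log(teacher) - np.log(student))).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 64))
    docs = [rng.normal(size=(rng.integers(50, 120), 64)) for _ in range(4)]
    loss = distillation_loss(q, docs, q.mean(axis=0), np.stack([d.mean(axis=0) for d in docs]))
    print(round(loss, 4))
```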
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
We (i) introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) distill LESA-BERT to smaller variants, aiming to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
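The early-exit mechanism from the DeeBERT entry above can be sketched as follows: each transformer layer gets its own small classifier ("off-ramp"), and inference stops at the first layer whose prediction entropy falls below a threshold. The threshold value and the stand-in classifiers are assumptions for illustration.

```python
# Sketch of entropy-based early exiting over per-layer classifiers.
from typing import Callable, List, Tuple
import numpy as np

def entropy(probs: np.ndarray) -> float:
    return float(-(probs * np.log(probs + 1e-12)).sum())

def early_exit_predict(hidden_states: List[np.ndarray],
                       off_ramps: List[Callable[[np.ndarray], np.ndarray]],
                       threshold: float = 0.3) -> Tuple[int, int]:
    """Return (predicted_class, exit_layer). hidden_states[i] is the hidden
    vector after layer i; off_ramps[i] maps it to class probabilities."""
    for layer, (h, ramp) in enumerate(zip(hidden_states, off_ramps)):
        probs = ramp(h)
        if entropy(probs) < threshold:        # confident enough: exit early
            return int(probs.argmax()), layer
    return int(probs.argmax()), layer         # otherwise use the final layer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = [rng.normal(size=16) for _ in range(12)]
    softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    ramps = [(lambda h, W=rng.normal(size=(16, 3)): softmax(h @ W)) for _ in range(12)]
    print(early_exit_predict(states, ramps))
```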
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT [24.288824715337483]
ColBERT is a novel ranking model that adapts deep LMs for efficient retrieval.
We extensively evaluate ColBERT using two recent passage search datasets.
arXiv Detail & Related papers (2020-04-27T14:21:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.