ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length
- URL: http://arxiv.org/abs/2408.13672v1
- Date: Sat, 24 Aug 2024 21:22:15 GMT
- Title: ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length
- Authors: Ben Giacalone, Richard Zanibbi
- Abstract summary: We show that [MASK] tokens weight non-[MASK] query terms, emphasizing certain tokens over others.
We then examine the effect of varying the number of [MASK] tokens from zero up to four times the query input length used in training.
- Score: 3.192109204993465
- License:
- Abstract: A unique aspect of ColBERT is its use of [MASK] tokens in queries to score documents (query augmentation). Prior work shows that [MASK] tokens weight non-[MASK] query terms, emphasizing certain tokens over others, rather than introducing whole new terms as initially proposed. We begin by demonstrating that a term weighting behavior previously reported for [MASK] tokens in ColBERTv1 holds for ColBERTv2. We then examine the effect of varying the number of [MASK] tokens from zero up to four times the query input length used in training, both for first-stage retrieval and for scoring candidates, observing an initial decrease in performance with few [MASK]s, a large increase when enough [MASK]s are added to pad queries to an average length of 32, and then a plateau in performance afterwards. Additionally, we compare baseline performance to performance when the query length is extended to 128 tokens, and find that differences are small (e.g., within 1% on various metrics) and generally statistically insignificant, indicating performance does not collapse if ColBERT is presented with more [MASK] tokens than expected.
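As a rough illustration of the query augmentation described in the abstract, the sketch below pads a tokenized query with [MASK] tokens up to ColBERT's query input length of 32 and scores a document with a MaxSim-style late-interaction sum. This is a minimal sketch, not the authors' implementation: the tokenizer name ("bert-base-uncased"), the embedding dimension, and the helper names are illustrative assumptions, and the real ColBERT encoder additionally prepends a special query marker token.

```python
# Minimal sketch of ColBERT-style query augmentation and MaxSim scoring.
# Assumptions: "bert-base-uncased" tokenizer, 128-dim embeddings, no [Q] marker.
import torch
from transformers import BertTokenizer

QUERY_MAXLEN = 32  # ColBERT's default query input length

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def augment_query(query: str, maxlen: int = QUERY_MAXLEN) -> list[int]:
    """Pad the tokenized query with [MASK] token ids up to `maxlen`."""
    ids = tokenizer(query, add_special_tokens=True)["input_ids"]
    n_masks = max(0, maxlen - len(ids))
    return ids + [tokenizer.mask_token_id] * n_masks

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late interaction: for each query token embedding, take its maximum
    similarity over document token embeddings, then sum over query tokens."""
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    sim = query_emb @ doc_emb.T  # dot product = cosine if rows are normalized
    return sim.max(dim=1).values.sum()

# A short query is padded from a handful of real tokens to 32 with [MASK]s.
ids = augment_query("what is late interaction retrieval")
print(len(ids), ids.count(tokenizer.mask_token_id))

# Toy MaxSim demo with random, length-normalized embeddings.
q = torch.nn.functional.normalize(torch.randn(QUERY_MAXLEN, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(180, 128), dim=-1)
print(maxsim_score(q, d))
```

Because every query token (including each [MASK]) contributes its own maximum-similarity term to the sum, adding or removing [MASK] tokens effectively re-weights the contribution of the non-[MASK] terms, which is the behavior the paper studies as the number of [MASK]s is varied.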
Related papers
- Tokenization Is More Than Compression [14.939912120571728]
Existing tokenization approaches like Byte-Pair Encoding (BPE) originate from the field of data compression.
We introduce PathPiece, a new tokenizer that segments a document's text into the minimum number of tokens for a given vocabulary.
arXiv Detail & Related papers (2024-02-28T14:52:15Z) - Tree Cross Attention [59.8891512435847]
Tree Cross Attention (TCA) is a module based on Cross Attention that only retrieves information from a logarithmic $\mathcal{O}(\log(N))$ number of tokens for performing inference.
We show that TCA performs comparably to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient.
arXiv Detail & Related papers (2023-09-29T16:50:23Z) - Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing
Important Tokens [65.4435926060951]
We propose to significantly improve the efficiency of Transformers for ultra long sequences, by compressing the sequence into a much smaller representation at each layer.
Our algorithm is not only efficient (achieving a more than $3\times$ efficiency gain compared to baselines on 4K and 16K lengths) but also offers competitive or better performance on a large number of tasks.
arXiv Detail & Related papers (2023-05-07T10:32:18Z) - Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z) - Dynamic Focus-aware Positional Queries for Semantic Segmentation [94.6834904076914]
We propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries.
Our framework achieves SOTA performance and outperforms Mask2former by clear margins of 1.1%, 1.9%, and 1.1% single-scale mIoU with ResNet-50, Swin-T, and Swin-B backbones.
arXiv Detail & Related papers (2022-04-04T05:16:41Z) - Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and thus makes it suitable for handling longer sequence lengths.
arXiv Detail & Related papers (2022-03-27T19:52:01Z) - Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized
Late Interactions using Enhanced Reduction [10.749746283569847]
ColBERTer is a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction.
For its multi-vector component, ColBERTer reduces the number of stored vectors per document by learning unique whole-word representations for the terms in each document.
Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x while maintaining effectiveness.
arXiv Detail & Related papers (2022-03-24T14:28:07Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Protum: A New Method For Prompt Tuning Based on "[MASK]" [12.057434751507552]
In this paper, we propose Protum, a new Prompt Tuning method based on "[MASK]".
Our Protum achieves much better performance than fine-tuning after continuous pre-training, with less time consumption.
arXiv Detail & Related papers (2022-01-28T13:34:30Z) - A Study on Token Pruning for ColBERT [0.7646713951724011]
The ColBERT model has recently been proposed as an effective BERT based ranker.
A major downside of the model is its index size, which scales linearly with the number of tokens in the collection.
In this paper, we study various designs for ColBERT models in order to address this problem.
arXiv Detail & Related papers (2021-12-13T10:24:54Z) - Temporal Query Networks for Fine-grained Video Understanding [88.9877174286279]
We cast this into a query-response mechanism, where each query addresses a particular question, and has its own response label set.
We evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features.
arXiv Detail & Related papers (2021-04-19T17:58:48Z)