Related papers: Incorporating Token Importance in Multi-Vector Retrieval

URL: http://arxiv.org/abs/2511.16106v1
Date: Thu, 20 Nov 2025 06:58:31 GMT
Title: Incorporating Token Importance in Multi-Vector Retrieval
Authors: Archish S, Ankit Garg, Kirankumar Shiragur, Neeraj Kayal,
Abstract summary: ColBERT encodes queries and documents using BERT, and computes similarity via fine-grained interactions over token-level vector representations. We introduce enhancements to the Chamfer distance function by computing a weighted sum over query token contributions. Our method achieves an average improvement of 1.28% in Recall@10 in the zero-shot setting using IDF-based weights, and 3.66% through few-shot fine-tuning.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: ColBERT introduced a late interaction mechanism that independently encodes queries and documents using BERT, and computes similarity via fine-grained interactions over token-level vector representations. This design enables expressive matching while allowing efficient computation of scores, as the multi-vector document representations can be precomputed offline. ColBERT models distance using a Chamfer-style function: for each query token, it selects the closest document token and sums these distances across all query tokens. In our work, we explore enhancements to the Chamfer distance function by computing a weighted sum over query token contributions, where weights reflect token importance. Empirically, we show that this simple extension, which requires training only the token weights while keeping the multi-vector representations fixed, further enhances the expressiveness of the late-interaction multi-vector mechanism. In particular, on the BEIR benchmark, our method achieves an average improvement of 1.28% in Recall@10 in the zero-shot setting using IDF-based weights, and 3.66% through few-shot fine-tuning.
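The weighted scoring rule described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it scores with inner-product similarity and takes the maximum over document tokens (ColBERT's MaxSim form, equivalent to picking the "closest" token), writes the weighted score as s(q, d) = sum_i w_i * max_j <q_i, d_j>, and assumes a smoothed IDF variant, log((N + 1) / (df + 1)) + 1, for the zero-shot weights. The helper names `dot`, `chamfer_score`, and `idf_weights`, and the toy vectors, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner-product similarity between two token vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def chamfer_score(
    query: List[Vector],
    doc: List[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Chamfer-style late-interaction score (MaxSim over document tokens).

    Each query token contributes the similarity of its best-matching document
    token. With `weights`, contribution i is scaled by weights[i]; without
    weights every query token counts equally (the plain, unweighted sum).
    """
    if not doc:
        raise ValueError("document must contain at least one token vector")
    if weights is None:
        weights = [1.0] * len(query)
    if len(weights) != len(query):
        raise ValueError("expected one weight per query token")
    return sum(w * max(dot(q, d) for d in doc) for w, q in zip(weights, query))


def idf_weights(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF per token (an assumed variant): log((N + 1) / (df + 1)) + 1.

    Tokens that occur in every document get weight 1.0; rarer tokens get more.
    """
    n_docs = len(corpus)
    # Document frequency: count each token at most once per document.
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((n_docs + 1) / (cnt + 1)) + 1.0 for tok, cnt in df.items()}


if __name__ == "__main__":
    corpus = [["the", "retrieval", "model"], ["the", "cat"], ["the", "dog"]]
    idf = idf_weights(corpus)

    # Hypothetical 2-d embeddings for the query tokens "the" and "retrieval".
    query_tokens = ["the", "retrieval"]
    query_vecs = [[1.0, 0.0], [0.0, 1.0]]
    doc_vecs = [[0.9, 0.1], [0.2, 0.8]]

    plain = chamfer_score(query_vecs, doc_vecs)
    weighted = chamfer_score(query_vecs, doc_vecs, [idf[t] for t in query_tokens])
    print(f"unweighted={plain:.3f} idf-weighted={weighted:.3f}")
```

With all weights equal to 1 the function reduces to the ordinary Chamfer-style sum, while IDF weighting lets a rare token such as "retrieval" count more than a ubiquitous one such as "the". In the few-shot setting the abstract describes, the IDF table would instead be replaced by learned per-token weights, with the token vectors left unchanged.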

Related papers

Multi-Vector Index Compression in Any Modality
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv (2026-02-24T18:57:33Z)
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
We propose a novel visual token pruning strategy called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE).
arXiv (2025-10-28T09:29:37Z)
QuickMerge++: Fast Token Merging with Autoregressive Prior
We propose QuickMerge, a lightweight framework for efficient next-token prediction. By combining semantic salience estimation, flexible token budgets, and AR alignment, QuickMerge enables accurate generation with fewer tokens. We evaluate QuickMerge across multi-modality domains, demonstrating consistent improvements in compute-accuracy tradeoffs.
arXiv (2025-08-16T06:07:33Z)
Object Recognition as Next Token Prediction
We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv (2023-12-04T18:58:40Z)
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents. We present XTR, ConteXtualized Token Retriever, which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first.
arXiv (2023-04-04T17:37:06Z)
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers. These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv (2022-11-18T18:27:35Z)
Multi-Vector Retrieval as Sparse Alignment
We propose a novel multi-vector retrieval model that learns sparsified pairwise alignments between query and document tokens. We learn the sparse unary saliences with entropy-regularized linear programming, which outperforms other methods at achieving sparsity. Our model often produces interpretable alignments and significantly improves its performance when initialized from larger language models.
arXiv (2022-11-02T16:49:58Z)
Disentangled Representation Learning for Text-Video Retrieval
Cross-modality interaction is a critical component in Text-Video Retrieval (TVR). We study the interaction paradigm in depth, where we find that its computation can be split into two terms. We propose a disentangled framework to capture a sequential and hierarchical representation.
arXiv (2022-03-14T13:55:33Z)
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
We establish correspondences directly between frames without re-encoding the mask features for every object. With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion. We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
arXiv (2021-06-09T16:50:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.