MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking
- URL: http://arxiv.org/abs/2602.16299v1
- Date: Wed, 18 Feb 2026 09:30:29 GMT
- Title: MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking
- Authors: Mathias Vast, Victor Morand, Basile van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski
- Abstract summary: Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but have a high inference cost. We show that it is possible to derive a new late-interaction-like architecture by carefully removing detrimental or unnecessary interactions. MICE reduces inference latency fourfold compared to standard cross-encoders.
- Score: 12.107932271370563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but have a high inference cost. This not only prevents them from being used as first-stage rankers but also makes re-ranking documents costly. Prior work has addressed this bottleneck from two largely separate directions: accelerating cross-encoder inference by sparsifying the attention process, or improving first-stage retrieval effectiveness using more complex models, e.g. late-interaction ones. In this work, we propose to bridge these two approaches, based on an in-depth understanding of the internal mechanisms of cross-encoders. Starting from cross-encoders, we show that it is possible to derive a new late-interaction-like architecture by carefully removing detrimental or unnecessary interactions. We name this architecture MICE (Minimal Interaction Cross-Encoders). We extensively evaluate MICE across both in-domain (ID) and out-of-domain (OOD) datasets. MICE reduces inference latency fourfold compared to standard cross-encoders, matching late-interaction models like ColBERT while retaining most of the ID effectiveness of cross-encoders and demonstrating superior OOD generalization.
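The abstract does not spell out which interactions MICE retains, so the following is only a minimal sketch of the late-interaction scoring pattern it is benchmarked against (ColBERT-style MaxSim), where document token embeddings are precomputed offline and query-time scoring needs no joint forward pass; all names and dimensions here are illustrative assumptions.

```python
import numpy as np

def maxsim_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    """ColBERT-style MaxSim: each query token interacts with document tokens
    only through a max over per-token cosine similarities."""
    q = q_vecs / np.linalg.norm(q_vecs, axis=1, keepdims=True)
    d = d_vecs / np.linalg.norm(d_vecs, axis=1, keepdims=True)
    sim = q @ d.T                        # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # best doc token per query token

# Re-rank toy "documents" whose token embeddings were encoded offline;
# no joint query-document forward pass is needed at query time.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))                    # 8 query tokens, dim 128
docs = {f"doc{i}": rng.normal(size=(200, 128)) for i in range(3)}
print(sorted(docs, key=lambda k: maxsim_score(query, docs[k]), reverse=True))
```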
Related papers
- DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection [39.56089737473775]
We propose DS-Det, a more efficient transformer detector capable of detecting a flexible number of objects in images. Specifically, we reformulate and introduce a new unified Single-Query paradigm for decoder modeling. We also propose a simplified decoder framework through attention disentangled learning.
arXiv Detail & Related papers (2025-07-26T05:40:04Z)
- Reverse-Engineering the Retrieval Process in GenIR Models [41.661577386460436]
Generative Information Retrieval (GenIR) is a novel paradigm in which a transformer encoder-decoder model predicts document rankings based on a query. This work studies the internal retrieval process of GenIR models by applying methods based on mechanistic interpretability.
arXiv Detail & Related papers (2025-03-25T14:41:17Z)
- CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks [12.045202648316678]
Transformer-based ranking models are the state-of-the-art approaches for short-text ranking tasks.
We propose Cross-encoders with Joint Efficient Modeling (CROSS-JEM)
CROSS-JEM enables transformer-based models to jointly score multiple items for a query.
It achieves state-of-the-art accuracy and over 4x lower ranking latency than standard cross-encoders.
arXiv Detail & Related papers (2024-09-15T17:05:35Z)
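A hedged sketch of the joint-scoring idea CROSS-JEM describes: pack the query and all candidate items into one input so a single forward pass scores every item, instead of one pass per (query, item) pair. The `encode` function and the [SEP]-based packing below are toy stand-ins, not the paper's actual model.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=64)  # toy scoring head

def encode(tokens: list[str]) -> np.ndarray:
    # Stand-in for a transformer encoder: a deterministic 64-d vector per token.
    return np.stack([np.random.default_rng(zlib.crc32(t.encode())).normal(size=64)
                     for t in tokens])

def joint_scores(query: str, items: list[str]) -> list[float]:
    # Pack the query and ALL candidates into one sequence so a single forward
    # pass scores every item, instead of one pass per (query, item) pair.
    tokens, spans = query.split(), []
    for item in items:
        start = len(tokens) + 1            # position after the [SEP] marker
        tokens += ["[SEP]"] + item.split()
        spans.append((start, len(tokens)))
    hidden = encode(tokens)                # ONE encoder call for all items
    return [float(hidden[a:b].mean(axis=0) @ W) for a, b in spans]

print(joint_scores("best pizza nearby", ["pizza place", "sushi bar", "wood-fired pizza"]))
```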
- Triple-Encoders: Representations That Fire Together, Wire Together [51.15206713482718]
Contrastive Learning is a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder.
This study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances.
We find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models.
arXiv Detail & Related papers (2024-02-19T18:06:02Z)
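A toy illustration of the triple-encoder idea as summarized above: utterances are encoded independently, and dialogue context is represented by cheap mixtures of those embeddings with no joint re-encoding. The pairwise-sum mixing operator and the hash-based `encode` below are assumptions for illustration, not the paper's method.

```python
import zlib
import numpy as np

def encode(utterance: str, dim: int = 64) -> np.ndarray:
    # Independent, deterministic toy encoder for a single utterance.
    vec = np.random.default_rng(zlib.crc32(utterance.encode())).normal(size=dim)
    return vec / np.linalg.norm(vec)

def context_mixture(history: list[str]) -> np.ndarray:
    # Mix consecutive utterance pairs WITHOUT re-encoding the dialogue jointly.
    embs = [encode(u) for u in history]
    pairs = [embs[i] + embs[i + 1] for i in range(len(embs) - 1)] or embs
    mix = np.mean(pairs, axis=0)
    return mix / np.linalg.norm(mix)

ctx = context_mixture(["want to grab lunch?", "sure, when?"])
candidates = ["sounds good, see you at noon", "the weather is nice"]
print(max(candidates, key=lambda c: float(ctx @ encode(c))))
```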
- Rethinking Patch Dependence for Masked Autoencoders [89.02576415930963]
We study the impact of inter-patch dependencies in the decoder of masked autoencoders (MAE) on representation learning. We propose a simple visual pretraining framework: cross-attention masked autoencoders (CrossMAE).
arXiv Detail & Related papers (2024-01-25T18:49:57Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall vs. computational-cost trade-offs superior to current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
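A simplified sketch of the factorization pattern this paper's summary suggests: factorize an offline matrix of cross-encoder scores into item embeddings, then at test time call the expensive scorer on a few anchor items only and solve for a query embedding. The hidden low-rank stand-in scorer and all sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 500, 8

# Hidden low-rank scorer standing in for an expensive trained cross-encoder.
item_lat = rng.normal(size=(n_items, dim))
def ce_score(q_lat: np.ndarray, item_idx: int) -> float:
    return float(q_lat @ item_lat[item_idx])   # "expensive" in reality

# Offline: factorize a (train queries x items) score matrix into item embeddings.
train_q = rng.normal(size=(50, dim))
S = train_q @ item_lat.T                       # 50 x 500 cross-encoder scores
U, s, Vt = np.linalg.svd(S, full_matrices=False)
r = 8                                          # retained rank
E = Vt[:r].T * s[:r]                           # (n_items, r) item embeddings

# Online: call the cross-encoder on 32 anchor items only (not all 500),
# fit a query embedding by least squares, then score everything by dot product.
test_q = rng.normal(size=dim)
anchors = rng.choice(n_items, size=32, replace=False)
y = np.array([ce_score(test_q, i) for i in anchors])
q_emb, *_ = np.linalg.lstsq(E[anchors], y, rcond=None)
approx = E @ q_emb                             # approximate scores for all items

exact = item_lat @ test_q
top = lambda v: set(np.argsort(-v)[:10])
print("top-10 overlap:", len(top(approx) & top(exact)))
```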
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
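A toy sketch of the ED2LM-style scoring interface: documents are re-ranked by the log-likelihood of generating the query from the document, which a decoder-only LM can compute with document states processed once and cached. The word-overlap "LM" below only mimics that interface and is in no way the paper's model.

```python
import numpy as np

def log_p_query_given_doc(query: str, doc: str) -> float:
    # Toy conditional LM: query tokens occurring in the document are treated
    # as easy to generate. A real decoder-only LM would compute this with the
    # document prefix encoded once and reused across decoding steps.
    doc_tokens = set(doc.lower().split())
    return float(sum(np.log(0.9 if tok in doc_tokens else 0.05)
                     for tok in query.lower().split()))

docs = ["minimal interaction cross encoders for ranking",
        "triple encoders for dialogue modelling"]
query = "interaction cross encoders"
print(max(docs, key=lambda d: log_p_query_given_doc(query, d)))
```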
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
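A schematic of the alternating distillation loop the Trans-Encoder summary describes; the trainer and predictor callables are placeholders (assumptions) showing only the control flow, not the paper's training objectives.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[List[Pair]], List[float]]
Trainer = Callable[[List[Pair], List[float]], Scorer]

def trans_encoder_loop(pairs: List[Pair], bi: Scorer,
                       train_cross: Trainer, train_bi: Trainer,
                       rounds: int = 3) -> Tuple[Scorer, Scorer]:
    cross: Scorer = bi
    for _ in range(rounds):
        # Self-distillation: bi-encoder scores become cross-encoder targets.
        cross = train_cross(pairs, bi(pairs))
        # Mutual distillation: cross-encoder scores become bi-encoder targets.
        bi = train_bi(pairs, cross(pairs))
    return bi, cross

# Minimal stubs so the loop runs end to end (illustrative only).
memorize = lambda pairs, targets: (lambda ps: targets[:len(ps)])
data = [("a cat", "a feline"), ("a dog", "a car")]
bi, cross = trans_encoder_loop(data, bi=lambda ps: [0.9, 0.1],
                               train_cross=memorize, train_bi=memorize)
print(bi(data), cross(data))
```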
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and huge efficiency benefits over state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
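A generic sketch of the cooperative retrieve-then-rerank pattern this paper builds on: a cheap embedding model shortlists candidates, and a more expensive joint scorer reranks only the shortlist. Both scorers below are toy stand-ins, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)
img_embs = rng.normal(size=(10_000, 128))     # precomputed image embeddings

def retrieve(q_emb: np.ndarray, k: int = 50) -> np.ndarray:
    # Cheap stage: one matrix-vector product over the whole collection.
    return np.argsort(-(img_embs @ q_emb))[:k]

def rerank(q_emb: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    # Expensive stage (placeholder): run the joint scorer on the shortlist only.
    scores = np.tanh(img_embs[candidates] @ q_emb)
    return candidates[np.argsort(-scores)]

q = rng.normal(size=128)
print(rerank(q, retrieve(q))[:5])             # final top-5 after reranking
```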
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [59.13635174016506]
We present a simple yet efficient data augmentation strategy called Augmented SBERT.
We use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder.
We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
arXiv Detail & Related papers (2020-10-16T08:43:27Z)
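A minimal sketch of the Augmented SBERT recipe as summarized above: a trained cross-encoder soft-labels mined sentence pairs, and the resulting "silver" data augments the bi-encoder's training set. The toy cross-encoder below is an assumption, and the crucial pair-selection step the paper emphasizes is elided here.

```python
from typing import Callable, List, Tuple

def augment_sbert(gold: List[Tuple[str, str, float]],
                  unlabeled: List[Tuple[str, str]],
                  cross_encoder: Callable[[str, str], float]
                  ) -> List[Tuple[str, str, float]]:
    # "Silver" data: cross-encoder scores stand in for human labels; the
    # combined set is what the bi-encoder would then be trained on.
    silver = [(a, b, cross_encoder(a, b)) for a, b in unlabeled]
    return gold + silver

toy_ce = lambda a, b: float(bool(set(a.split()) & set(b.split())))
print(augment_sbert(gold=[("a cat", "a feline", 1.0)],
                    unlabeled=[("a dog", "a puppy"), ("a dog", "an airplane")],
                    cross_encoder=toy_ce))
```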