MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking
- URL: http://arxiv.org/abs/2602.16299v1
- Date: Wed, 18 Feb 2026 09:30:29 GMT
- Title: MICE: Minimal Interaction Cross-Encoders for Efficient Re-ranking
- Authors: Mathias Vast, Victor Morand, Basile van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski
- Abstract summary: Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but have a high inference cost. We show that it is possible to derive a new late-interaction-like architecture by carefully removing detrimental or unnecessary interactions. MICE reduces inference latency fourfold compared to standard cross-encoders.
- Score: 12.107932271370563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but have a high inference cost. This not only prevents them from being used as first-stage rankers but also makes re-ranking documents costly. Prior work has addressed this bottleneck from two largely separate directions: accelerating cross-encoder inference by sparsifying the attention process, or improving first-stage retrieval effectiveness using more complex models, e.g. late-interaction ones. In this work, we propose to bridge these two approaches, based on an in-depth understanding of the internal mechanisms of cross-encoders. Starting from cross-encoders, we show that it is possible to derive a new late-interaction-like architecture by carefully removing detrimental or unnecessary interactions. We name this architecture MICE (Minimal Interaction Cross-Encoders). We extensively evaluate MICE across both in-domain (ID) and out-of-domain (OOD) datasets. MICE reduces inference latency fourfold compared to standard cross-encoders, matching late-interaction models like ColBERT while retaining most of the ID effectiveness of cross-encoders and demonstrating superior OOD generalization.
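The abstract does not spell out which interactions MICE retains, so the following is only a minimal sketch of the late-interaction scoring pattern it is benchmarked against (ColBERT-style MaxSim), where document token embeddings are precomputed offline and query-time scoring needs no joint forward pass; all names and dimensions here are illustrative assumptions.

```python
import numpy as np

def maxsim_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    """ColBERT-style MaxSim: each query token interacts with document tokens
    only through a max over per-token cosine similarities."""
    q = q_vecs / np.linalg.norm(q_vecs, axis=1, keepdims=True)
    d = d_vecs / np.linalg.norm(d_vecs, axis=1, keepdims=True)
    sim = q @ d.T                        # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # best doc token per query token

# Re-rank toy "documents" whose token embeddings were encoded offline;
# no joint query-document forward pass is needed at query time.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))                    # 8 query tokens, dim 128
docs = {f"doc{i}": rng.normal(size=(200, 128)) for i in range(3)}
print(sorted(docs, key=lambda k: maxsim_score(query, docs[k]), reverse=True))
```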
Related papers
- DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection [39.56089737473775]
We propose DS-Det, a more efficient transformer detector capable of detecting a flexible number of objects in images. Specifically, we reformulate and introduce a new unified Single-Query paradigm for decoder modeling. We also propose a simplified decoder framework through attention disentangled learning.
arXiv Detail & Related papers (2025-07-26T05:40:04Z)
- Reverse-Engineering the Retrieval Process in GenIR Models [41.661577386460436]
Generative Information Retrieval (GenIR) is a novel paradigm in which a transformer encoder-decoder model predicts document rankings based on a query. This work studies the internal retrieval process of GenIR models by applying methods based on mechanistic interpretability.
arXiv Detail & Related papers (2025-03-25T14:41:17Z)
- CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks [12.045202648316678]
Transformer-based ranking models are the state-of-the-art approaches for short-text ranking tasks.
We propose Cross-encoders with Joint Efficient Modeling (CROSS-JEM)
CROSS-JEM enables transformer-based models to jointly score multiple items for a query.
It achieves state-of-the-art accuracy and over 4x lower ranking latency than standard cross-encoders.
arXiv Detail & Related papers (2024-09-15T17:05:35Z)
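A hedged sketch of the joint-scoring idea CROSS-JEM describes: pack the query and all candidate items into one input so a single forward pass scores every item, instead of one pass per (query, item) pair. The `encode` function and the [SEP]-based packing below are toy stand-ins, not the paper's actual model.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=64)  # toy scoring head

def encode(tokens: list[str]) -> np.ndarray:
    # Stand-in for a transformer encoder: a deterministic 64-d vector per token.
    return np.stack([np.random.default_rng(zlib.crc32(t.encode())).normal(size=64)
                     for t in tokens])

def joint_scores(query: str, items: list[str]) -> list[float]:
    # Pack the query and ALL candidates into one sequence so a single forward
    # pass scores every item, instead of one pass per (query, item) pair.
    tokens, spans = query.split(), []
    for item in items:
        start = len(tokens) + 1            # position after the [SEP] marker
        tokens += ["[SEP]"] + item.split()
        spans.append((start, len(tokens)))
    hidden = encode(tokens)                # ONE encoder call for all items
    return [float(hidden[a:b].mean(axis=0) @ W) for a, b in spans]

print(joint_scores("best pizza nearby", ["pizza place", "sushi bar", "wood-fired pizza"]))
```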
- Triple-Encoders: Representations That Fire Together, Wire Together [51.15206713482718]
Contrastive Learning is a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder.
This study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances.
We find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models.
arXiv Detail & Related papers (2024-02-19T18:06:02Z)
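A toy illustration of the triple-encoder idea as summarized above: utterances are encoded independently, and dialogue context is represented by cheap mixtures of those embeddings with no joint re-encoding. The pairwise-sum mixing operator and the hash-based `encode` below are assumptions for illustration, not the paper's method.

```python
import zlib
import numpy as np

def encode(utterance: str, dim: int = 64) -> np.ndarray:
    # Independent, deterministic toy encoder for a single utterance.
    vec = np.random.default_rng(zlib.crc32(utterance.encode())).normal(size=dim)
    return vec / np.linalg.norm(vec)

def context_mixture(history: list[str]) -> np.ndarray:
    # Mix consecutive utterance pairs WITHOUT re-encoding the dialogue jointly.
    embs = [encode(u) for u in history]
    pairs = [embs[i] + embs[i + 1] for i in range(len(embs) - 1)] or embs
    mix = np.mean(pairs, axis=0)
    return mix / np.linalg.norm(mix)

ctx = context_mixture(["want to grab lunch?", "sure, when?"])
candidates = ["sounds good, see you at noon", "the weather is nice"]
print(max(candidates, key=lambda c: float(ctx @ encode(c))))
```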
- Rethinking Patch Dependence for Masked Autoencoders [89.02576415930963]
We study the impact of inter-patch dependencies in the decoder of masked autoencoders (MAE) on representation learning. We propose a simple visual pretraining framework: cross-attention masked autoencoders (CrossMAE).
arXiv Detail & Related papers (2024-01-25T18:49:57Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall vs. computational-cost trade-offs superior to current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
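A simplified sketch of the factorization pattern this paper's summary suggests: factorize an offline matrix of cross-encoder scores into item embeddings, then at test time call the expensive scorer on a few anchor items only and solve for a query embedding. The hidden low-rank stand-in scorer and all sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 500, 8

# Hidden low-rank scorer standing in for an expensive trained cross-encoder.
item_lat = rng.normal(size=(n_items, dim))
def ce_score(q_lat: np.ndarray, item_idx: int) -> float:
    return float(q_lat @ item_lat[item_idx])   # "expensive" in reality

# Offline: factorize a (train queries x items) score matrix into item embeddings.
train_q = rng.normal(size=(50, dim))
S = train_q @ item_lat.T                       # 50 x 500 cross-encoder scores
U, s, Vt = np.linalg.svd(S, full_matrices=False)
r = 8                                          # retained rank
E = Vt[:r].T * s[:r]                           # (n_items, r) item embeddings

# Online: call the cross-encoder on 32 anchor items only (not all 500),
# fit a query embedding by least squares, then score everything by dot product.
test_q = rng.normal(size=dim)
anchors = rng.choice(n_items, size=32, replace=False)
y = np.array([ce_score(test_q, i) for i in anchors])
q_emb, *_ = np.linalg.lstsq(E[anchors], y, rcond=None)
approx = E @ q_emb                             # approximate scores for all items

exact = item_lat @ test_q
top = lambda v: set(np.argsort(-v)[:10])
print("top-10 overlap:", len(top(approx) & top(exact)))
```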
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
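A toy sketch of the ED2LM-style scoring interface: documents are re-ranked by the log-likelihood of generating the query from the document, which a decoder-only LM can compute with document states processed once and cached. The word-overlap "LM" below only mimics that interface and is in no way the paper's model.

```python
import numpy as np

def log_p_query_given_doc(query: str, doc: str) -> float:
    # Toy conditional LM: query tokens occurring in the document are treated
    # as easy to generate. A real decoder-only LM would compute this with the
    # document prefix encoded once and reused across decoding steps.
    doc_tokens = set(doc.lower().split())
    return float(sum(np.log(0.9 if tok in doc_tokens else 0.05)
                     for tok in query.lower().split()))

docs = ["minimal interaction cross encoders for ranking",
        "triple encoders for dialogue modelling"]
query = "interaction cross encoders"
print(max(docs, key=lambda d: log_p_query_given_doc(query, d)))
```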
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
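A schematic of the alternating distillation loop the Trans-Encoder summary describes; the trainer and predictor callables are placeholders (assumptions) showing only the control flow, not the paper's training objectives.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[List[Pair]], List[float]]
Trainer = Callable[[List[Pair], List[float]], Scorer]

def trans_encoder_loop(pairs: List[Pair], bi: Scorer,
                       train_cross: Trainer, train_bi: Trainer,
                       rounds: int = 3) -> Tuple[Scorer, Scorer]:
    cross: Scorer = bi
    for _ in range(rounds):
        # Self-distillation: bi-encoder scores become cross-encoder targets.
        cross = train_cross(pairs, bi(pairs))
        # Mutual distillation: cross-encoder scores become bi-encoder targets.
        bi = train_bi(pairs, cross(pairs))
    return bi, cross

# Minimal stubs so the loop runs end to end (illustrative only).
memorize = lambda pairs, targets: (lambda ps: targets[:len(ps)])
data = [("a cat", "a feline"), ("a dog", "a car")]
bi, cross = trans_encoder_loop(data, bi=lambda ps: [0.9, 0.1],
                               train_cross=memorize, train_bi=memorize)
print(bi(data), cross(data))
```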
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and huge efficiency benefits over state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
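A generic sketch of the cooperative retrieve-then-rerank pattern this paper builds on: a cheap embedding model shortlists candidates, and a more expensive joint scorer reranks only the shortlist. Both scorers below are toy stand-ins, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)
img_embs = rng.normal(size=(10_000, 128))     # precomputed image embeddings

def retrieve(q_emb: np.ndarray, k: int = 50) -> np.ndarray:
    # Cheap stage: one matrix-vector product over the whole collection.
    return np.argsort(-(img_embs @ q_emb))[:k]

def rerank(q_emb: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    # Expensive stage (placeholder): run the joint scorer on the shortlist only.
    scores = np.tanh(img_embs[candidates] @ q_emb)
    return candidates[np.argsort(-scores)]

q = rng.normal(size=128)
print(rerank(q, retrieve(q))[:5])             # final top-5 after reranking
```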
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [59.13635174016506]
We present a simple yet efficient data augmentation strategy called Augmented SBERT.
We use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder.
We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
arXiv Detail & Related papers (2020-10-16T08:43:27Z)
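A minimal sketch of the Augmented SBERT recipe as summarized above: a trained cross-encoder soft-labels mined sentence pairs, and the resulting "silver" data augments the bi-encoder's training set. The toy cross-encoder below is an assumption, and the crucial pair-selection step the paper emphasizes is elided here.

```python
from typing import Callable, List, Tuple

def augment_sbert(gold: List[Tuple[str, str, float]],
                  unlabeled: List[Tuple[str, str]],
                  cross_encoder: Callable[[str, str], float]
                  ) -> List[Tuple[str, str, float]]:
    # "Silver" data: cross-encoder scores stand in for human labels; the
    # combined set is what the bi-encoder would then be trained on.
    silver = [(a, b, cross_encoder(a, b)) for a, b in unlabeled]
    return gold + silver

toy_ce = lambda a, b: float(bool(set(a.split()) & set(b.split())))
print(augment_sbert(gold=[("a cat", "a feline", 1.0)],
                    unlabeled=[("a dog", "a puppy"), ("a dog", "an airplane")],
                    cross_encoder=toy_ce))
```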