Investigating Multi-layer Representations for Dense Passage Retrieval
- URL: http://arxiv.org/abs/2509.23861v1
- Date: Sun, 28 Sep 2025 13:00:53 GMT
- Title: Investigating Multi-layer Representations for Dense Passage Retrieval
- Authors: Zhongbin Xie, Thomas Lukasiewicz,
- Abstract summary: We denote Multi-layer Representations (MLR) to make up the representation of a document.<n>We first investigate how representations in different layers affect MLR's performance under the multi-vector retrieval setting.<n>We propose to leverage pooling strategies to reduce multi-vector models to single-vector ones to improve retrieval efficiency.
- Score: 46.25475369974163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, which is in contrast to the fact that representations in different layers of a pre-trained language model usually contain different kinds of linguistic knowledge, and behave differently during fine-tuning. Therefore, we propose to investigate utilizing representations from multiple encoder layers to make up the representation of a document, which we denote Multi-layer Representations (MLR). We first investigate how representations in different layers affect MLR's performance under the multi-vector retrieval setting, and then propose to leverage pooling strategies to reduce multi-vector models to single-vector ones to improve retrieval efficiency. Experiments demonstrate the effectiveness of MLR over dual encoder, ME-BERT and ColBERT in the single-vector retrieval setting, as well as demonstrate that it works well with other advanced training techniques such as retrieval-oriented pre-training and hard negative mining.
Related papers
- LEMUR: Learned Multi-Vector Retrieval [9.22384870426709]
We introduce LEMUR, a framework for multi-vector similarity search.<n>LEMUR consists of two consecutive problem reductions.<n>LEMUR is an order of magnitude faster than earlier multi-vector similarity search methods.
arXiv Detail & Related papers (2026-01-29T15:26:32Z) - MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction [13.70527493534928]
We introduce MetaEmbed, a new framework for multimodal retrieval.<n>During training, a fixed number of learnable Meta Tokens are appended to the input sequence.<n>At test-time, their last-layer contextualized representations serve as compact yet expressive multi-vector embeddings.
arXiv Detail & Related papers (2025-09-22T17:59:42Z) - Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences.<n>We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries.<n>We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularity.
arXiv Detail & Related papers (2025-02-18T12:00:47Z) - Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z) - Language-aware Multiple Datasets Detection Pretraining for DETRs [4.939595148195813]
We propose a framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR.
It converts the typical multi-classification in object detection into binary classification by introducing a pre-trained language model.
We show METR achieves extraordinary results on either multi-task joint training or the pretrain & finetune paradigm.
arXiv Detail & Related papers (2023-04-07T10:34:04Z) - CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval is to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based
Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.