Learning Diverse Document Representations with Deep Query Interactions
for Dense Retrieval
- URL: http://arxiv.org/abs/2208.04232v1
- Date: Mon, 8 Aug 2022 16:00:55 GMT
- Title: Learning Diverse Document Representations with Deep Query Interactions
for Dense Retrieval
- Authors: Zehan Li, Nan Yang, Liang Wang, Furu Wei
- Abstract summary: We propose a new dense retrieval model which learns diverse document representations with deep query interactions.
Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
- Score: 79.37614949970013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a new dense retrieval model which learns diverse
document representations with deep query interactions. Our model encodes each
document with a set of generated pseudo-queries to get query-informed,
multi-view document representations. It not only enjoys high inference
efficiency like the vanilla dual-encoder models, but also enables deep
query-document interactions in document encoding and provides multi-faceted
representations to better match different queries. Experiments on several
benchmarks demonstrate the effectiveness of the proposed method, out-performing
strong dual encoder baselines. The code is available at
https://github.com/jordane95/dual-cross-encoder
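The multi-view idea in the abstract can be illustrated with a minimal sketch. It assumes each document is stored as several vectors (one per generated pseudo-query), that relevance is a dot product, and that a document's score is the maximum over its views. The max aggregation, the toy vectors, and the names `score_document` and `rank` are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_document(query_vec: Vector, views: List[Vector]) -> float:
    """Score a document by its best-matching view.

    Assumption: max over views is used as the aggregation; the abstract
    does not specify how views are combined.
    """
    return max(dot(query_vec, view) for view in views)


def rank(query_vec: Vector, index: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Rank all documents in a toy in-memory index, highest score first."""
    scored = [(doc_id, score_document(query_vec, views)) for doc_id, views in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical precomputed index: doc_a has two views, doc_b has one.
# In a real system these vectors would come from a document encoder that
# sees the document together with one pseudo-query per view.
index = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[0.5, 0.5]],
}

ranking = rank([0.0, 2.0], index)
print(ranking)
```

Because all views are computed offline, query-time cost stays close to that of a plain dual encoder: one query encoding plus vector comparisons against a larger index.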
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$).
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Multi-View Document Representation Learning for Open-Domain Dense Retrieval [87.11836738011007]
This paper proposes a multi-view document representation learning framework.
It aims to produce multi-view embeddings to represent documents and enforce them to align with different queries.
Experiments show our method outperforms recent works and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-03-16T03:36:38Z)
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval [11.465218502487959]
We design a method to mimic the queries on each of the documents by an iterative clustering process.
We also optimize the matching function with a two-step score calculation procedure.
Experimental results on several popular ranking and QA datasets show that our model can achieve state-of-the-art results.
arXiv Detail & Related papers (2021-05-08T05:28:24Z)
- Sparse, Dense, and Attentional Representations for Text Retrieval [25.670835450331943]
Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors.
We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks.
We propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures.
arXiv Detail & Related papers (2020-05-01T02:21:17Z)
- Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles [5.40541521227338]
We model the problem of finding the relationship between two documents as a pairwise document classification task.
To find semantic relations between documents, we apply a series of techniques, such as GloVe, Paragraph Vectors, BERT, and XLNet.
We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations.
arXiv Detail & Related papers (2020-03-22T12:52:56Z)