Hyperlink-induced Pre-training for Passage Retrieval in Open-domain
Question Answering
- URL: http://arxiv.org/abs/2203.06942v1
- Date: Mon, 14 Mar 2022 09:09:49 GMT
- Title: Hyperlink-induced Pre-training for Passage Retrieval in Open-domain
Question Answering
- Authors: Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu,
Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen
- Abstract summary: HyperLink-induced Pre-training (HLP) is a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents.
We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training.
- Score: 53.381467950545606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To alleviate the data scarcity problem in training question answering
systems, recent works propose additional intermediate pre-training for dense
passage retrieval (DPR). However, a large discrepancy remains between the
provided upstream signals and the downstream question-passage relevance, which
limits the resulting improvement. To bridge this gap, we propose
HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever
with the text relevance induced by hyperlink-based topology within Web
documents. We demonstrate that the hyperlink-based structures of dual-link and
co-mention can provide effective relevance signals for large-scale pre-training
that better facilitate downstream passage retrieval. We investigate the
effectiveness of our approach across a wide range of open-domain QA datasets
under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The
experiments show that HLP outperforms BM25 by up to 7 points, and other
pre-training methods by more than 10 points, in terms of top-20 retrieval
accuracy under the zero-shot scenario. Furthermore, HLP significantly
outperforms other pre-training methods under the other scenarios.
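A minimal Python sketch (not the authors' released code) of how the two relevance structures named above could be mined from a hyperlink graph: dual-link pairs, where two passages sit in documents that link to each other, and co-mention pairs, where two passages in different documents link to a common third document. The `passages` structure and its field names are hypothetical.

```python
# Hypothetical toy hyperlink graph: each passage records its source document
# and the set of documents its anchors point to.
from collections import defaultdict
from itertools import combinations

passages = {
    "p1": {"doc": "A", "links_to": {"B"}},
    "p2": {"doc": "B", "links_to": {"A"}},
    "p3": {"doc": "C", "links_to": {"D"}},
    "p4": {"doc": "E", "links_to": {"D"}},
}

def mine_dual_link_pairs(passages):
    """Pairs whose documents link to each other (dual-link topology)."""
    return [
        (p, q) for p, q in combinations(passages, 2)
        if passages[q]["doc"] in passages[p]["links_to"]
        and passages[p]["doc"] in passages[q]["links_to"]
    ]

def mine_co_mention_pairs(passages):
    """Pairs of passages in different documents that link to a common document."""
    by_target = defaultdict(list)
    for pid, info in passages.items():
        for target in info["links_to"]:
            by_target[target].append(pid)
    return [
        (p, q) for pids in by_target.values()
        for p, q in combinations(pids, 2)
        if passages[p]["doc"] != passages[q]["doc"]
    ]

print(mine_dual_link_pairs(passages))   # [('p1', 'p2')]
print(mine_co_mention_pairs(passages))  # [('p3', 'p4')]
```

Each mined pair would then stand in for a question-passage pair during DPR-style contrastive pre-training, which is the role the abstract assigns to the hyperlink signal.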
Related papers
- Improve Dense Passage Retrieval with Entailment Tuning [22.39221206192245]
Key to a retrieval system is calculating relevance scores for query-passage pairs.
We observe that a major class of relevance aligns with the concept of entailment in NLI tasks.
We design a method called entailment tuning to improve the embedding of dense retrievers.
arXiv Detail & Related papers (2024-10-21T09:18:30Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning [49.92394599459274]
We propose PriorBand, an HPO algorithm tailored to Deep Learning (DL) pipelines.
We show its robustness across a range of DL benchmarks, its gains under informative expert input, and its resilience to poor expert beliefs.
arXiv Detail & Related papers (2023-06-21T16:26:14Z)
- Robustifying DARTS by Eliminating Information Bypass Leakage via Explicit Sparse Regularization [8.93957397187611]
Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
arXiv Detail & Related papers (2023-06-12T04:11:37Z)
- Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training [81.3781338418574]
We propose relevance-aware contrastive learning.
We consistently improve the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks.
Our method not only beats BM25 after further pre-training on the target corpus but also serves as a good few-shot learner (a contrastive-objective sketch follows this entry).
arXiv Detail & Related papers (2023-06-05T18:20:27Z)
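A minimal sketch, assuming a DPR-style dual encoder, of the in-batch contrastive (InfoNCE) objective that this kind of contrastive pre-training builds on; batch size, embedding width, and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.05):
    """q_emb, p_emb: (batch, dim) tensors; row i of p_emb is the positive
    passage for row i of q_emb, and every other row is an in-batch negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    logits = q_emb @ p_emb.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(q_emb.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Random embeddings stand in for encoder outputs in this toy call.
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```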
- Causal Document-Grounded Dialogue Pre-training [81.16429056652483]
We present a causally-complete dataset construction strategy for building million-level DocGD pre-training corpora.
Experiments on three benchmark datasets demonstrate that our causal pre-training achieves considerable and consistent improvements under fully-supervised, low-resource, few-shot, and zero-shot settings.
arXiv Detail & Related papers (2023-05-18T12:39:25Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector interaction, also helps neural rerankers that use only the [CLS] vector to compute the similarity score.
We show that this finding is consistent across different model sizes and first-stage retrievers of diverse natures (a scoring sketch follows this entry).
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
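A short sketch of the late-interaction (ColBERT-style MaxSim) scoring the entry above refers to; token counts and embedding width are illustrative.

```python
import torch
import torch.nn.functional as F

def late_interaction_score(q_tokens, p_tokens):
    """q_tokens: (q_len, dim), p_tokens: (p_len, dim) contextualized token
    embeddings; each query token keeps its best-matching passage token."""
    q = F.normalize(q_tokens, dim=-1)
    p = F.normalize(p_tokens, dim=-1)
    sim = q @ p.T                       # (q_len, p_len) token similarities
    return sim.max(dim=1).values.sum()  # MaxSim per query token, then summed

score = late_interaction_score(torch.randn(6, 128), torch.randn(180, 128))
```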
- Query-as-context Pre-training for Dense Passage Retrieval [27.733665432319803]
Methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training.
This paper proposes query-as-context pre-training, a simple yet effective pre-training technique to alleviate the issue (a pseudo-query generation sketch follows this entry).
arXiv Detail & Related papers (2022-12-19T16:34:19Z)
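One plausible reading of the pseudo-query generation step, sketched with a public doc2query model; the model choice, sampling settings, and pairing scheme are assumptions, not the paper's exact setup.

```python
# Hedged sketch: derive a pseudo-query from a passage with an off-the-shelf
# doc2query model, then pair them as a positive for contrastive pre-training.
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "castorini/doc2query-t5-base-msmarco"  # assumed, publicly available checkpoint
tok = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

def passage_to_pseudo_query(passage: str) -> str:
    ids = tok(passage, return_tensors="pt", truncation=True, max_length=512).input_ids
    out = model.generate(ids, max_length=64, do_sample=True, top_k=10)
    return tok.decode(out[0], skip_special_tokens=True)

passage = "Dense passage retrieval maps questions and passages into a shared vector space."
pair = (passage_to_pseudo_query(passage), passage)  # (pseudo-query, passage) positive
```

Such pairs could then feed the same in-batch contrastive recipe sketched earlier.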
- Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [10.457660611114457]
We show how to select between policies and value functions produced by different training algorithms in offline reinforcement learning.
We use BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate its effectiveness in discrete-action benchmarks such as Atari.
arXiv Detail & Related papers (2021-10-26T20:12:11Z)
- Few-Shot Bayesian Optimization with Deep Kernel Surrogates [7.208515071018781]
We formulate a few-shot learning problem in which we train a shared deep surrogate model to adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results in HPO.
arXiv Detail & Related papers (2021-01-19T15:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.