Hyperlink-induced Pre-training for Passage Retrieval in Open-domain
Question Answering
- URL: http://arxiv.org/abs/2203.06942v1
- Date: Mon, 14 Mar 2022 09:09:49 GMT
- Title: Hyperlink-induced Pre-training for Passage Retrieval in Open-domain
Question Answering
- Authors: Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu,
Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen
- Abstract summary: HyperLink-induced Pre-training (HLP) is a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents.
We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training.
- Score: 53.381467950545606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To alleviate the data scarcity problem in training question answering
systems, recent works propose additional intermediate pre-training for dense
passage retrieval (DPR). However, a large discrepancy remains between the
provided upstream signals and the downstream question-passage relevance, which
limits the resulting improvement. To bridge this gap, we propose
HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever
with the text relevance induced by hyperlink-based topology within Web
documents. We demonstrate that the hyperlink-based structures of dual-link and
co-mention can provide effective relevance signals for large-scale pre-training
that better facilitate downstream passage retrieval. We investigate the
effectiveness of our approach across a wide range of open-domain QA datasets
under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The
experiments show that HLP outperforms BM25 by up to 7 points, and other
pre-training methods by more than 10 points, in terms of top-20 retrieval
accuracy under the zero-shot scenario. Furthermore, HLP significantly
outperforms other pre-training methods under the other scenarios.
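A minimal Python sketch (not the authors' released code) of how the two relevance structures named above could be mined from a hyperlink graph: dual-link pairs, where two passages sit in documents that link to each other, and co-mention pairs, where two passages in different documents link to a common third document. The `passages` structure and its field names are hypothetical.

```python
# Hypothetical toy hyperlink graph: each passage records its source document
# and the set of documents its anchors point to.
from collections import defaultdict
from itertools import combinations

passages = {
    "p1": {"doc": "A", "links_to": {"B"}},
    "p2": {"doc": "B", "links_to": {"A"}},
    "p3": {"doc": "C", "links_to": {"D"}},
    "p4": {"doc": "E", "links_to": {"D"}},
}

def mine_dual_link_pairs(passages):
    """Pairs whose documents link to each other (dual-link topology)."""
    return [
        (p, q) for p, q in combinations(passages, 2)
        if passages[q]["doc"] in passages[p]["links_to"]
        and passages[p]["doc"] in passages[q]["links_to"]
    ]

def mine_co_mention_pairs(passages):
    """Pairs of passages in different documents that link to a common document."""
    by_target = defaultdict(list)
    for pid, info in passages.items():
        for target in info["links_to"]:
            by_target[target].append(pid)
    return [
        (p, q) for pids in by_target.values()
        for p, q in combinations(pids, 2)
        if passages[p]["doc"] != passages[q]["doc"]
    ]

print(mine_dual_link_pairs(passages))   # [('p1', 'p2')]
print(mine_co_mention_pairs(passages))  # [('p3', 'p4')]
```

Each mined pair would then stand in for a question-passage pair during DPR-style contrastive pre-training, which is the role the abstract assigns to the hyperlink signal.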
Related papers
- Improve Dense Passage Retrieval with Entailment Tuning [22.39221206192245]
Key to a retrieval system is calculating relevance scores for query-passage pairs.
We observe that a major class of relevance aligns with the concept of entailment in NLI tasks.
We design a method called entailment tuning to improve the embedding of dense retrievers.
arXiv Detail & Related papers (2024-10-21T09:18:30Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning [49.92394599459274]
We propose PriorBand, an HPO algorithm tailored to Deep Learning (DL) pipelines.
We show its robustness across a range of DL benchmarks, its gains under informative expert input, and its resilience to poor expert beliefs.
arXiv Detail & Related papers (2023-06-21T16:26:14Z)
- Robustifying DARTS by Eliminating Information Bypass Leakage via Explicit Sparse Regularization [8.93957397187611]
Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
arXiv Detail & Related papers (2023-06-12T04:11:37Z)
- Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training [81.3781338418574]
We propose relevance-aware contrastive learning.
We consistently improve the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks.
Our method not only beats BM25 after further pre-training on the target corpus but also serves as a good few-shot learner (a contrastive-objective sketch follows this entry).
arXiv Detail & Related papers (2023-06-05T18:20:27Z)
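A minimal sketch, assuming a DPR-style dual encoder, of the in-batch contrastive (InfoNCE) objective that this kind of contrastive pre-training builds on; batch size, embedding width, and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.05):
    """q_emb, p_emb: (batch, dim) tensors; row i of p_emb is the positive
    passage for row i of q_emb, and every other row is an in-batch negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    p_emb = F.normalize(p_emb, dim=-1)
    logits = q_emb @ p_emb.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(q_emb.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Random embeddings stand in for encoder outputs in this toy call.
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```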
- Causal Document-Grounded Dialogue Pre-training [81.16429056652483]
We present a causally-complete dataset construction strategy for building million-level DocGD pre-training corpora.
Experiments on three benchmark datasets demonstrate that our causal pre-training achieves considerable and consistent improvements under fully-supervised, low-resource, few-shot, and zero-shot settings.
arXiv Detail & Related papers (2023-05-18T12:39:25Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector interaction, also helps neural rerankers that use only the [CLS] vector to compute the similarity score.
We show that this finding is consistent across different model sizes and first-stage retrievers of diverse natures (a scoring sketch follows this entry).
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
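A short sketch of the late-interaction (ColBERT-style MaxSim) scoring the entry above refers to; token counts and embedding width are illustrative.

```python
import torch
import torch.nn.functional as F

def late_interaction_score(q_tokens, p_tokens):
    """q_tokens: (q_len, dim), p_tokens: (p_len, dim) contextualized token
    embeddings; each query token keeps its best-matching passage token."""
    q = F.normalize(q_tokens, dim=-1)
    p = F.normalize(p_tokens, dim=-1)
    sim = q @ p.T                       # (q_len, p_len) token similarities
    return sim.max(dim=1).values.sum()  # MaxSim per query token, then summed

score = late_interaction_score(torch.randn(6, 128), torch.randn(180, 128))
```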
- Query-as-context Pre-training for Dense Passage Retrieval [27.733665432319803]
Methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training.
This paper proposes query-as-context pre-training, a simple yet effective pre-training technique to alleviate the issue (a pseudo-query generation sketch follows this entry).
arXiv Detail & Related papers (2022-12-19T16:34:19Z)
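One plausible reading of the pseudo-query generation step, sketched with a public doc2query model; the model choice, sampling settings, and pairing scheme are assumptions, not the paper's exact setup.

```python
# Hedged sketch: derive a pseudo-query from a passage with an off-the-shelf
# doc2query model, then pair them as a positive for contrastive pre-training.
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "castorini/doc2query-t5-base-msmarco"  # assumed, publicly available checkpoint
tok = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

def passage_to_pseudo_query(passage: str) -> str:
    ids = tok(passage, return_tensors="pt", truncation=True, max_length=512).input_ids
    out = model.generate(ids, max_length=64, do_sample=True, top_k=10)
    return tok.decode(out[0], skip_special_tokens=True)

passage = "Dense passage retrieval maps questions and passages into a shared vector space."
pair = (passage_to_pseudo_query(passage), passage)  # (pseudo-query, passage) positive
```

Such pairs could then feed the same in-batch contrastive recipe sketched earlier.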
- Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [10.457660611114457]
We show how to select between policies and value functions produced by different training algorithms in offline reinforcement learning.
We use BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate its effectiveness in discrete-action benchmarks such as Atari.
arXiv Detail & Related papers (2021-10-26T20:12:11Z)
- Few-Shot Bayesian Optimization with Deep Kernel Surrogates [7.208515071018781]
We formulate a few-shot learning problem in which we train a shared deep surrogate model to adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results in HPO.
arXiv Detail & Related papers (2021-01-19T15:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.