Passage-Mask: A Learnable Regularization Strategy for Retriever-Reader Models
- URL: http://arxiv.org/abs/2211.00915v2
- Date: Thu, 3 Nov 2022 08:54:55 GMT
- Title: Passage-Mask: A Learnable Regularization Strategy for Retriever-Reader Models
- Authors: Shujian Zhang, Chengyue Gong, Xingchao Liu
- Abstract summary: Retriever-reader models achieve competitive performance across many different NLP tasks such as open question answering and dialogue conversations.
We introduce a learnable passage mask mechanism which desensitizes the model to the top-ranked retrieval passages and prevents it from overfitting.
- Score: 36.58955176223759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retriever-reader models achieve competitive performance across many different
NLP tasks such as open question answering and dialogue conversations. In this
work, we notice that these models easily overfit the top-ranked retrieval passages
and that standard training fails to reason over the entire set of retrieved passages.
We introduce a learnable passage mask mechanism which desensitizes the model to
the top-ranked retrieval passages and prevents it from overfitting. By controlling
the gradient variance with fewer mask candidates and selecting the mask candidates
with one-shot bi-level optimization, our learnable regularization strategy forces
the answer generation to focus on the entire set of retrieved passages. Experiments
on different tasks across open question answering, dialogue conversation, and fact
verification show that our method consistently outperforms its baselines. Extensive
experiments and ablation studies demonstrate that our method is general, effective,
and beneficial for many NLP tasks.
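The masking mechanism described in the abstract can be pictured with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' released implementation: the candidate pool (each candidate zeroing out some of the top-ranked passage slots), the Gumbel-softmax relaxation, the class and argument names, and all tensor shapes are assumptions made for this example; in the paper the candidate choice is driven by one-shot bi-level optimization on held-out data rather than the toy logits used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PassageMask(nn.Module):
    """Sketch of a learnable mask over N retrieved passages.

    A small, fixed pool of mask candidates (each dropping some of the
    top-ranked passage slots) keeps gradient variance low; learnable
    logits decide which candidate to apply at each forward pass.
    """

    def __init__(self, num_passages: int, num_candidates: int = 4, tau: float = 1.0):
        super().__init__()
        self.tau = tau
        # Candidate k masks out the k top-ranked passages (candidate 0 keeps all).
        candidates = torch.ones(num_candidates, num_passages)
        for k in range(1, num_candidates):
            candidates[k, :k] = 0.0
        self.register_buffer("candidates", candidates)
        # Logits over candidates; the paper tunes this choice with
        # one-shot bi-level optimization, which is omitted here.
        self.logits = nn.Parameter(torch.zeros(num_candidates))

    def forward(self, passage_states: torch.Tensor) -> torch.Tensor:
        # passage_states: [batch, num_passages, seq_len, hidden]
        probs = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)  # [num_candidates]
        mask = probs @ self.candidates                                  # [num_passages]
        return passage_states * mask.view(1, -1, 1, 1)


# Toy usage: 8 retrieved passages, 16 tokens each, hidden size 32.
masker = PassageMask(num_passages=8)
states = torch.randn(2, 8, 16, 32)
masked = masker(states)  # feed `masked` to the reader instead of `states`
```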
Related papers
- Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models [11.716595438057997]
We propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT).
PSPT is a parameter-efficient method that fine-tunes learnable passage-specific soft prompts.
We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets.
arXiv Detail & Related papers (2024-05-31T07:43:42Z)
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Models (LLMs) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Unsupervised Candidate Answer Extraction through Differentiable Masker-Reconstructor Model [21.667471025804936]
We propose a novel unsupervised candidate answer extraction approach that leverages the inherent structure of context passages through a Differentiable Masker-Reconstructor (DMR) Model.
We benchmark a comprehensive set of supervised and unsupervised candidate answer extraction methods.
We demonstrate the effectiveness of the DMR model by showing its performance is superior among unsupervised methods and comparable to supervised methods.
arXiv Detail & Related papers (2023-10-19T19:07:08Z)
- Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs [80.74263278847063]
The integration of retrieved passages and large language models (LLMs) has significantly contributed to improving open-domain question answering.
This paper investigates different methods of combining retrieved passages with LLMs to enhance answer generation.
arXiv Detail & Related papers (2023-08-24T05:26:54Z)
- Masked Siamese ConvNets [17.337143119620755]
Self-supervised learning has shown performance superior to supervised methods on various vision benchmarks.
Masked siamese networks require particular inductive bias and practically only work well with Vision Transformers.
This work empirically studies the problems behind masked siamese networks with ConvNets.
arXiv Detail & Related papers (2022-06-15T17:52:23Z)
- KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP).
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
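For reference, the minimum Levenshtein distance that RISE optimizes is the standard edit distance between the generated question and the target sequence; the dynamic-programming routine below computes that textbook quantity and is not code from the RISE paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `a` into `b` (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[len(b)]


# e.g. levenshtein("kitten", "sitting") == 3
```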
arXiv Detail & Related papers (2021-06-30T08:44:19Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
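In the soft Q-learning view of text generation, the model's logits are read as Q-values over the vocabulary and trained toward a soft Bellman target. The snippet below shows that target in its standard form only; the reward value, discount, temperature, and vocabulary size are placeholders, and the paper's full training recipe (e.g., path-consistency variants) is not reproduced here.

```python
import torch


def soft_bellman_target(reward: float, next_logits: torch.Tensor,
                        gamma: float = 1.0, tau: float = 1.0,
                        done: bool = False) -> torch.Tensor:
    """Standard soft Q-learning target for one generation step.

    reward:      scalar reward for emitting the current token
    next_logits: [vocab] Q-values predicted at the next step
    """
    if done:
        return torch.tensor(reward)
    # Soft value of the next state: tau * logsumexp(Q(s', .) / tau)
    next_value = tau * torch.logsumexp(next_logits / tau, dim=-1)
    return reward + gamma * next_value


# Toy usage: target for the Q-value of the token actually emitted.
next_logits = torch.randn(50_000)  # placeholder vocabulary size
target = soft_bellman_target(reward=0.5, next_logits=next_logits)
```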
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Joint Passage Ranking for Diverse Multi-Answer Retrieval [56.43443577137929]
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a question.
This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer.
In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms.
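As a rough picture of why joint, sequential selection helps, the sketch below greedily picks passages while penalizing overlap with those already chosen. This is a simple MMR-style heuristic used only to illustrate the diversity constraint; JPR itself relies on a trained autoregressive reranker, not this rule, and the function names and parameters here are illustrative assumptions.

```python
import numpy as np


def diverse_rerank(scores: np.ndarray, embeddings: np.ndarray,
                   k: int = 5, lam: float = 0.7) -> list[int]:
    """Greedy selection of k passages balancing relevance and novelty.

    scores:     [n] relevance scores from the retriever
    embeddings: [n, d] passage vectors (assumed L2-normalized)
    """
    selected: list[int] = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        best, best_val = None, -np.inf
        for i in candidates:
            # Penalize similarity to any already-selected passage.
            redundancy = max((embeddings[i] @ embeddings[j] for j in selected),
                             default=0.0)
            val = lam * scores[i] - (1 - lam) * redundancy
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy usage with random data.
emb = np.random.randn(20, 64)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(diverse_rerank(np.random.rand(20), emb, k=5))
```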
arXiv Detail & Related papers (2021-04-17T04:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.