Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
- URL: http://arxiv.org/abs/2009.13815v1
- Date: Tue, 29 Sep 2020 07:02:19 GMT
- Title: Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
- Authors: Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer
- Abstract summary: Independently computing embeddings for questions and answers results in late fusion of information related to matching questions to their answers.
We present a supervised data mining method using an accurate early fusion model to improve the training of an efficient late fusion retrieval model.
- Score: 14.669454236593447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models that independently project questions and answers into a shared
embedding space allow for efficient continuous space retrieval from large
corpora. Independently computing embeddings for questions and answers results
in late fusion of information related to matching questions to their answers.
While critical for efficient retrieval, late fusion underperforms models that
make use of early fusion (e.g., a BERT based classifier with cross-attention
between question-answer pairs). We present a supervised data mining method
using an accurate early fusion model to improve the training of an efficient
late fusion retrieval model. We first train an accurate classification model
with cross-attention between questions and answers. The accurate
cross-attention model is then used to annotate additional passages in order to
generate weighted training examples for a neural retrieval model. The resulting
retrieval model with additional data significantly outperforms retrieval models
directly trained with gold annotations on Precision at $N$ (P@N) and Mean
Reciprocal Rank (MRR).
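The late-fusion setup and the evaluation metrics from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings below stand in for independently computed question and passage vectors, and the function names are hypothetical. Retrieval ranks passages by dot product with the question embedding; P@N and MRR are then computed over the ranked lists.

```python
from typing import List, Sequence


def rank_passages(q_vec: Sequence[float],
                  p_vecs: List[Sequence[float]]) -> List[int]:
    """Late-fusion retrieval: score each passage by its dot product with
    the independently computed question embedding, highest score first."""
    scores = [sum(q * p for q, p in zip(q_vec, pv)) for pv in p_vecs]
    return sorted(range(len(p_vecs)), key=lambda i: scores[i], reverse=True)


def mean_reciprocal_rank(ranked: List[List[int]], gold: List[int]) -> float:
    """MRR: mean over questions of 1 / (1-based rank of the gold passage)."""
    total = sum(1.0 / (order.index(g) + 1) for order, g in zip(ranked, gold))
    return total / len(gold)


def precision_at_n(ranked: List[List[int]], gold: List[int], n: int) -> float:
    """P@N: fraction of questions whose gold passage appears in the top N."""
    hits = sum(1 for order, g in zip(ranked, gold) if g in order[:n])
    return hits / len(gold)


# Toy example: two questions and three shared passages (pretend embeddings).
q_vecs = [[1.0, 0.0], [0.0, 1.0]]
p_vecs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
ranked = [rank_passages(q, p_vecs) for q in q_vecs]

# Suppose the gold passage is index 2 for the first question and 1 for the second.
print(mean_reciprocal_rank(ranked, gold=[2, 1]))
print(precision_at_n(ranked, gold=[2, 1], n=1))
```

In the paper's pipeline, the weights on mined training examples would come from the cross-attention teacher's scores rather than from gold labels alone; only the cheap dot-product step above runs at retrieval time.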
Related papers
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Generating Query Focused Summaries without Fine-tuning the Transformer-based Pre-trained Models [0.6124773188525718]
Fine-tuning Natural Language Processing (NLP) models for each new data set requires substantial computational time, with an associated increase in carbon footprint and cost.
In this paper, we omit the fine-tuning step and investigate whether a Maximal Marginal Relevance (MMR)-based approach can help pre-trained models obtain query-focused summaries directly from a new data set that was not used to pre-train them.
As indicated by the experimental results, our MMR-based approach successfully ranked and selected the most relevant sentences as summaries and showed better performance than the individual pre-trained models.
arXiv Detail & Related papers (2023-03-10T22:40:15Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Kronecker Factorization for Preventing Catastrophic Forgetting in Large-scale Medical Entity Linking [7.723047334864811]
In the medical domain, sequential training on tasks may sometimes be the only way to train models.
Such sequential training can cause catastrophic forgetting, i.e., a substantial drop in accuracy on prior tasks when a model is updated for a new task.
We show the effectiveness of this technique on the important and illustrative task of medical entity linking across three datasets.
arXiv Detail & Related papers (2021-11-11T01:51:01Z)
- A Relational Model for One-Shot Classification [80.77724423309184]
We show that a deep learning model with built-in inductive bias can bring benefits to sample-efficient learning, without relying on extensive data augmentation.
The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention.
arXiv Detail & Related papers (2021-11-08T07:53:12Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
- Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [27.538957000237176]
Humans create questions adversarially, such that the model fails to answer them correctly.
We collect 36,000 samples with progressively stronger models in the annotation loop.
We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets.
We find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
arXiv Detail & Related papers (2020-02-02T00:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.