REPT: Bridging Language Models and Machine Reading Comprehension via
Retrieval-Based Pre-training
- URL: http://arxiv.org/abs/2105.04201v1
- Date: Mon, 10 May 2021 08:54:46 GMT
- Title: REPT: Bridging Language Models and Machine Reading Comprehension via
Retrieval-Based Pre-training
- Authors: Fangkai Jiao, Yangyang Guo, Yilin Niu, Feng Ji, Feng-Lin Li, Liqiang
Nie
- Abstract summary: We present REPT, a REtrieval-based Pre-Training approach to bridge the gap between general PLMs and MRC.
In particular, we introduce two self-supervised tasks to strengthen evidence extraction during pre-training.
Our approach is able to enhance the capacity of evidence extraction without explicit supervision.
- Score: 45.21249008835556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Language Models (PLMs) have achieved great success on Machine
Reading Comprehension (MRC) over the past few years. Although the general
language representation learned from large-scale corpora does benefit MRC, the
poor support in evidence extraction which requires reasoning across multiple
sentences hinders PLMs from further advancing MRC. To bridge the gap between
general PLMs and MRC, we present REPT, a REtrieval-based Pre-Training approach.
In particular, we introduce two self-supervised tasks to strengthen evidence
extraction during pre-training, which is further inherited by downstream MRC
tasks through the consistent retrieval operation and model architecture. To
evaluate our proposed method, we conduct extensive experiments on five MRC
datasets that require collecting evidence from and reasoning across multiple
sentences. Experimental results demonstrate the effectiveness of our
pre-training approach. Moreover, further analysis shows that our approach is
able to enhance the capacity of evidence extraction without explicit
supervision.
Related papers
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is indicative about training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
- Let's Reinforce Step by Step [10.65244642965387]
We use Reinforcement Learning from Human Feedback to shape model reasoning processes.
Our results show that the fine-grained reward provided by PRM-based methods enhances accuracy on simple mathematical reasoning.
We also show the critical role reward aggregation functions play in model performance.
arXiv Detail & Related papers (2023-11-10T01:35:51Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
- Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
- Bridging the Gap between Language Model and Reading Comprehension: Unsupervised MRC via Self-Supervision [34.01738910736325]
We propose a new framework for unsupervised machine reading comprehension (MRC).
We learn to spot answer spans in documents via self-supervised learning, by designing a self-supervision pretext task for MRC - Spotting-MLM.
Experiments show that our method achieves a new state-of-the-art performance for unsupervised MRC.
arXiv Detail & Related papers (2021-07-19T02:14:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.