From Cloze to Comprehension: Retrofitting Pre-trained Masked Language
Model to Pre-trained Machine Reader
- URL: http://arxiv.org/abs/2212.04755v3
- Date: Mon, 16 Oct 2023 05:45:30 GMT
- Title: From Cloze to Comprehension: Retrofitting Pre-trained Masked Language
Model to Pre-trained Machine Reader
- Authors: Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong
Bing
- Abstract summary: Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
- Score: 130.45769668885487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Pre-trained Machine Reader (PMR), a novel method for retrofitting
pre-trained masked language models (MLMs) to pre-trained machine reading
comprehension (MRC) models without acquiring labeled data. PMR can resolve the
discrepancy between model pre-training and downstream fine-tuning of existing
MLMs. To build the proposed PMR, we constructed a large volume of
general-purpose and high-quality MRC-style training data by using Wikipedia
hyperlinks and designed a Wiki Anchor Extraction task to guide the MRC-style
pre-training. Apart from its simplicity, PMR effectively solves extraction
tasks, such as Extractive Question Answering and Named Entity Recognition. PMR
shows tremendous improvements over existing approaches, especially in
low-resource scenarios. When applied to the sequence classification task in the
MRC formulation, PMR enables the extraction of high-quality rationales to
explain the classification process, thereby providing greater prediction
explainability. PMR also has the potential to serve as a unified model for
tackling various extraction and classification tasks in the MRC formulation.
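As a rough illustration of the MRC formulation described in the abstract (not the authors' implementation), the sketch below shows one plausible way a Wikipedia anchor could be turned into an MRC-style (query, context, answer-span) example for Wiki Anchor Extraction, and how an NER label description can reuse the same template. The function names, the use of the anchor page's definition as the query, and the example text are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' code): one plausible way to turn a
# Wikipedia anchor into an MRC-style (query, context, answer-span) example, and
# how an NER label description can reuse the same template.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MRCExample:
    query: str                           # definition or label description
    context: str                         # passage to extract answers from
    answer_spans: List[Tuple[int, int]]  # character offsets of answer spans


def build_wae_example(anchor_text: str, definition: str, passage: str) -> MRCExample:
    """Hypothetical Wiki Anchor Extraction example: the query is a definition of
    the anchor's Wikipedia article, the context is a passage mentioning the
    anchor, and the answer is the anchor span inside that passage."""
    start = passage.index(anchor_text)  # assume the passage contains the anchor
    return MRCExample(query=definition,
                      context=passage,
                      answer_spans=[(start, start + len(anchor_text))])


def build_ner_query(label: str, description: str, sentence: str) -> MRCExample:
    """Downstream NER in the same template: the label description becomes the
    query; gold answer spans would be filled in during fine-tuning."""
    return MRCExample(query=f"{label}: {description}", context=sentence, answer_spans=[])


if __name__ == "__main__":
    ex = build_wae_example(
        anchor_text="Alan Turing",
        definition="Alan Turing was an English mathematician and computer scientist.",
        passage="The bombe was refined by Alan Turing at Bletchley Park.",
    )
    print(ex.answer_spans)  # [(25, 36)]
```

Framing both pre-training and downstream tasks in one shared template is what lets a single extractor serve Extractive QA, NER, and, with label descriptions as queries, classification with extracted rationales.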
Related papers
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
- MetaRM: Shifted Distributions Alignment via Meta-Learning [52.94381279744458]
Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM).
We introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution.
Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization.
arXiv Detail & Related papers (2024-05-01T10:43:55Z)
- Generating Query Focused Summaries without Fine-tuning the Transformer-based Pre-trained Models [0.6124773188525718]
Fine-tuning Natural Language Processing (NLP) models for each new data set requires additional computational time, with an associated increase in carbon footprint and cost.
In this paper, we skip the fine-tuning step and investigate whether a Marginal Maximum Relevance (MMR)-based approach can help pre-trained models obtain query-focused summaries directly from a new data set that was not used during pre-training (a minimal sketch of MMR-style sentence selection appears after this list).
As indicated by the experimental results, our MMR-based approach successfully ranked and selected the most relevant sentences as summaries and showed better performance than the individual pre-trained models.
arXiv Detail & Related papers (2023-03-10T22:40:15Z)
- Resolving quantitative MRI model degeneracy with machine learning via training data distribution design [2.2086005010186387]
Quantitative MRI aims to map tissue properties non-invasively via models that relate unknown quantities to measured MRI signals.
Estimating these unknowns, which has traditionally required model fitting, can now be done with one-shot machine learning (ML) approaches.
Growing empirical evidence suggests that ML approaches remain susceptible to model degeneracy.
arXiv Detail & Related papers (2023-03-09T18:10:45Z)
- Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs (a minimal flat reward-machine sketch appears after this list).
arXiv Detail & Related papers (2022-05-31T12:39:24Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning [66.44344616836158]
We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text.
We show that 1) under certain non-degeneracy conditions on the hidden Markov model (HMM), simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM.
arXiv Detail & Related papers (2021-06-17T03:31:47Z)
- REPT: Bridging Language Models and Machine Reading Comprehension via Retrieval-Based Pre-training [45.21249008835556]
We present REPT, a REtrieval-based Pre-Training approach to bridge the gap between general pre-trained language models (PLMs) and MRC.
In particular, we introduce two self-supervised tasks to strengthen evidence extraction during pre-training.
Our approach is able to enhance the capacity of evidence extraction without explicit supervision.
arXiv Detail & Related papers (2021-05-10T08:54:46Z)
- Improving AMR Parsing with Sequence-to-Sequence Pre-training [39.33133978535497]
In this paper, we focus on sequence-to-sequence (seq2seq) AMR parsing.
We propose a seq2seq pre-training approach to build pre-trained models in both single and joint settings.
Experiments show that both the single and joint pre-trained models significantly improve the performance.
arXiv Detail & Related papers (2020-10-05T04:32:47Z)
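The query-focused summarization entry above ranks sentences with the MMR criterion (more commonly known as Maximal Marginal Relevance). Below is a minimal, self-contained sketch of MMR-style sentence selection under a simple bag-of-words cosine similarity; it illustrates the general criterion only, not that paper's implementation, and the similarity function, the lambda value, and the function names are assumptions.

```python
# Minimal MMR (Maximal Marginal Relevance) sentence selection using a simple
# bag-of-words cosine similarity. Illustrative only; not the cited paper's code.
from collections import Counter
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_mmr(query: str, sentences: List[str], k: int = 3, lam: float = 0.7) -> List[str]:
    """Greedily pick k sentences that balance relevance to the query (weight lam)
    against redundancy with already-selected sentences (weight 1 - lam)."""
    q_vec = Counter(query.lower().split())
    vecs = [Counter(s.lower().split()) for s in sentences]
    selected: List[int] = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            relevance = cosine(v, q_vec)
            redundancy = max((cosine(v, vecs[j]) for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    docs = [
        "The model summarizes documents with respect to a query.",
        "Query focused summarization selects sentences relevant to the query.",
        "Weather was sunny yesterday.",
    ]
    print(select_mmr("query focused summarization", docs, k=2))
```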
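The Hierarchies of Reward Machines entry represents a reward function as a finite-state machine over high-level events. The sketch below implements only a flat reward machine to make that formalism concrete; the paper's contribution of letting an RM call other RMs is not shown, and the class and field names are assumptions.

```python
# Minimal flat reward machine: a finite-state machine whose transitions are
# triggered by high-level events and emit rewards. Illustrative only; the
# hierarchical calling of other RMs proposed in the cited paper is not shown.
from typing import Dict, Set, Tuple


class RewardMachine:
    def __init__(self,
                 initial: str,
                 transitions: Dict[Tuple[str, str], Tuple[str, float]],
                 terminal: Set[str]):
        # transitions maps (state, event) -> (next_state, reward)
        self.state = initial
        self.transitions = transitions
        self.terminal = terminal

    def step(self, event: str) -> float:
        """Advance the machine on a high-level event; return the emitted reward."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

    def done(self) -> bool:
        return self.state in self.terminal


if __name__ == "__main__":
    # Task: pick up a key, then open the door (reward only on completing the sequence).
    rm = RewardMachine(
        initial="u0",
        transitions={("u0", "got_key"): ("u1", 0.0),
                     ("u1", "opened_door"): ("u2", 1.0)},
        terminal={"u2"},
    )
    print(rm.step("got_key"), rm.step("opened_door"), rm.done())  # 0.0 1.0 True
```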