Enhancing Pre-trained Models with Text Structure Knowledge for Question
Generation
- URL: http://arxiv.org/abs/2209.04179v1
- Date: Fri, 9 Sep 2022 08:33:47 GMT
- Title: Enhancing Pre-trained Models with Text Structure Knowledge for Question
Generation
- Authors: Zichen Wu, Xin Jia, Fanyi Qu, Yunfang Wu (Key Laboratory of
Computational Linguistics, Ministry of Education, China, School of Computer
Science, Peking University, China)
- Abstract summary: We model text structure as answer position and syntactic dependency, and propose answer localness modeling and syntactic mask attention to make pre-trained models aware of this structure.
Experiments on the SQuAD dataset show that the two proposed modules improve performance over the strong pre-trained model ProphetNet.
- Score: 2.526624977753083
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Today, pre-trained language models achieve great success on the
question generation (QG) task and significantly outperform traditional
sequence-to-sequence approaches. However, pre-trained models treat the input
passage as a flat sequence and are thus unaware of its text structure. For the
QG task, we model text structure as answer position and syntactic dependency,
and propose answer localness modeling and syntactic mask attention to address
these limitations. Specifically, we present localness modeling with a Gaussian
bias to enable the model to focus on the context surrounding the answer, and
propose a mask attention mechanism that makes the syntactic structure of the
input passage accessible during the question generation process. Experiments
on the SQuAD dataset show that the two proposed modules improve performance
over the strong pre-trained model ProphetNet, and that combining them achieves
results very competitive with the state-of-the-art pre-trained model.
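Below is a minimal sketch, in PyTorch, of the two mechanisms the abstract describes: a Gaussian localness bias centered on the answer span that is added to the attention logits, and a syntactic mask that restricts attention to dependency-related tokens. This is not the authors' implementation; the function names, tensor shapes, and the way the bias and mask are combined into a single attention call are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of answer localness modeling and
# syntactic mask attention, folded into one scaled dot-product attention step.
import torch
import torch.nn.functional as F

def gaussian_localness_bias(seq_len, answer_start, answer_end, sigma=3.0):
    """Bias favoring positions near the answer span; added to attention logits."""
    positions = torch.arange(seq_len, dtype=torch.float)
    center = (answer_start + answer_end) / 2.0
    # Log of an (unnormalized) Gaussian centered on the answer span.
    return -((positions - center) ** 2) / (2.0 * sigma ** 2)

def syntactic_mask(seq_len, dependency_edges):
    """Boolean mask allowing attention only along dependency arcs (plus self)."""
    mask = torch.eye(seq_len, dtype=torch.bool)
    for head, dep in dependency_edges:  # edges from an external dependency parser
        mask[head, dep] = True
        mask[dep, head] = True
    return mask

def structure_aware_attention(q, k, v, answer_span, dependency_edges):
    """Scaled dot-product attention with a localness bias and syntactic masking."""
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                                       # [seq_len, seq_len]
    scores = scores + gaussian_localness_bias(seq_len, *answer_span)  # bias over key positions
    mask = syntactic_mask(seq_len, dependency_edges)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: a 6-token passage, answer at tokens 2-3, a few dependency arcs.
q = k = v = torch.randn(6, 16)
out = structure_aware_attention(q, k, v, answer_span=(2, 3),
                                dependency_edges=[(0, 1), (1, 2), (2, 4), (4, 5)])
print(out.shape)  # torch.Size([6, 16])
```

In the paper these modules augment a pre-trained encoder-decoder (ProphetNet); the standalone function above only illustrates how such a bias and mask can enter an attention computation.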
Related papers
- Expedited Training of Visual Conditioned Language Generation via
Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - Zero-shot Visual Question Answering with Language Model Feedback [83.65140324876536]
We propose LAMOC, a language-model-guided captioning approach for knowledge-based visual question answering (VQA).
Our approach uses captions generated by a captioning model as context for an answer prediction model, which is a pre-trained language model (PLM).
arXiv Detail & Related papers (2023-05-26T15:04:20Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - Tracing Origins: Coref-aware Machine Reading Comprehension [43.352833140317486]
We imitated the human reading process of connecting anaphoric expressions and leveraged the coreference information to enhance the word embeddings from the pre-trained model.
We demonstrated that explicitly incorporating coreference information in the fine-tuning stage performed better than incorporating it when training a pre-trained language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z) - OCHADAI-KYODAI at SemEval-2021 Task 1: Enhancing Model Generalization
and Robustness for Lexical Complexity Prediction [8.066349353140819]
We propose an ensemble model for predicting the lexical complexity of words and multiword expressions.
The model receives as input a sentence with a target word or MWE and outputs its complexity score.
Our model achieved competitive results and ranked among the top-10 systems in both sub-tasks.
arXiv Detail & Related papers (2021-05-12T09:27:46Z) - Unlocking Compositional Generalization in Pre-trained Models Using
Intermediate Representations [27.244943870086175]
Sequence-to-sequence (seq2seq) models have been found to struggle at out-of-distribution compositional generalization.
We study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models.
arXiv Detail & Related papers (2021-04-15T14:15:14Z) - Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, and thus suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Enriched Pre-trained Transformers for Joint Slot Filling and Intent
Detection [22.883725214057286]
In this paper, we propose a novel architecture for learning intent-based language models.
We propose an intent pooling attention mechanism, and we reinforce the slot filling task by fusing intent distributions, word features, and token representations.
The experimental results on standard datasets show that our model outperforms both the current non-BERT state of the art as well as some stronger BERT-based baselines.
arXiv Detail & Related papers (2020-04-30T15:00:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.