Inflected Forms Are Redundant in Question Generation Models
- URL: http://arxiv.org/abs/2301.00397v1
- Date: Sun, 1 Jan 2023 13:08:11 GMT
- Title: Inflected Forms Are Redundant in Question Generation Models
- Authors: Xingwu Sun, Hongyin Tang, Chengzhong Xu
- Abstract summary: We propose an approach to enhance the performance of Question Generation using an encoder-decoder framework.
Firstly, we identify the inflected forms of words in the encoder input and replace them with their root words.
Secondly, we recast QG as a combination of the following actions in the encoder-decoder framework: generating a question word, copying a word from the source sequence, or generating a word transformation type.
- Score: 27.49894653349779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models with an encoder-decoder framework provide a feasible solution
to Question Generation (QG). However, after analyzing the model vocabulary we
find that current models (both RNN-based and pretraining-based) contain more
than 23% inflected forms. As a result, the encoder generates separate
embeddings for the inflected forms, leading to a waste of training data and
parameters. Even worse, during decoding these models are vulnerable to
irrelevant noise and suffer from high computational costs. In this paper, we propose
an approach to enhance the performance of QG by fusing word transformation.
Firstly, we identify the inflected forms of words in the encoder input and
replace them with their root words, letting the encoder pay more attention to
the recurring root words. Secondly, we recast QG as a combination of the
following actions in the encoder-decoder framework: generating a question
word, copying a word from the source sequence, or generating a word
transformation type. This extension greatly reduces the number of words the
decoder must predict, as well as the noise. We apply our approach to a typical
RNN-based model and UniLM to get the improved versions. We conduct
extensive experiments on SQuAD and MS MARCO datasets. The experimental results
show that the improved versions can significantly outperform the corresponding
baselines in terms of BLEU, ROUGE-L, and METEOR, as well as in time cost.
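A minimal, self-contained sketch of the two ideas above (an illustration only, not the authors' implementation; the lemma table, transformation tags, action names, and realization step are all assumptions made for this example):

```python
# Illustrative sketch: a toy lemma table stands in for a real lemmatizer, and the
# action set (GEN_QWORD / COPY / TRANSFORM) is an assumed rendering of
# "generate a question word, copy a source word, or generate a transformation type".
from typing import List, Tuple

# Toy lemma table: inflected form -> (root word, transformation tag).
LEMMAS = {
    "wrote": ("write", "VBD"),    # past tense
    "books": ("book", "NNS"),     # plural noun
    "written": ("write", "VBN"),  # past participle
}

# Inverse table used to re-inflect root words after decoding.
RE_INFLECT = {("write", "VBD"): "wrote", ("book", "NNS"): "books",
              ("write", "VBN"): "written"}

def normalize_encoder_input(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Replace inflected forms with root words; keep the transformation tag."""
    roots, tags = [], []
    for tok in tokens:
        root, tag = LEMMAS.get(tok.lower(), (tok, "NONE"))
        roots.append(root)
        tags.append(tag)
    return roots, tags

def realize(actions: List[Tuple[str, str]], source_roots: List[str]) -> str:
    """Turn a sequence of decoder actions into surface text.
    TRANSFORM re-inflects the most recently emitted root word."""
    out = []
    for action, payload in actions:
        if action == "GEN_QWORD":            # generate a question word
            out.append(payload)
        elif action == "COPY":               # copy a root word from the source
            out.append(source_roots[int(payload)])
        elif action == "TRANSFORM":          # re-inflect the previous root word
            out[-1] = RE_INFLECT.get((out[-1], payload), out[-1])
        else:
            raise ValueError(f"unknown action: {action}")
    return " ".join(out)

if __name__ == "__main__":
    source = "She wrote three books".split()
    roots, tags = normalize_encoder_input(source)
    print(roots)  # ['She', 'write', 'three', 'book']  -> fed to the encoder
    print(tags)   # ['NONE', 'VBD', 'NONE', 'NNS']
    # One possible question expressed as actions instead of inflected words:
    actions = [("GEN_QWORD", "who"), ("COPY", "1"), ("TRANSFORM", "VBD"),
               ("COPY", "2"), ("COPY", "3"), ("TRANSFORM", "NNS")]
    print(realize(actions, roots))  # who wrote three books
```

In a full model the decoder would score these actions (plus copy positions and transformation tags) instead of a vocabulary containing every inflected surface form, which is where the claimed reduction in predicted words comes from.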
Related papers
- Multi-Granularity Guided Fusion-in-Decoder [7.87348193562399]
We propose the Multi-Granularity guided Fusion-in-Decoder (MGFiD) to discern evidence across multiple levels of granularity.
Based on multi-task learning, MGFiD harmonizes passage re-ranking with sentence classification.
It improves decoding efficiency by reusing the results of passage re-ranking for passage pruning.
arXiv Detail & Related papers (2024-04-03T08:56:00Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims to generate a target sequence from a given source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular paradigm for large-scale pretraining of language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on the task of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator [61.70748716353692]
Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response for a question with a given scene, video, audio, and the history of previous turns in the dialog.
Existing systems for this task employ transformer- or recurrent neural network-based architectures within the encoder-decoder framework.
We propose a Multimodal Semantic Transformer Network. It employs a transformer-based architecture with an attention-based word embedding layer that generates words by querying word embeddings.
arXiv Detail & Related papers (2020-04-01T07:10:08Z)
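As a generic illustration of what a retrieval-style word generator can look like (an assumption inferred from the summary above, not the DSTC8-AVSD authors' exact architecture), the next word can be chosen by scoring the decoder state against the word-embedding table instead of using a separate output projection:

```python
# Generic sketch of a retrieval-style output layer: the decoder state "queries"
# the word-embedding table, and a softmax over the resulting scores selects the
# next word. This is an assumed simplification, not the paper's exact network.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<pad>", "the", "dog", "barks", "loudly"]
d_model = 8

embeddings = rng.normal(size=(len(vocab), d_model))  # shared word-embedding table
decoder_state = rng.normal(size=(d_model,))          # hidden state at one decoding step

scores = embeddings @ decoder_state                  # dot-product query over all embeddings
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                 # softmax over the vocabulary

print(vocab[int(np.argmax(probs))], probs.round(3))  # highest-scoring word is emitted
```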