Sequence-to-Sequence Learning for Indonesian Automatic Question
Generator
- URL: http://arxiv.org/abs/2009.13889v1
- Date: Tue, 29 Sep 2020 09:25:54 GMT
- Title: Sequence-to-Sequence Learning for Indonesian Automatic Question
Generator
- Authors: Ferdiant Joshua Muis (1) and Ayu Purwarianti (1 and 2) ((1) Institut
Teknologi Bandung, (2) U-CoE AI-VLB)
- Abstract summary: We construct an Indonesian automatic question generator, adapting the architecture from some previous works.
The system achieved BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 38.35, 20.96, 10.68, 5.78, and 43.4 on SQuAD, and 39.9, 20.78, 10.26, 6.31, and 44.13 on TyDiQA.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic question generation is defined as the task of automating the
creation of questions from various kinds of textual data. Research in automatic
question generation (AQG) has been conducted for more than 10 years, mainly
focused on factoid questions. In these studies, the state of the art is
attained using sequence-to-sequence approaches. However, AQG for Indonesian
has not yet been researched intensively. In this work we construct an
Indonesian automatic question generator, adapting the architecture from some
previous works. In summary, we used a sequence-to-sequence approach with BiGRU,
BiLSTM, and Transformer models, with additional linguistic features, a copy
mechanism, and a coverage mechanism. Since there is no large, popular public
Indonesian dataset for question generation, we translated the SQuAD v2.0 factoid
question answering dataset, with the Indonesian TyDiQA dev set added for testing.
The system achieved BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 38.35,
20.96, 10.68, 5.78, and 43.4 on SQuAD, and 39.9, 20.78, 10.26, 6.31, and 44.13
on TyDiQA, respectively. The system performed well when the expected answers are
named entities that are syntactically close to the context explaining them.
Additionally, from a native Indonesian speaker's perspective, the best questions
generated by our best models in their best cases are acceptable and reasonably useful.
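As a rough illustration of the metrics reported above, here is a minimal, simplified Python sketch of single-reference cumulative BLEU-N and ROUGE-L F1. The function names are my own, and the paper's actual evaluation presumably used standard toolkits with full smoothing and multi-reference handling; this is only meant to show what the numbers measure.

```python
# Simplified single-reference BLEU-N and ROUGE-L, for illustration only.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(candidate, reference, max_n=4):
    """Cumulative BLEU with uniform weights up to max_n (single reference)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_avg)

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 via longest common subsequence."""
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if candidate[i] == reference[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    p, r = lcs / m, lcs / n
    return 2 * p * r / (p + r)

# Toy Indonesian example (not from the paper's data):
ref = "siapa presiden pertama indonesia ?".split()
hyp = "siapa presiden pertama republik indonesia ?".split()
print(round(bleu_n(hyp, ref, 1), 2))   # BLEU-1: 0.83
print(round(rouge_l_f1(hyp, ref), 2))  # ROUGE-L: 0.91
```

Note that BLEU rewards exact n-gram overlap while ROUGE-L credits in-order (not necessarily contiguous) matches, which is why both are typically reported together for generation tasks.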
Related papers
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics, among strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Using Implicit Feedback to Improve Question Generation [4.4250613854221905]
Question Generation (QG) is a task of Natural Language Processing (NLP) that aims at automatically generating questions from text.
In this work, we present a system, GEN, that learns from such (implicit) feedback.
Results show that GEN is able to improve by learning from both levels of implicit feedback when compared to the version with no learning.
arXiv Detail & Related papers (2023-04-26T16:37:47Z) - PAXQA: Generating Cross-lingual Question Answering Examples at Training
Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z) - Improving Complex Knowledge Base Question Answering via
Question-to-Action and Question-to-Question Alignment [6.646646618666681]
We introduce an alignment-enhanced complex question answering framework, called ALCQA.
We train a question rewriting model to align the question and each action, and utilize a pretrained language model to implicitly align the question and KG artifacts.
We retrieve top-k similar question-answer pairs at the inference stage through question-to-question alignment and propose a novel reward-guided action sequence selection strategy.
arXiv Detail & Related papers (2022-12-26T08:12:41Z) - Generative Language Models for Paragraph-Level Question Generation [79.31199020420827]
Powerful generative models have led to recent progress in question generation (QG).
It is difficult to measure advances in QG research since there are no standardized resources that allow a uniform comparison among approaches.
We introduce QG-Bench, a benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting.
arXiv Detail & Related papers (2022-10-08T10:24:39Z) - Natural Answer Generation: From Factoid Answer to Full-length Answer
using Grammar Correction [39.40116590327074]
This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer as the input.
A transformer-based grammar error correction model, GECToR (2020), is used as a post-processing step for better fluency.
We compare our system with (i) Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions.
arXiv Detail & Related papers (2021-12-07T17:39:21Z) - Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering
Approach for Open-Domain Question Answering [76.99585451345702]
Open-Retrieval Generative Question Answering (GenQA) is proven to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach for the multilingual environment.
arXiv Detail & Related papers (2021-10-14T04:36:29Z) - Tell Me How to Ask Again: Question Data Augmentation with Controllable
Rewriting in Continuous Space [94.8320535537798]
Controllable Rewriting based Question Data Augmentation (CRQDA) for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.
We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples.
arXiv Detail & Related papers (2020-10-04T03:13:46Z) - ClarQ: A large-scale and diverse dataset for Clarification Question
Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z) - Simplifying Paragraph-level Question Generation via Transformer Language
Models [0.0]
Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text.
A single Transformer-based unidirectional language model leveraging transfer learning can be used to produce high quality questions.
Our QG model, finetuned from GPT-2 Small, outperforms several paragraph-level QG baselines on the SQuAD dataset by 0.95 METEOR points.
arXiv Detail & Related papers (2020-05-03T14:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.