Exploring Question-Specific Rewards for Generating Deep Questions
- URL: http://arxiv.org/abs/2011.01102v1
- Date: Mon, 2 Nov 2020 16:37:30 GMT
- Title: Exploring Question-Specific Rewards for Generating Deep Questions
- Authors: Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan, Yansong Feng
- Abstract summary: We design three different rewards aimed at improving the fluency, relevance, and answerability of generated questions.
We find that optimizing question-specific rewards generally leads to better performance in automatic evaluation metrics.
- Score: 42.243227323241584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent question generation (QG) approaches often utilize the
sequence-to-sequence framework (Seq2Seq) to optimize the log-likelihood of
ground-truth questions using teacher forcing. However, this training objective
is inconsistent with actual question quality, which is often reflected by
certain global properties such as whether the question can be answered by the
document. As such, we directly optimize for QG-specific objectives via
reinforcement learning to improve question quality. We design three different
rewards that target to improve the fluency, relevance, and answerability of
generated questions. We conduct both automatic and human evaluations in
addition to a thorough analysis to explore the effect of each QG-specific
reward. We find that optimizing question-specific rewards generally leads to
better performance in automatic evaluation metrics. However, only the rewards
that correlate well with human judgement (e.g., relevance) lead to real
improvement in question quality. Optimizing for the others, especially
answerability, introduces incorrect bias to the model, resulting in poor
question quality. Our code is publicly available at
https://github.com/YuxiXie/RL-for-Question-Generation.
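As a rough, illustrative sketch of the reward-driven fine-tuning described in the abstract (not the authors' implementation; their code is at the repository above), the snippet below shows a self-critical policy-gradient loss together with one possible stand-in for a fluency reward based on language-model likelihood. The function names, the Hugging Face-style `lm(...).logits` interface, and the reward definition are assumptions for illustration only; the paper defines its own fluency, relevance, and answerability rewards.

```python
import torch


def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """REINFORCE with a self-critical (greedy-decoding) baseline.

    sample_log_probs: 1-D tensor of per-token log-probabilities of a question
        sampled from the current QG policy (must carry gradients).
    sample_reward / greedy_reward: scalar QG-specific rewards (e.g. fluency,
        relevance, or answerability scores) for the sampled question and for
        the greedy-decoded baseline question.
    """
    advantage = sample_reward - greedy_reward
    # Raise the likelihood of sampled questions that beat the greedy baseline,
    # lower it otherwise; the baseline reduces gradient variance.
    return -advantage * sample_log_probs.sum()


def fluency_reward(question_ids, lm, pad_id):
    """Illustrative fluency reward: mean token log-likelihood of the question
    under a frozen language model (higher = more fluent). This is only a
    stand-in, not the paper's exact reward definition.
    """
    with torch.no_grad():
        logits = lm(question_ids).logits          # assumed HF-style output, shape (1, T, V)
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
        targets = question_ids[:, 1:]             # predict token t from tokens < t
        token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        mask = (targets != pad_id).float()
        return ((token_lp * mask).sum() / mask.sum()).item()
```

In practice, one would sample a question from the Seq2Seq QG model, keep its per-token log-probabilities, greedy-decode a baseline question, score both with the chosen reward, and backpropagate the resulting loss alongside the usual maximum-likelihood objective. The paper's own finding applies here: rewards that correlate poorly with human judgement, answerability in particular, can bias the generator and hurt question quality.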
Related papers
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z)
- Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter [17.736962215696366]
We introduce single-round instance-level prompt optimization, referred to as question rewriter.
By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers.
arXiv Detail & Related papers (2024-08-20T06:24:47Z) - SQUARE: Automatic Question Answering Evaluation using Multiple Positive
and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation)
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering [11.537283115693432]
We propose a Knowledge Enhancement and Plausibility Ranking approach grounded on the Generate-Then-Rank pipeline architecture.
Specifically, we expand questions in terms of Wiktionary commonsense knowledge of keywords, and reformulate them with normalized patterns.
We develop an ELECTRA-based answer ranking model, where logistic regression is conducted during training with the aim of approximating different levels of plausibility.
arXiv Detail & Related papers (2023-05-15T04:58:37Z)
- Synthetic Question Value Estimation for Domain Adaptation of Question Answering [31.003053719921628]
We introduce a novel idea of training a question value estimator (QVE) that directly estimates the usefulness of synthetic questions for improving the target-domain QA performance.
By using such questions and only around 15% of the human annotations on the target domain, we can achieve comparable performance to the fully-supervised baselines.
arXiv Detail & Related papers (2022-03-16T20:22:31Z)
- Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved.
Our main contribution is an approach capable of identifying wrong answers provided by a QA system.
In particular, our approach has shown its potential by removing, in many cases, the majority of incorrect answers.
arXiv Detail & Related papers (2021-12-10T11:09:44Z)
- MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection [59.95429407899612]
We propose a novel reinforcement learning based multi-step ranking model, named MS-Ranker.
We explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism.
Our model significantly outperforms existing methods that do not rely on external resources.
arXiv Detail & Related papers (2020-10-10T10:36:58Z)
- Towards Automatic Generation of Questions from Long Answers [11.198653485869935]
We propose a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers.
We empirically demonstrate that the performance of existing AQG methods significantly degrades as the length of the answer increases.
Transformer-based methods outperform other existing AQG methods on long answers in terms of automatic as well as human evaluation.
arXiv Detail & Related papers (2020-04-10T16:45:08Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We take up Multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)