On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question
Generation
- URL: http://arxiv.org/abs/2203.04464v2
- Date: Fri, 11 Mar 2022 15:55:59 GMT
- Title: On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question
Generation
- Authors: Jishnu Ray Chowdhury, Debanjan Mahata, Cornelia Caragea
- Abstract summary: We study the task of predicting a set of salient questions from a given paragraph without any prior knowledge of the precise answer.
First, we propose a new method to evaluate a set of predicted questions against the set of references by using the Hungarian algorithm to assign predicted questions to references before scoring the assigned pairs.
Second, we compare different strategies to utilize a pre-trained seq2seq model to generate and select a set of questions related to a given paragraph.
- Score: 57.630606799713526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the task of predicting a set of salient questions from a given
paragraph without any prior knowledge of the precise answer. We make two main
contributions. First, we propose a new method to evaluate a set of predicted
questions against the set of references by using the Hungarian algorithm to
assign predicted questions to references before scoring the assigned pairs. We
show that our proposed evaluation strategy has better theoretical and practical
properties compared to prior methods because it can properly account for the
coverage of references. Second, we compare different strategies to utilize a
pre-trained seq2seq model to generate and select a set of questions related to
a given paragraph. The code is available.
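To make the matching step concrete, here is a minimal sketch of Hungarian-matched set evaluation, assuming a simple token-level F1 as the pairwise similarity (the paper's actual scoring metric may differ). It uses `scipy.optimize.linear_sum_assignment`, SciPy's implementation of the Hungarian algorithm:

```python
# Sketch: evaluate a set of predicted questions against a set of references
# by first assigning predictions to references, then scoring assigned pairs.
import numpy as np
from scipy.optimize import linear_sum_assignment

def token_f1(pred: str, ref: str) -> float:
    """Bag-of-tokens F1 between two questions (an illustrative stand-in
    for the paper's pairwise scoring metric)."""
    p, r = pred.lower().split(), ref.lower().split()
    common = len(set(p) & set(r))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)

def hungarian_eval(preds: list[str], refs: list[str]) -> float:
    """Assign predictions to references one-to-one with the Hungarian
    algorithm, then average the matched-pair scores over the number of
    references so that uncovered references lower the score."""
    sim = np.array([[token_f1(p, r) for r in refs] for p in preds])
    rows, cols = linear_sum_assignment(-sim)  # negate to maximise similarity
    return sim[rows, cols].sum() / len(refs)

preds = ["What task do the authors study?",
         "How are the predicted questions scored?"]
refs = ["What task is studied in this paper?",
        "How is a set of predicted questions evaluated?",
        "Which pre-trained model generates the questions?"]
print(f"set-level score: {hungarian_eval(preds, refs):.3f}")
```

Normalising by the number of references rather than predictions is one natural way to realise the coverage property claimed in the abstract: references left unmatched pull the score down.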
Related papers
- QUDSELECT: Selective Decoding for Questions Under Discussion Parsing [90.92351108691014]
Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences.
We introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures while taking the QUD criteria into account.
Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation.
arXiv Detail & Related papers (2024-08-02T06:46:08Z)
- Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation? [8.51696622847778]
A distinction is often drawn between a model's ability to predict a label for an evaluation sample by direct memorisation of highly similar training samples and its ability to predict the label through generalisation.
We propose a method of identifying evaluation samples for which it is very unlikely our model would have memorised the answers.
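The blurb does not spell out the identification procedure, but a common way to approximate "unlikely to be memorised" is to filter out evaluation samples with high n-gram overlap against the training data. A hedged sketch of that generic heuristic (the threshold and overlap measure are our assumptions, not the paper's method):

```python
# Generic proxy: keep evaluation samples whose text shares few word n-grams
# with any training sample. Not the paper's exact identification method.
def ngrams(text: str, n: int = 3) -> set:
    """All word n-grams in a string."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap(a: str, b: str, n: int = 3) -> float:
    """Fraction of a's n-grams that also occur in b."""
    ga = ngrams(a, n)
    return len(ga & ngrams(b, n)) / len(ga) if ga else 0.0

def unlikely_memorised(eval_samples: list[str],
                       train_samples: list[str],
                       threshold: float = 0.2) -> list[str]:
    """Keep only samples with low overlap against every training sample."""
    return [s for s in eval_samples
            if all(overlap(s, t) < threshold for t in train_samples)]
```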
arXiv Detail & Related papers (2023-11-21T04:06:08Z)
- Evaluation of Question Generation Needs More References [7.876222232341623]
We propose to paraphrase the reference question for a more robust QG evaluation.
Using large language models such as GPT-3, we created semantically and syntactically diverse questions.
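Once paraphrases are added, a natural way to score a generated question against the expanded reference set is to take the best match over all references. A minimal sketch, assuming a Jaccard token overlap as the similarity (any metric such as BLEU or BERTScore could stand in):

```python
# Sketch: score against the original reference plus LLM-produced paraphrases
# by taking the best match over the expanded reference set.
def jaccard(a: str, b: str) -> float:
    """Illustrative token-level similarity between two questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def multi_reference_score(generated: str, references: list[str]) -> float:
    """Best match over all references, original and paraphrased."""
    return max(jaccard(generated, ref) for ref in references)

refs = ["What did the authors propose?",
        "What approach do the authors propose?",   # paraphrase
        "What is the authors' proposed method?"]   # paraphrase
print(f"{multi_reference_score('What method do the authors propose?', refs):.3f}")
```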
arXiv Detail & Related papers (2023-05-26T04:40:56Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Knowledge-enhanced Iterative Instruction Generation and Reasoning for Knowledge Base Question Answering [43.72266327778216]
Multi-hop Knowledge Base Question Answering aims to find the answer entity in a knowledge base that is several hops away from the topic entity mentioned in the question.
Existing retrieval-based approaches first generate instructions from the question and then use them to guide multi-hop reasoning on the knowledge graph.
We conduct experiments on two multi-hop KBQA benchmarks and outperform existing approaches, establishing a new state of the art.
arXiv Detail & Related papers (2022-09-07T09:02:45Z)
- Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
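The per-example structure described here is simple enough to sketch as a data class. The field names below are our own illustrative schema, not the official dataset format; strategy questions are yes/no, so the answer is modelled as a boolean:

```python
from dataclasses import dataclass

@dataclass
class StrategyQAExample:
    """One StrategyQA item as described above: an implicit-reasoning
    question, its decomposition, and supporting evidence paragraphs.
    Illustrative schema, not the official dataset format."""
    question: str                    # e.g. "Did Aristotle use a laptop?"
    answer: bool                     # strategy questions are yes/no
    decomposition: list[str]         # single-hop sub-questions, in order
    evidence_paragraphs: list[str]   # paragraphs supporting the steps
```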
arXiv Detail & Related papers (2021-01-06T19:14:23Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Visual Question Answering with Prior Class Semantics [50.845003775809836]
We show how to exploit additional information pertaining to the semantics of candidate answers.
We extend the answer prediction process with a regression objective in a semantic space.
Our method brings improvements in consistency and accuracy over a range of question types.
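One way to picture the extended prediction process is a joint loss: the usual answer-classification term plus a regression term that pulls a predicted vector toward the gold answer's embedding in a semantic space. A hedged PyTorch sketch (the MSE choice and the alpha weighting are illustrative assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def joint_vqa_loss(logits: torch.Tensor,             # (batch, num_answers)
                   pred_embedding: torch.Tensor,     # (batch, dim)
                   answer_idx: torch.Tensor,         # (batch,) gold indices
                   answer_embeddings: torch.Tensor,  # (num_answers, dim)
                   alpha: float = 0.5) -> torch.Tensor:
    """Classification loss plus a semantic-space regression term.
    The MSE regression and alpha weighting are illustrative choices."""
    cls_loss = F.cross_entropy(logits, answer_idx)
    target = answer_embeddings[answer_idx]  # gold answers' semantic vectors
    reg_loss = F.mse_loss(pred_embedding, target)
    return cls_loss + alpha * reg_loss
```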
arXiv Detail & Related papers (2020-05-04T02:46:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.