DIFFQG: Generating Questions to Summarize Factual Changes
- URL: http://arxiv.org/abs/2303.00242v1
- Date: Wed, 1 Mar 2023 05:45:48 GMT
- Title: DIFFQG: Generating Questions to Summarize Factual Changes
- Authors: Jeremy R. Cole, Palak Jain, Julian Martin Eisenschlos, Michael J.Q. Zhang, Eunsol Choi, Bhuwan Dhingra
- Abstract summary: We propose representing factual changes between paired documents as question-answer pairs.
DIFFQG consists of 759 QA pairs and 1153 examples of paired passages with no factual change.
- Score: 41.142542919449355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the difference between two versions of the same article is useful
to update knowledge bases and to understand how articles evolve. Paired texts
occur naturally in diverse situations: reporters write similar news stories and
maintainers of authoritative websites must keep their information up to date.
We propose representing factual changes between paired documents as
question-answer pairs, where the answer to the same question differs between
two versions. We find that question-answer pairs can flexibly and concisely
capture the updated contents. Provided with paired documents, annotators
identify questions that are answered by one passage but answered differently or
cannot be answered by the other. We release DIFFQG, which consists of 759 QA
pairs and 1153 examples of paired passages with no factual change. These
questions are intended to be both unambiguous and information-seeking and
involve complex edits, pushing beyond the capabilities of current question
generation and factual change detection systems. Our dataset summarizes the
changes between two versions of the document as questions and answers, studying
automatic update summarization in a novel way.
Related papers
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics compared with strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Discord Questions: A Computational Approach To Diversity Analysis in News Coverage [84.55145223950427]
We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity.
The framework is based on the generation of Discord Questions: questions with a diverse answer pool.
arXiv Detail & Related papers (2022-11-09T16:37:55Z)
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop the first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
arXiv Detail & Related papers (2022-10-12T03:53:12Z)
- Investigating Information Inconsistency in Multilingual Open-Domain Question Answering [18.23417521199809]
We analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias.
We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
arXiv Detail & Related papers (2022-05-25T02:58:54Z)
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections [35.005593397252746]
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
arXiv Detail & Related papers (2021-11-01T04:50:26Z)
- Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval [46.3246135936476]
We analyze why answering information-seeking queries is more challenging and where their prevalent sources of unanswerability lie.
Our controlled experiments suggest two areas of headroom: paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages on what makes them challenging to answer.
arXiv Detail & Related papers (2020-10-22T17:48:17Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Match$^2$: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification has become a core task in CQA; it aims to find a similar question in the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language: there can be different ways to ask the same question, or different questions that share similar expressions.
Traditional methods typically take a one-sided usage, which leverages the answer as some expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
arXiv Detail & Related papers (2020-06-13T03:24:53Z)