AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer
Summarization
- URL: http://arxiv.org/abs/2111.06474v1
- Date: Thu, 11 Nov 2021 21:48:02 GMT
- Title: AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer
Summarization
- Authors: Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab
- Abstract summary: Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
- Score: 73.91543616777064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Community Question Answering (CQA) fora such as Stack Overflow and Yahoo!
Answers contain a rich resource of answers to a wide range of community-based
questions. Each question thread can receive a large number of answers with
different perspectives. One goal of answer summarization is to produce a
summary that reflects the range of answer perspectives. A major obstacle for
abstractive answer summarization is the absence of a dataset to provide
supervision for producing such summaries. Recent works propose heuristics to
create such data, but these are often noisy and do not cover all perspectives
present in the answers. This work introduces a novel dataset of 4,631 CQA
threads for answer summarization, curated by professional linguists. Our
pipeline gathers annotations for all subtasks involved in answer summarization,
including the selection of answer sentences relevant to the question, grouping
these sentences based on perspectives, summarizing each perspective, and
producing an overall summary. We analyze and benchmark state-of-the-art models
on these subtasks and introduce a novel unsupervised approach for
multi-perspective data augmentation that further boosts overall summarization
performance according to automatic evaluation. Finally, we propose
reinforcement learning rewards to improve factual consistency and answer
coverage and analyze areas for improvement.
Related papers
- Aspect-oriented Consumer Health Answer Summarization [2.298110639419913]
Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs.
There can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern.
Our research focuses on aspect-based summarization of health answers to address this limitation.
(2024-05-10T07:52:43Z)
- Answering Subjective Induction Questions on Products by Summarizing Multi-sources Multi-viewpoints Knowledge [0.04791377777154766]
This paper proposes a new task in the field of Answering Subjective Induction Question on Products.
The answer to this kind of question is non-unique, but can be interpreted from many perspectives.
A satisfactory answer should summarize these subjective opinions from multiple sources and provide objective knowledge.
(2023-09-12T03:27:08Z)
- Concise Answers to Complex Questions: Summarization of Long-form Answers [27.190319030219285]
We conduct a user study on summarized answers generated from state-of-the-art models and our newly proposed extract-and-decontextualize approach.
We find a large proportion of long-form answers can be adequately summarized by at least one system, while complex and implicit answers are challenging to compress.
We observe that decontextualization improves the quality of the extractive summary, exemplifying its potential in the summarization task.
(2023-05-30T17:59:33Z)
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization [55.60306377044225]
State-of-the-art summarization systems can generate highly fluent summaries.
These summaries, however, may contain factual inconsistencies and/or information not present in the source.
We introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.
(2023-01-28T23:08:25Z)
- Answer Consolidation: Formulation and Benchmarking [35.38034364777484]
We formulate the problem of answer consolidation, where answers are partitioned into multiple groups.
A comprehensive and non-redundant set of answers can be constructed by picking one answer from each group.
Despite a promising performance achieved by the best-performing supervised models, we still believe this task has room for further improvements.
(2022-04-29T18:57:23Z)
- Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high-quality, human-labeled dataset to support research into the task.
(2022-04-15T17:16:41Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
(2021-04-18T05:40:39Z)
- Multi-Perspective Abstractive Answer Summarization [76.10437565615138]
Community Question Answering forums contain a rich resource of answers to a wide range of questions.
The goal of multi-perspective answer summarization is to produce a summary that includes all perspectives of the answer.
This work introduces a novel dataset creation method to automatically create multi-perspective, bullet-point abstractive summaries.
(2021-04-17T13:15:29Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
(2020-12-29T14:39:35Z)