Summarizing Community-based Question-Answer Pairs
- URL: http://arxiv.org/abs/2211.09892v1
- Date: Thu, 17 Nov 2022 21:09:41 GMT
- Title: Summarizing Community-based Question-Answer Pairs
- Authors: Ting-Yao Hsu, Yoshi Suhara, Xiaolan Wang
- Abstract summary: We propose the novel CQA summarization task that aims to create a concise summary from CQA pairs.
Our data and code are publicly available.
- Score: 5.680726650578754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Community-based Question Answering (CQA), which allows users to acquire their
desired information, has increasingly become an essential component of online
services in various domains such as E-commerce, travel, and dining. However, an
overwhelming number of CQA pairs makes it difficult for users without
particular intent to find useful information spread over CQA pairs. To help
users quickly digest the key information, we propose the novel CQA
summarization task that aims to create a concise summary from CQA pairs. To
this end, we first design a multi-stage data annotation process and create a
benchmark dataset, CoQASUM, based on the Amazon QA corpus. We then compare a
collection of extractive and abstractive summarization methods and establish a
strong baseline approach DedupLED for the CQA summarization task. Our
experiment further confirms two key challenges, sentence-type transfer and
deduplication removal, towards the CQA summarization task. Our data and code
are publicly available.
Related papers
- MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering [3.1651118728570635]
In today's fast-paced industry, professionals face the challenge of summarizing a large number of documents and extracting vital information from them on a daily basis.
To address this challenge, the approach of Table Question Answering (QA) has been developed to extract the relevant information.
Recent advancements in Large Language Models (LLMs) have opened up new possibilities for extracting information from tabular data using prompts.
arXiv Detail & Related papers (2024-03-28T03:14:18Z)
- Long-Tailed Question Answering in an Open World [46.67715607552547]
We define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-05-11T04:28:58Z)
- PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance [96.06505049126345]
We present a new dataset, named PACIFIC. Compared with existing CQA datasets, PACIFIC exhibits three key features: (i) proactivity, (ii) numerical reasoning, and (iii) hybrid context of tables and text.
A new task is defined accordingly to study Proactive Conversational Question Answering (PCQA), which combines clarification question generation and CQA.
UniPCQA performs multi-task learning over all sub-tasks in PCQA and incorporates a simple ensemble strategy to alleviate the error propagation issue in the multi-task learning by cross-validating top-$k$ sampled Seq2Seq
arXiv Detail & Related papers (2022-10-17T08:06:56Z)
- Community Question Answering Entity Linking via Leveraging Auxiliary Data [7.834536363163232]
We propose a new task of CQA entity linking (CQAEL) as linking the textual entity mentions detected from CQA texts with their corresponding entities in a knowledge base.
Traditional entity linking methods mainly focus on linking entities in news documents.
We propose a novel transformer-based framework to effectively harness the knowledge delivered by different kinds of auxiliary data to promote the linking performance.
arXiv Detail & Related papers (2022-05-24T09:25:18Z)
- HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling [50.39787601462344]
Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online user community for special interests.
Most of the CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question.
We propose a question-aware heterogeneous graph transformer to incorporate the multiple information sources (MIS) in the user community to automatically generate the answer.
arXiv Detail & Related papers (2021-12-27T10:16:43Z)
- PerCQA: Persian Community Question Answering Dataset [2.503043323723241]
Community Question Answering (CQA) forums provide answers for many real-life questions.
We present PerCQA, the first Persian dataset for CQA.
This dataset contains the questions and answers crawled from the most well-known Persian forum.
arXiv Detail & Related papers (2021-12-25T14:06:41Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)