S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational
Question Answering
- URL: http://arxiv.org/abs/2312.16511v1
- Date: Wed, 27 Dec 2023 10:41:18 GMT
- Title: S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational
Question Answering
- Authors: Baokui Li, Sen Zhang, Wangshu Zhang, Yicheng Chen, Changlin Yang, Sen
Hu, Teng Xu, Siye Liu, Jiwei Li
- Abstract summary: We propose a novel method to convert single-turn datasets to multi-turn datasets.
S2M ranks 1st place on the QuAC leaderboard at the time of submission.
- Score: 16.930522435912717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supplying data augmentation to conversational question answering (CQA) can
effectively improve model performance. However, there is less improvement from
single-turn datasets in CQA due to the distribution gap between single-turn and
multi-turn datasets. On the other hand, while numerous single-turn datasets are
available, we have not utilized them effectively. To solve this problem, we
propose a novel method to convert single-turn datasets to multi-turn datasets.
The proposed method consists of three parts, namely, a QA pair Generator, a QA
pair Reassembler, and a question Rewriter. Given a sample consisting of context
and single-turn QA pairs, the Generator obtains candidate QA pairs and a
knowledge graph based on the context. The Reassembler utilizes the knowledge
graph to get sequential QA pairs, and the Rewriter rewrites questions from a
conversational perspective to obtain a multi-turn dataset S2M. Our experiments
show that our method can synthesize effective training resources for CQA.
Notably, S2M ranks 1st place on the QuAC leaderboard at the time of submission
(Aug 24th, 2022).
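To make the three-stage pipeline above concrete, here is a minimal sketch of a Generator -> Reassembler -> Rewriter flow. The function bodies and the knowledge-graph structure are toy placeholders introduced for illustration, not the paper's trained components.

```python
# Minimal sketch of an S2M-style single-turn -> multi-turn conversion pipeline.
# The three stages mirror the abstract (Generator, Reassembler, Rewriter), but
# the bodies below are toy stand-ins, not the paper's actual models.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QAPair:
    question: str
    answer: str


def generate_candidates(context: str, seed_pairs: List[QAPair]) -> Tuple[List[QAPair], Dict]:
    """Generator: produce candidate QA pairs and a (toy) knowledge graph from the context."""
    candidates = list(seed_pairs)  # a real system would add model-generated pairs here
    knowledge_graph = {"entities": context.split()[:5]}  # placeholder "graph"
    return candidates, knowledge_graph


def reassemble(candidates: List[QAPair], knowledge_graph: Dict) -> List[QAPair]:
    """Reassembler: order QA pairs into a coherent sequence using the knowledge graph."""
    return sorted(candidates, key=lambda p: p.question)  # placeholder ordering heuristic


def rewrite_conversational(ordered: List[QAPair]) -> List[QAPair]:
    """Rewriter: make later questions context-dependent (e.g., pronominalization)."""
    rewritten = []
    for i, pair in enumerate(ordered):
        q = pair.question if i == 0 else pair.question.replace("the author", "he")  # toy rewrite
        rewritten.append(QAPair(q, pair.answer))
    return rewritten


def single_turn_to_multi_turn(context: str, seed_pairs: List[QAPair]) -> List[QAPair]:
    candidates, kg = generate_candidates(context, seed_pairs)
    return rewrite_conversational(reassemble(candidates, kg))


if __name__ == "__main__":
    ctx = "The author wrote the novel in 1990. The author later won a major prize."
    seeds = [QAPair("When did the author write the novel?", "in 1990"),
             QAPair("What did the author win?", "a major prize")]
    for turn in single_turn_to_multi_turn(ctx, seeds):
        print(turn)
```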
Related papers
- A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
arXiv Detail & Related papers (2023-10-30T10:14:52Z)
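A toy sketch of the antonym and entity swaps described in the entry above. The lookup tables stand in for what would normally be an NER model and a lexical resource such as WordNet; they are assumptions for illustration only.

```python
# Toy sketch of antonym/entity swaps for turning answerable questions into
# (likely) unanswerable ones. The lookup tables are illustrative placeholders;
# a real pipeline would rely on an NER model and a lexical resource for antonyms.
import random

ANTONYMS = {"largest": "smallest", "first": "last", "highest": "lowest"}
ENTITY_POOL = {"COUNTRY": ["France", "Brazil", "Japan"], "PERSON": ["Ada Lovelace", "Alan Turing"]}
ENTITY_TYPES = {"France": "COUNTRY", "Brazil": "COUNTRY", "Ada Lovelace": "PERSON"}


def antonym_swap(question: str) -> str:
    """Replace one known word with its antonym, if any is present."""
    for word, antonym in ANTONYMS.items():
        if word in question:
            return question.replace(word, antonym, 1)
    return question


def entity_swap(question: str) -> str:
    """Replace a recognized entity with a different entity of the same type."""
    for entity, etype in ENTITY_TYPES.items():
        if entity in question:
            alternatives = [e for e in ENTITY_POOL[etype] if e != entity]
            return question.replace(entity, random.choice(alternatives), 1)
    return question


if __name__ == "__main__":
    q = "What is the largest city in France?"
    print(antonym_swap(q))  # likely unanswerable with respect to the original context
    print(entity_swap(q))
```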
- QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation, named QASnowball.
QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples.
We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the results show that the data generated by QASnowball can benefit QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
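The iterative bootstrapping idea above can be pictured as the loop below. The generate_qa and quality_score functions are placeholders for QASnowball's learned generator and filter.

```python
# Schematic bootstrapping loop in the spirit of QASnowball: start from a small
# seed set, generate candidate QA pairs, keep only high-scoring ones, and fold
# them back into the pool for the next round.
from typing import List, Tuple

QAPair = Tuple[str, str, str]  # (context, question, answer)


def generate_qa(context: str, pool: List[QAPair]) -> List[QAPair]:
    """Placeholder generator; a real system would be fine-tuned on `pool` each round."""
    return [(context, f"What does the passage '{context[:20]}...' say?", context.split(".")[0])]


def quality_score(pair: QAPair) -> float:
    """Placeholder filter; a real system would score answerability and consistency."""
    _, question, answer = pair
    return 1.0 if answer and question.endswith("?") else 0.0


def snowball(seed: List[QAPair], unlabeled_contexts: List[str],
             rounds: int = 3, threshold: float = 0.5) -> List[QAPair]:
    pool = list(seed)
    for _ in range(rounds):
        candidates = [p for ctx in unlabeled_contexts for p in generate_qa(ctx, pool)]
        accepted = [p for p in candidates if quality_score(p) >= threshold]
        pool.extend(p for p in accepted if p not in pool)  # accepted pairs supervise the next round
    return pool


if __name__ == "__main__":
    seed = [("Paris is the capital of France.", "What is the capital of France?", "Paris")]
    contexts = ["The Nile is the longest river in Africa.", "Mount Fuji is in Japan."]
    print(len(snowball(seed, contexts)))
```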
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
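A sketch of the end-to-end QAG setup the entry above finds most robust: a single seq2seq LM maps a context directly to serialized QA pairs. The plain t5-small checkpoint and the prompt format are placeholder assumptions; in practice a QAG-fine-tuned model would be used.

```python
# Sketch of end-to-end question-and-answer generation (QAG) with a seq2seq LM.
# One model maps a raw context to a serialized list of QA pairs. The plain
# "t5-small" checkpoint is only a placeholder: without QAG fine-tuning its
# output will not be meaningful.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "t5-small"  # placeholder; a QAG-fine-tuned checkpoint is assumed in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

context = "William Shakespeare wrote Hamlet around 1600."
# A possible serialization: the target is all pairs in one string, e.g.
# "question: Who wrote Hamlet? answer: William Shakespeare | question: ...".
# Actual fine-tuned checkpoints define their own prompt/target format.
inputs = tokenizer("generate question and answer: " + context, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```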
- HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling [50.39787601462344]
Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as e-commerce platforms and online user communities for special interests.
Most of the CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question.
We propose a question-aware heterogeneous graph transformer to incorporate the multiple information sources (MIS) in the user community to automatically generate the answer.
arXiv Detail & Related papers (2021-12-27T10:16:43Z)
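To picture what a question-aware heterogeneous graph over multiple information sources might look like, the snippet below assembles a toy typed graph with networkx. The node and edge types are assumptions made for illustration; HeteroQA itself applies a heterogeneous graph transformer over such a structure.

```python
# Toy construction of a question-aware heterogeneous graph over multiple
# information sources (articles, community answers, product pages).
import networkx as nx

g = nx.Graph()

# The question node that every source is connected to ("question-aware").
g.add_node("q1", ntype="question", text="Does this laptop support fast charging?")

# Nodes drawn from heterogeneous sources in the user community.
g.add_node("a1", ntype="article", text="Official spec sheet: 65W USB-C charging.")
g.add_node("u1", ntype="user_answer", text="Yes, mine charges to 50% in 30 minutes.")
g.add_node("p1", ntype="product_page", text="Laptop X, USB-C PD 3.0 compatible.")

# Typed edges linking the question to each candidate evidence node.
g.add_edge("q1", "a1", etype="question_mentions")
g.add_edge("q1", "u1", etype="question_answered_by")
g.add_edge("q1", "p1", etype="question_about_product")
g.add_edge("u1", "p1", etype="answer_refers_to")

# A graph-transformer layer would aggregate neighbor representations into the
# question node before decoding an answer; here we just inspect node types.
for node, data in g.nodes(data=True):
    print(node, data["ntype"])
```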
- Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning [7.2745835227138045]
We propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers.
This dataset is used to learn a QA pair generation model that produces summaries as answers, balancing brevity with sufficiency, jointly with their corresponding questions.
arXiv Detail & Related papers (2021-09-10T06:34:55Z)
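The brevity-versus-sufficiency balance mentioned above can be illustrated with a simple scoring function: reward coverage of the article, penalize length beyond a budget. This is only a toy picture of the trade-off, not the paper's differentiable reward imitation learning objective.

```python
# Toy reward illustrating the brevity-vs-sufficiency trade-off for
# summary-centric answers. NOT the paper's learned objective.
def answer_reward(candidate: str, article: str, length_budget: int = 30, alpha: float = 0.5) -> float:
    article_tokens = set(article.lower().split())
    candidate_tokens = candidate.lower().split()
    coverage = len(set(candidate_tokens) & article_tokens) / max(len(article_tokens), 1)  # sufficiency proxy
    over_budget = max(len(candidate_tokens) - length_budget, 0) / length_budget           # brevity penalty
    return coverage - alpha * over_budget


if __name__ == "__main__":
    article = "The city council approved a new transit plan that adds two bus lines and extends subway hours."
    short = "Council approved a transit plan."
    verbose = article + " " + article  # redundant, over-long candidate
    print(answer_reward(short, article), answer_reward(verbose, article))
```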
- Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space [94.8320535537798]
We propose Controllable Rewriting based Question Data Augmentation (CRQDA) for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.
We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples.
arXiv Detail & Related papers (2020-10-04T03:13:46Z)
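A toy rendering of rewriting "in continuous space": embed the question, perturb the embeddings, and decode each vector back to its nearest vocabulary item. The random embedding table and nearest-neighbour decoder below are stand-ins; CRQDA itself perturbs learned question representations under constraints such as those from a pretrained MRC model.

```python
# Toy continuous-space question rewriting: embed -> perturb -> nearest-neighbour
# decode. Random embeddings stand in for a trained autoencoder's latent space.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["what", "which", "city", "town", "is", "was", "the", "capital", "largest", "of", "france", "?"]
EMB = rng.normal(size=(len(VOCAB), 16))  # placeholder embedding table


def embed(tokens):
    return np.stack([EMB[VOCAB.index(t)] for t in tokens])


def decode(vectors):
    # nearest neighbour in the embedding table, per position
    sims = vectors @ EMB.T
    return [VOCAB[i] for i in sims.argmax(axis=1)]


def rewrite(question_tokens, noise_scale=0.8):
    vectors = embed(question_tokens) + rng.normal(scale=noise_scale, size=(len(question_tokens), 16))
    return decode(vectors)


if __name__ == "__main__":
    q = ["what", "is", "the", "capital", "of", "france", "?"]
    print(rewrite(q))  # e.g. a near-paraphrase with a few tokens swapped
```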
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of the data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
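A compact conditional-VAE sketch in PyTorch, to make the "sample a latent conditioned on the context and decode a question" idea concrete. The single Gaussian latent and MLP encoder/decoder are simplifying assumptions; the paper's hierarchical model adds a separate answer latent and an information-maximizing term.

```python
# Minimal conditional VAE sketch for question generation from a context vector.
# A simplification of the hierarchical conditional VAE described above.
import torch
import torch.nn as nn


class ConditionalVAE(nn.Module):
    def __init__(self, ctx_dim=64, latent_dim=16, out_dim=64):
        super().__init__()
        self.encoder = nn.Linear(ctx_dim + out_dim, 2 * latent_dim)  # -> (mu, logvar)
        self.decoder = nn.Sequential(nn.Linear(ctx_dim + latent_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
        self.latent_dim = latent_dim

    def forward(self, ctx, question_repr):
        mu, logvar = self.encoder(torch.cat([ctx, question_repr], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decoder(torch.cat([ctx, z], dim=-1))
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return recon, kl

    def sample(self, ctx):
        """Generate a new question representation given only the context."""
        z = torch.randn(ctx.size(0), self.latent_dim)
        return self.decoder(torch.cat([ctx, z], dim=-1))


if __name__ == "__main__":
    model = ConditionalVAE()
    ctx = torch.randn(4, 64)  # stand-in context encodings
    q = torch.randn(4, 64)    # stand-in gold-question encodings
    recon, kl = model(ctx, q)
    loss = nn.functional.mse_loss(recon, q) + 0.1 * kl
    print(loss.item(), model.sample(ctx).shape)
```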
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we leverage the QA model to extract more appropriate answers, iteratively refining the data in RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
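The refine step described above can be pictured as the loop below: run the current QA model over the harvested pairs and adopt its confident, divergent predictions as better answer spans. The qa_model_predict function and the confidence threshold are placeholders, not the paper's actual components.

```python
# Schematic of a RefQA-style refine loop: when the QA model confidently predicts
# a different span than the harvested answer, adopt that span before retraining.
from typing import Dict, List, Tuple


def qa_model_predict(context: str, question: str) -> Tuple[str, float]:
    """Placeholder for an extractive QA model returning (answer_span, confidence)."""
    return context.split(",")[0], 0.9


def refine(harvested: List[Dict], confidence_threshold: float = 0.8) -> List[Dict]:
    refined = []
    for ex in harvested:
        pred, conf = qa_model_predict(ex["context"], ex["question"])
        if conf >= confidence_threshold and pred != ex["answer"]:
            ex = {**ex, "answer": pred}  # adopt the model's more appropriate span
        refined.append(ex)
    return refined


if __name__ == "__main__":
    corpus = [{"context": "Marie Curie, a physicist and chemist, won two Nobel Prizes.",
               "question": "Who won two Nobel Prizes?",
               "answer": "a physicist"}]
    # In the paper this alternates with retraining the QA model on the refined data.
    for _ in range(2):
        corpus = refine(corpus)
    print(corpus[0]["answer"])
```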