KGConv, a Conversational Corpus grounded in Wikidata
- URL: http://arxiv.org/abs/2308.15298v1
- Date: Tue, 29 Aug 2023 13:35:51 GMT
- Title: KGConv, a Conversational Corpus grounded in Wikidata
- Authors: Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire
Gardent
- Abstract summary: KGConv is a large, conversational corpus of 71k conversations grounded in a Wikidata fact.
We provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model.
KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs and quiz generation.
- Score: 6.451914896767135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present KGConv, a large, conversational corpus of 71k conversations where
each question-answer pair is grounded in a Wikidata fact. Conversations contain
on average 8.6 questions and for each Wikidata fact, we provide multiple
variants (12 on average) of the corresponding question using templates, human
annotations, hand-crafted rules and a question rewriting neural model. We
provide baselines for the task of Knowledge-Based, Conversational Question
Generation. KGConv can further be used for other generation and analysis tasks
such as single-turn question generation from Wikidata triples, question
rewriting, question answering from conversation or from knowledge graphs and
quiz generation.
Related papers
- NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z) - PCoQA: Persian Conversational Question Answering Dataset [12.07607688189035]
The PCoQA dataset is a resource comprising information-seeking dialogs encompassing a total of 9,026 contextually-driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
arXiv Detail & Related papers (2023-12-07T15:29:34Z) - Semantic Parsing for Conversational Question Answering over Knowledge
Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with Sparql parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z) - Discourse Analysis via Questions and Answers: Parsing Dependency
Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop the first-of-its-kind QUD that derives a dependency structure of questions over full documents.
arXiv Detail & Related papers (2022-10-12T03:53:12Z) - An Answer Verbalization Dataset for Conversational Question Answerings
over Knowledge Graphs [9.979689965471428]
This paper contributes to the state-of-the-art by extending an existing ConvQA dataset with verbalized answers.
We perform experiments with five sequence-to-sequence models on generating answer responses while maintaining grammatical correctness.
arXiv Detail & Related papers (2022-08-13T21:21:28Z) - A Chinese Multi-type Complex Questions Answering Dataset over Wikidata [45.31495982252219]
Complex Knowledge Base Question Answering is a popular area of research in the past decade.
Recent public datasets have led to encouraging results in this field, but are mostly limited to English.
Few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases.
We propose CLC-QuAD, the first large scale complex Chinese semantic parsing dataset over Wikidata to address these challenges.
arXiv Detail & Related papers (2021-11-11T07:39:16Z) - TopiOCQA: Open-domain Conversational Question Answeringwith Topic
Switching [11.717296856448566]
We introduce TopiOCQA, an open-domain conversational dataset with topic switches on Wikipedia.
TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers.
We evaluate several baselines, by combining state-of-the-art document retrieval methods with neural reader models.
arXiv Detail & Related papers (2021-10-02T09:53:48Z) - QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.