PCoQA: Persian Conversational Question Answering Dataset
- URL: http://arxiv.org/abs/2312.04362v1
- Date: Thu, 7 Dec 2023 15:29:34 GMT
- Title: PCoQA: Persian Conversational Question Answering Dataset
- Authors: Hamed Hematian Hemati, Atousa Toghyani, Atena Souri, Sayed Hesam
Alavian, Hossein Sameti, Hamid Beigy
- Abstract summary: The PCoQA dataset is a resource comprising information-seeking dialogs encompassing a total of 9,026 contextually-driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
- Score: 12.07607688189035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans seek information regarding a specific topic through performing a
conversation containing a series of questions and answers. In the pursuit of
conversational question answering research, we introduce the PCoQA, the first
\textbf{P}ersian \textbf{Co}nversational \textbf{Q}uestion \textbf{A}nswering
dataset, a resource comprising information-seeking dialogs encompassing a total
of 9,026 contextually-driven questions. Each dialog involves a questioner, a
responder, and a document from the Wikipedia; The questioner asks several
inter-connected questions from the text and the responder provides a span of
the document as the answer for each question. PCoQA is designed to present
novel challenges compared to previous question answering datasets including
having more open-ended non-factual answers, longer answers, and fewer lexical
overlaps. This paper not only presents the comprehensive PCoQA dataset but also
reports the performance of various benchmark models. Our models include
baseline models and pre-trained models, which are leveraged to boost the
performance of the model. The dataset and benchmarks are available at our
Github page.
Related papers
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional
Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z) - TopiOCQA: Open-domain Conversational Question Answeringwith Topic
Switching [11.717296856448566]
We introduce TopiOCQA, an open-domain conversational dataset with topic switches on Wikipedia.
TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers.
We evaluate several baselines, by combining state-of-the-art document retrieval methods with neural reader models.
arXiv Detail & Related papers (2021-10-02T09:53:48Z) - PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge
Graph [0.0]
This paper introduces textitPeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z) - QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - FeTaQA: Free-form Table Question Answering [33.018256483762386]
We introduce FeTaQA, a new dataset with 10K Wikipedia-based table, question, free-form answer, supporting table cells pairs.
FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source.
arXiv Detail & Related papers (2021-04-01T09:59:40Z) - ParaQA: A Question Answering Dataset with Paraphrase Responses for
Single-Turn Conversation [5.087932295628364]
ParaQA is a dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG)
The dataset was created using a semi-automated framework for generating diverse paraphrasing of the answers using techniques such as back-translation.
arXiv Detail & Related papers (2021-03-13T18:53:07Z) - IIRC: A Dataset of Incomplete Information Reading Comprehension
Questions [53.3193258414806]
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
arXiv Detail & Related papers (2020-11-13T20:59:21Z) - Challenges in Information-Seeking QA: Unanswerable Questions and
Paragraph Retrieval [46.3246135936476]
We analyze why answering information-seeking queries is more challenging and where their prevalent unanswerabilities arise.
Our controlled experiments suggest two headrooms -- paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages on what makes them challenging to answer.
arXiv Detail & Related papers (2020-10-22T17:48:17Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.