IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering
in Islamic Text Resources
- URL: http://arxiv.org/abs/2304.11664v1
- Date: Sun, 23 Apr 2023 14:20:58 GMT
- Title: IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering
in Islamic Text Resources
- Authors: Arash Ghafouri, Hasan Naderi, Mohammad Aghajani asl and Mahdi
Firouzmandi
- Abstract summary: This article introduces the IslamicPCQA dataset for answering complex questions based on non-structured information sources.
The prepared dataset covers a wide range of Islamic topics and aims to facilitate answering complex Persian questions within this subject matter.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: One of the main challenges for question answering systems today is
answering complex questions using various sources of information. Multi-hop
questions are a type of complex question that requires multi-step reasoning to
answer. In this article, the IslamicPCQA dataset is introduced. This is the
first Persian dataset for answering complex questions based on non-structured
information sources and consists of 12,282 question-answer pairs extracted from
9 Islamic encyclopedias. The dataset was created following the approach of the
English HotpotQA dataset, customized to suit the complexities of the
Persian language. Answering questions in this dataset requires reasoning over
more than one paragraph. The questions are not limited to any prior knowledge
base or ontology, and to provide robust reasoning ability, the dataset also
includes supporting facts and key sentences. The prepared dataset covers a wide
range of Islamic topics and aims to facilitate answering complex Persian
questions within this subject matter.
Related papers
- PCoQA: Persian Conversational Question Answering Dataset [12.07607688189035]
The PCoQA dataset is a resource comprising information-seeking dialogs encompassing a total of 9,026 contextually-driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
arXiv Detail & Related papers (2023-12-07T15:29:34Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- PQuAD: A Persian Question Answering Dataset [0.0]
PQuAD is a crowdsourced reading comprehension dataset on Persian Wikipedia articles.
It includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable.
Our experiments on different state-of-the-art pre-trained contextualized language models show 74.8% Exact Match (EM) and 87.6% F1-score.
arXiv Detail & Related papers (2022-02-13T05:42:55Z)
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections [35.005593397252746]
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
arXiv Detail & Related papers (2021-11-01T04:50:26Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- A Knowledge-based Approach for Answering Complex Questions in Persian [0.0]
We propose a knowledge-based approach for answering complex questions in Persian.
We handle multi-constraint and multi-hop questions by building their set of possible corresponding logical forms.
The answer to the question is built from the answer to the logical form, extracted from the knowledge graph.
arXiv Detail & Related papers (2021-07-05T14:01:43Z)
- PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- MultiModalQA: Complex Question Answering over Text, Tables and Images [52.25399438133274]
We present MultiModalQA: a dataset that requires joint reasoning over text, tables and images.
We create MMQA using a new framework for generating complex multi-modal questions at scale.
We then define a formal language that allows us to take questions that can be answered from a single modality, and combine them to generate cross-modal questions.
arXiv Detail & Related papers (2021-04-13T09:14:28Z)
- IIRC: A Dataset of Incomplete Information Reading Comprehension Questions [53.3193258414806]
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
arXiv Detail & Related papers (2020-11-13T20:59:21Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.