PQuAD: A Persian Question Answering Dataset
- URL: http://arxiv.org/abs/2202.06219v1
- Date: Sun, 13 Feb 2022 05:42:55 GMT
- Title: PQuAD: A Persian Question Answering Dataset
- Authors: Kasra Darvishi, Newsha Shahbodagh, Zahra Abbasiantaeb, Saeedeh Momtazi
- Abstract summary: PQuAD is a crowdsourced reading comprehension dataset built on Persian Wikipedia articles.
It includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable.
Experiments with different state-of-the-art pre-trained contextualized language models yield 74.8% Exact Match (EM) and 87.6% F1-score.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Persian Question Answering Dataset (PQuAD), a crowdsourced reading
comprehension dataset on Persian Wikipedia articles. It includes 80,000
questions along with their answers, with 25% of the questions being
adversarially unanswerable. We examine various properties of the dataset to
show the diversity and the level of its difficulty as an MRC benchmark. By
releasing this dataset, we aim to ease research on Persian reading
comprehension and the development of Persian question answering systems. Our
experiments with different state-of-the-art pre-trained contextualized language
models yield 74.8% Exact Match (EM) and 87.6% F1-score, which can serve as
baseline results for further research on Persian QA.
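The Exact Match and F1 figures quoted in the abstract are the usual metrics for extractive question answering. The sketch below shows how such scores are commonly computed for a single prediction, following the SQuAD 2.0 convention that an unanswerable question has an empty gold answer. The function names and the whitespace-based normalization are assumptions made for illustration; the summary above does not say which normalization the PQuAD authors applied to Persian text.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Unanswerable case: credit is given only if both answers are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then the mean of these per-question values, typically taking the maximum over all gold answers when a question has more than one.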
Related papers
- Localizing and Mitigating Errors in Long-form Question Answering (2024-07-16T17:23:16Z)
Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension.
This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? (2024-04-26T11:46:05Z)
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
- Fully Authentic Visual Question Answering Dataset from Online Communities (2023-11-27T06:19:00Z)
Visual Question Answering (VQA) entails answering questions about images.
We introduce the first VQA dataset in which all contents originate from an authentic use case.
We characterize this dataset and how it relates to eight mainstream VQA datasets.
- IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering in Islamic Text Resources (2023-04-23T14:20:58Z)
This article introduces the IslamicPCQA dataset for answering complex questions based on non-structured information sources.
The prepared dataset covers a wide range of Islamic topics and aims to facilitate answering complex Persian questions within this subject matter.
- JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022-02-03T18:40:25Z)
We present the Japanese Question Answering dataset, JaQuAD, which is annotated by humans.
JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles.
We finetuned a baseline model which achieves an F1 score of 78.92% and an EM of 63.38% on the test set.
- QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers (2022-01-31T22:19:55Z)
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages.
Five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian) had, to the best of our knowledge, never been considered in the KGQA research community before.
- PerCQA: Persian Community Question Answering Dataset (2021-12-25T14:06:41Z)
Community Question Answering (CQA) forums provide answers for many real-life questions.
We present PerCQA, the first Persian dataset for CQA.
This dataset contains the questions and answers crawled from the most well-known Persian forum.
- A Knowledge-based Approach for Answering Complex Questions in Persian (2021-07-05T14:01:43Z)
We propose a knowledge-based approach for answering complex questions in Persian.
We handle multi-constraint and multi-hop questions by building their set of possible corresponding logical forms.
The answer to the question is built from the answer to the logical form, extracted from the knowledge graph.
- PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph (2021-06-27T08:21:23Z)
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
- IIRC: A Dataset of Incomplete Information Reading Comprehension Questions (2020-11-13T20:59:21Z)
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
- Inquisitive Question Generation for High Level Text Comprehension (2020-10-04T19:03:39Z)
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.