PQuAD: A Persian Question Answering Dataset
- URL: http://arxiv.org/abs/2202.06219v1
- Date: Sun, 13 Feb 2022 05:42:55 GMT
- Title: PQuAD: A Persian Question Answering Dataset
- Authors: Kasra Darvishi, Newsha Shahbodagh, Zahra Abbasiantaeb, Saeedeh Momtazi
- Abstract summary: A crowdsourced reading comprehension dataset on Persian Wikipedia articles.
Includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable.
Our experiments on different state-of-the-art pre-trained contextualized language models show 74.8% Exact Match (EM) and 87.6% F1-score.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Persian Question Answering Dataset (PQuAD), a crowdsourced reading
comprehension dataset on Persian Wikipedia articles. It includes 80,000
questions along with their answers, with 25% of the questions being
adversarially unanswerable. We examine various properties of the dataset to
show the diversity and the level of its difficulty as an MRC benchmark. By
releasing this dataset, we aim to ease research on Persian reading
comprehension and the development of Persian question answering systems. Our
experiments with several state-of-the-art pre-trained contextualized language
models yield 74.8% Exact Match (EM) and 87.6% F1-score, which can serve as
baseline results for further research on Persian QA.
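As a rough illustration of the reported baselines, the following is a minimal sketch of how SQuAD-style Exact Match and token-level F1 are typically computed for extractive QA; the normalization steps (lowercasing, whitespace collapsing) are assumptions for illustration, and the paper's official evaluation script may handle Persian-specific normalization differently.

```python
# Sketch of SQuAD-style Exact Match (EM) and token-level F1, the metric
# family reported as PQuAD baselines (74.8% EM, 87.6% F1).
# Normalization here is a simplifying assumption; official evaluators
# usually also strip punctuation and articles.
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match (e.g., unanswerable questions).
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Tehran", "tehran"))            # 1.0
    print(round(f1_score("in Tehran", "Tehran"), 2))  # 0.67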
Related papers
- A Method for Multi-Hop Question Answering on Persian Knowledge Graph [0.0]
Major challenges persist in answering multi-hop complex questions, particularly in Persian.
One of the main challenges is the accurate understanding and transformation of these multi-hop complex questions into semantically equivalent SPARQL queries.
In this study, a dataset of 5,600 Persian multi-hop complex questions was developed, along with their forms based on the semantic representation of the questions.
An architecture was proposed for answering complex questions using a Persian knowledge graph.
arXiv Detail & Related papers (2025-01-18T18:11:29Z) - Building a Rich Dataset to Empower the Persian Question Answering Systems [0.6138671548064356]
This dataset is called NextQuAD and has 7,515 contexts, including 23,918 questions and answers.
A BERT-based question answering model has been applied to this dataset using two pre-trained language models.
Evaluation on the development set shows 0.95 Exact Match (EM) and 0.97 F1-score.
arXiv Detail & Related papers (2024-12-28T16:53:25Z) - Localizing and Mitigating Errors in Long-form Question Answering [79.63372684264921]
Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension.
This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.
arXiv Detail & Related papers (2024-07-16T17:23:16Z) - Fully Authentic Visual Question Answering Dataset from Online Communities [72.0524198499719]
Visual Question Answering (VQA) entails answering questions about images.
We introduce the first VQA dataset in which all contents originate from an authentic use case.
We characterize this dataset and how it relates to eight mainstream VQA datasets.
arXiv Detail & Related papers (2023-11-27T06:19:00Z) - ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality.
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z) - IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering
in Islamic Text Resources [0.0]
This article introduces the IslamicPCQA dataset for answering complex questions based on non-structured information sources.
The prepared dataset covers a wide range of Islamic topics and aims to facilitate answering complex Persian questions within this subject matter.
arXiv Detail & Related papers (2023-04-23T14:20:58Z) - JaQuAD: Japanese Question Answering Dataset for Machine Reading
Comprehension [0.0]
We present the Japanese Question Answering dataset, JaQuAD, which is annotated by humans.
JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles.
We fine-tuned a baseline model, which achieves 78.92% F1-score and 63.38% EM on the test set.
arXiv Detail & Related papers (2022-02-03T18:40:25Z) - QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia
and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of the questions into 8 languages.
Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian - have, to the best of our knowledge, never been considered in the KGQA research community before.
arXiv Detail & Related papers (2022-01-31T22:19:55Z) - PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge
Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z) - IIRC: A Dataset of Incomplete Information Reading Comprehension
Questions [53.3193258414806]
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
arXiv Detail & Related papers (2020-11-13T20:59:21Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.