PQuAD: A Persian Question Answering Dataset
- URL: http://arxiv.org/abs/2202.06219v1
- Date: Sun, 13 Feb 2022 05:42:55 GMT
- Title: PQuAD: A Persian Question Answering Dataset
- Authors: Kasra Darvishi, Newsha Shahbodagh, Zahra Abbasiantaeb, Saeedeh Momtazi
- Abstract summary: A crowdsourced reading comprehension dataset on Persian Wikipedia articles.
Includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable.
Our experiments on different state-of-the-art pre-trained contextualized language models show 74.8% Exact Match (EM) and 87.6% F1-score.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Persian Question Answering Dataset (PQuAD), a crowdsourced reading
comprehension dataset on Persian Wikipedia articles. It includes 80,000
questions along with their answers, with 25% of the questions being
adversarially unanswerable. We examine various properties of the dataset to
show the diversity and the level of its difficulty as an MRC benchmark. By
releasing this dataset, we aim to ease research on Persian reading
comprehension and the development of Persian question answering systems. Our
experiments with several state-of-the-art pre-trained contextualized language
models yield 74.8% Exact Match (EM) and 87.6% F1-score, which can serve as
baseline results for further research on Persian QA.
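As a rough illustration of the reported baselines, the following is a minimal sketch of how SQuAD-style Exact Match and token-level F1 are typically computed for extractive QA; the normalization steps (lowercasing, whitespace collapsing) are assumptions for illustration, and the paper's official evaluation script may handle Persian-specific normalization differently.

```python
# Sketch of SQuAD-style Exact Match (EM) and token-level F1, the metric
# family reported as PQuAD baselines (74.8% EM, 87.6% F1).
# Normalization here is a simplifying assumption; official evaluators
# usually also strip punctuation and articles.
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match (e.g., unanswerable questions).
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Tehran", "tehran"))            # 1.0
    print(round(f1_score("in Tehran", "Tehran"), 2))  # 0.67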
Related papers
- A Method for Multi-Hop Question Answering on Persian Knowledge Graph [0.0]
Major challenges persist in answering multi-hop complex questions, particularly in Persian.
One of the main challenges is the accurate understanding and transformation of these multi-hop complex questions into semantically equivalent SPARQL queries.
In this study, a dataset of 5,600 Persian multi-hop complex questions was developed, along with their forms based on the semantic representation of the questions.
An architecture was proposed for answering complex questions using a Persian knowledge graph.
arXiv Detail & Related papers (2025-01-18T18:11:29Z) - Building a Rich Dataset to Empower the Persian Question Answering Systems [0.6138671548064356]
This dataset is called NextQuAD and has 7,515 contexts, including 23,918 questions and answers.
A BERT-based question answering model has been applied to this dataset using two pre-trained language models.
Evaluation on the development set shows 0.95 Exact Match (EM) and 0.97 F1-score.
arXiv Detail & Related papers (2024-12-28T16:53:25Z) - Localizing and Mitigating Errors in Long-form Question Answering [79.63372684264921]
Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension.
This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.
arXiv Detail & Related papers (2024-07-16T17:23:16Z) - Fully Authentic Visual Question Answering Dataset from Online Communities [72.0524198499719]
Visual Question Answering (VQA) entails answering questions about images.
We introduce the first VQA dataset in which all contents originate from an authentic use case.
We characterize this dataset and how it relates to eight mainstream VQA datasets.
arXiv Detail & Related papers (2023-11-27T06:19:00Z) - ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality.
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z) - IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering
in Islamic Text Resources [0.0]
This article introduces the IslamicPCQA dataset for answering complex questions based on non-structured information sources.
The prepared dataset covers a wide range of Islamic topics and aims to facilitate answering complex Persian questions within this subject matter.
arXiv Detail & Related papers (2023-04-23T14:20:58Z) - JaQuAD: Japanese Question Answering Dataset for Machine Reading
Comprehension [0.0]
We present the Japanese Question Answering dataset, JaQuAD, which is annotated by humans.
JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles.
We fine-tuned a baseline model, which achieves 78.92% F1-score and 63.38% EM on the test set.
arXiv Detail & Related papers (2022-02-03T18:40:25Z) - QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia
and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of the questions into 8 languages.
Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian - have, to the best of our knowledge, never been considered in the KGQA research community before.
arXiv Detail & Related papers (2022-01-31T22:19:55Z) - PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge
Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z) - IIRC: A Dataset of Incomplete Information Reading Comprehension
Questions [53.3193258414806]
We present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia.
The questions were written by crowd workers who did not have access to any of the linked documents.
We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset.
arXiv Detail & Related papers (2020-11-13T20:59:21Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.