ARCOQ: Arabic Closest Opposite Questions Dataset
- URL: http://arxiv.org/abs/2310.14384v1
- Date: Sun, 22 Oct 2023 18:41:26 GMT
- Title: ARCOQ: Arabic Closest Opposite Questions Dataset
- Authors: Sandra Rizkallah, Amir F. Atiya, and Samir Shaheen
- Abstract summary: This paper presents a dataset for closest opposite questions in Arabic language.
The structure is similar to that of the Graduate Record Examination (GRE) closest opposite questions dataset for the English language.
The paper provides a benchmark for the performance of different Arabic word embedding models on the introduced dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a dataset for closest opposite questions in Arabic
language. The dataset is the first of its kind for the Arabic language. It is
beneficial for the assessment of systems on the aspect of antonymy detection.
The structure is similar to that of the Graduate Record Examination (GRE)
closest opposite questions dataset for the English language. The introduced
dataset consists of 500 questions, each contains a query word for which the
closest opposite needs to be determined from among a set of candidate words.
Each question is also associated with the correct answer. We publish the
dataset publicly in addition to providing standard splits of the dataset into
development and test sets. Moreover, the paper provides a benchmark for the
performance of different Arabic word embedding models on the introduced
dataset.
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - ArabicaQA: A Comprehensive Dataset for Arabic Question Answering [13.65056111661002]
We introduce ArabicaQA, the first large-scale dataset for machine reading comprehension and open-domain question answering in Arabic.
We also present AraDPR, the first dense passage retrieval model trained on the Arabic Wikipedia corpus.
arXiv Detail & Related papers (2024-03-26T16:37:54Z) - Benchmarks for Pir\'a 2.0, a Reading Comprehension Dataset about the
Ocean, the Brazilian Coast, and Climate Change [0.24091079613649843]
Pir'a is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change.
This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge.
arXiv Detail & Related papers (2023-09-19T21:56:45Z) - PAXQA: Generating Cross-lingual Question Answering Examples at Training
Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z) - Semantic Parsing for Conversational Question Answering over Knowledge
Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with Sparql parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z) - Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z) - PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge
Graph [0.0]
This paper introduces textitPeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z) - A Benchmark Arabic Dataset for Commonsense Explanation [0.6091702876917281]
This paper presents a benchmark Arabic dataset for commonsense explanation.
The dataset consists of Arabic sentences that does not make sense along with three choices to select among them the one that explains why the sentence is false.
arXiv Detail & Related papers (2020-12-18T14:07:10Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.