Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the
Ocean, the Brazilian Coast, and Climate Change
- URL: http://arxiv.org/abs/2309.10945v1
- Date: Tue, 19 Sep 2023 21:56:45 GMT
- Title: Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the
Ocean, the Brazilian Coast, and Climate Change
- Authors: Paulo Pirozelli, Marcos M. José, Igor Silveira, Flávio Nakasato,
Sarajane M. Peres, Anarosa A. F. Brandão, Anna H. R. Costa, Fabio G. Cozman
- Abstract summary: Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change.
This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge.
- Score: 0.24091079613649843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pirá is a reading comprehension dataset focused on the ocean, the Brazilian
coast, and climate change, built from a collection of scientific abstracts and
reports on these topics. This dataset represents a versatile language resource,
particularly useful for testing the ability of current machine learning models
to acquire expert scientific knowledge. Despite its potential, a detailed set
of baselines has not yet been developed for Pirá. By creating these
baselines, researchers can more easily utilize Pirá as a resource for testing
machine learning models across a wide range of question answering tasks. In
this paper, we define six benchmarks over the Pirá dataset, covering closed
generative question answering, machine reading comprehension, information
retrieval, open question answering, answer triggering, and multiple choice
question answering. As part of this effort, we have also produced a curated
version of the original dataset, where we fixed a number of grammar issues,
repetitions, and other shortcomings. Furthermore, the dataset has been extended
in several new directions, so as to face the aforementioned benchmarks:
translation of supporting texts from English into Portuguese, classification
labels for answerability, automatic paraphrases of questions and answers, and
multiple choice candidates. The results described in this paper provide several
points of reference for researchers interested in exploring the challenges
provided by the Pirá dataset.
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs [55.80189506270598]
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
arXiv Detail & Related papers (2023-09-16T04:34:55Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain [20.801638768447948]
This dataset contains 3,397 samples of multiple choice questions, answers (including distractors), and their source documents from the educational domain.
Each question is phrased in two forms, normal and close. Correct answers are linked to source documents with sentence-level annotations.
All questions have been generated by educational experts rather than crowd workers to ensure they are maintaining educational and learning standards.
arXiv Detail & Related papers (2022-10-12T11:28:34Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean [1.1837802026343334]
This paper presents the Pirá dataset, a large set of questions and answers about the ocean and the Brazilian coast both in Portuguese and English.
The Pirá dataset consists of 2261 properly curated question/answer (QA) sets in both languages.
We discuss some of the advantages as well as limitations of Pirá, as this new resource can support a set of tasks in NLP such as question-answering, information retrieval, and machine translation.
arXiv Detail & Related papers (2022-02-04T21:29:45Z)
- A Survey on non-English Question Answering Dataset [0.0]
The aim of this survey is to recognize, summarize and analyze the existing datasets that have been released by many researchers.
In this paper, we review question answering datasets that are available in common languages other than English such as French, German, Japanese, Chinese, Arabic, Russian, as well as the multilingual and cross-lingual question-answering datasets.
arXiv Detail & Related papers (2021-12-27T12:45:06Z)
- PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z)
- English Machine Reading Comprehension Datasets: A Survey [13.767812547998735]
We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word.
Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.
arXiv Detail & Related papers (2021-01-25T21:15:06Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)