TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions
- URL: http://arxiv.org/abs/2005.00242v2
- Date: Tue, 6 Oct 2020 03:57:19 GMT
- Title: TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions
- Authors: Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, Dan Roth
- Abstract summary: We introduce TORQUE, a new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships.
Results show that RoBERTa-large achieves an exact-match score of 51% on the test set of TORQUE, about 30% behind human performance.
- Score: 91.85730323228833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A critical part of reading is being able to understand the temporal
relationships between events described in a passage of text, even when those
relationships are not explicitly stated. However, current machine reading
comprehension benchmarks have practically no questions that test temporal
phenomena, so systems trained on these benchmarks have no capacity to answer
questions such as "what happened before/after [some event]?" We introduce
TORQUE, a new English reading comprehension benchmark built on 3.2k news
snippets with 21k human-generated questions querying temporal relationships.
Results show that RoBERTa-large achieves an exact-match score of 51% on the
test set of TORQUE, about 30% behind human performance.
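In TORQUE, a question such as "What happened before [some event]?" is answered by selecting all passage events that satisfy the queried temporal relation, so exact match requires the predicted set of events to equal the gold set exactly. The snippet below is a minimal sketch of such a set-level exact-match metric, not the official TORQUE evaluation script; the function name and the example answers are illustrative only.

```python
# Minimal sketch (assumed, not the official TORQUE evaluator):
# exact match over set-valued answers, where each answer is the set of
# event spans a question selects from the passage.

def exact_match(predicted_events, gold_events):
    """Return 1.0 if the predicted event set equals the gold event set, else 0.0."""
    return float(set(predicted_events) == set(gold_events))

# Hypothetical example in the spirit of a "what happened before ...?" question.
gold = {"announced", "signed"}   # gold events that happened before the query event
pred = {"announced", "signed"}   # a model's predicted answer set
print(exact_match(pred, gold))   # 1.0 -> this question counts toward the EM score
```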
Related papers
- RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - On the Role of Context in Reading Time Prediction [50.87306355705826]
We present a new perspective on how readers integrate context during real-time language comprehension.
Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit is an affine function of its in-context information content.
arXiv Detail & Related papers (2024-09-12T15:52:22Z) - Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask [3.470121495099]
We study Question Generation (QG) for reading comprehension where inferential questions are critical.
We propose a two-step model (HTA-WTA) that takes advantage of previous datasets.
We show that the HTA-WTA model tests for strong SCRS by asking deep inferential questions.
arXiv Detail & Related papers (2022-04-06T15:52:24Z) - QuALITY: Question Answering with Long Input Texts, Yes! [27.700792723226524]
We introduce QuALITY, a dataset with context passages in English that have an average length of about 5,000 tokens.
Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage.
Only half of the questions are answerable by annotators working under tight time constraints.
arXiv Detail & Related papers (2021-12-16T04:14:38Z) - What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study [31.062129406113588]
We introduce a dataset for Semantic Textual Relatedness, STR-2022, that has 5,500 English sentence pairs manually annotated.
We show that human intuition regarding relatedness of sentence pairs is highly reliable, with a repeat annotation correlation of 0.84.
We also show the utility of STR-2022 for evaluating automatic methods of sentence representation and for various downstream NLP tasks.
arXiv Detail & Related papers (2021-10-10T16:23:54Z) - ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning [49.795767003586235]
We introduce ESTER, a comprehensive machine reading comprehension dataset for Event Semantic Relation Reasoning.
We study the five most commonly used event semantic relations and formulate them as question answering tasks.
Experimental results show that current SOTA systems achieve 60.5%, 57.8%, and 76.3% for event-based F1, token-based F1, and HIT@1 scores, respectively.
arXiv Detail & Related papers (2021-04-16T19:59:26Z) - Temporal Reasoning on Implicit Events from Distant Supervision [91.20159064951487]
We propose a novel temporal reasoning dataset that evaluates the degree to which systems understand implicit events.
We find that state-of-the-art models struggle when predicting temporal relationships between implicit and explicit events.
We propose a neuro-symbolic temporal reasoning model, SYMTIME, which exploits distant supervision signals from large-scale text and uses temporal rules to infer end times.
arXiv Detail & Related papers (2020-10-24T03:12:27Z) - Temporal Common Sense Acquisition with Minimal Supervision [77.8308414884754]
This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense.
Our method is shown to give quality predictions of various dimensions of temporal common sense.
It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA.
arXiv Detail & Related papers (2020-05-08T22:20:16Z) - STARC: Structured Annotations for Reading Comprehension [23.153841344989143]
We present STARC, a new annotation framework for assessing reading comprehension with multiple choice questions.
The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English.
arXiv Detail & Related papers (2020-04-30T14:08:50Z) - Conversational Question Answering over Passages by Leveraging Word Proximity Networks [33.59664244897881]
CROWN is an unsupervised yet effective system for conversational QA with passage responses.
It supports several modes of context propagation over multiple turns.
CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
arXiv Detail & Related papers (2020-04-27T19:30:47Z)