Related papers: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering

ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering

URL: http://arxiv.org/abs/2406.04866v2
Date: Mon, 07 Oct 2024 12:32:24 GMT
Title: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering
Authors: Raphael Gruber, Abdelrahman Abdallah, Michael Färber, Adam Jatowt,
Abstract summary: ComplexTempQA is a large-scale dataset consisting of over 100 million question-answer pairs. The dataset covers questions spanning over two decades and offers an unmatched breadth of topics.
Score: 24.046966640011124
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce ComplexTempQA, a large-scale dataset consisting of over 100 million question-answer pairs designed to tackle the challenges in temporal question answering. ComplexTempQA significantly surpasses existing benchmarks like HOTPOTQA, TORQUE, and TEQUILA in scale and scope. Utilizing data from Wikipedia and Wikidata, the dataset covers questions spanning over two decades and offers an unmatched breadth of topics. We introduce a unique taxonomy that categorizes questions as attributes, comparisons, and counting questions, each revolving around events, entities, and time periods. One standout feature of ComplexTempQA is the high complexity of its questions, which demand effective capabilities for answering such as across-time comparison, temporal aggregation, and multi-hop reasoning involving temporal event ordering and entity recognition. Additionally, each question is accompanied by detailed metadata, including specific time scopes, allowing for comprehensive evaluation and enhancement of the temporal reasoning abilities of large language models. ComplexTempQA serves both as a testing ground for developing sophisticated AI models and as a foundation for advancing research in question answering, information retrieval, and language understanding.

Related papers

A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation [40.00268164578221]
ChronoQA is a large-scale benchmark dataset for Chinese question answering.<n>It contains 5,176 high-quality questions covering absolute, aggregate, and relative temporal types with both explicit and implicit time expressions.
arXiv Detail & Related papers (2025-08-17T08:12:59Z)
The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z)
Evaluating List Construction and Temporal Understanding capabilities of Large Language Models [54.39278049092508]
Large Language Models (LLMs) are susceptible to hallucinations and errors on particularly temporal understanding tasks.<n>We propose the Time referenced List based Question Answering (TLQA) benchmark that requires structured answers in list format aligned with corresponding time periods.<n>We investigate the temporal understanding and list construction capabilities of state-of-the-art generative models on TLQA in closed-book and open-domain settings.
arXiv Detail & Related papers (2025-06-26T21:40:58Z)
It's High Time: A Survey of Temporal Question Answering [17.07150094603319]
Temporal Question Answering (TQA) focuses on answering questions involving temporal constraints or context.<n>Recent advances in TQA enabled by neural models and Large Language Models (LLMs)<n> benchmark datasets and evaluation strategies designed to test temporal robustness, recency awareness, and generalization.
arXiv Detail & Related papers (2025-05-26T17:21:26Z)
Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement [55.2439260314328]
Time Series Multi-Task Question Answering (Time-MQA) is a unified framework that enables natural language queries across multiple time series tasks. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing $sim $200k question-answer pairs.
arXiv Detail & Related papers (2025-02-26T13:47:13Z)
TimeLogic: A Temporal Logic Benchmark for Video QA [64.32208175236323]
We introduce the TimeLogic QA (TLQA) framework to automatically generate temporal logical questions. We leverage 4 datasets, STAR, Breakfast, AGQA, and CrossTask, and generate 2k and 10k QA pairs for each category. We assess the VideoQA model's temporal reasoning performance on 16 categories of temporal logic with varying temporal complexity.
arXiv Detail & Related papers (2025-01-13T11:12:59Z)
Multi-hop Question Answering under Temporal Knowledge Editing [9.356343796845662]
Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. Existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. We propose TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA) to address this limitation.
arXiv Detail & Related papers (2024-03-30T23:22:51Z)
Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities. We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires. We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents. Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph [34.44840297353777]
Temporal Knowledge Graph (TKG) is an extension of regular knowledge graph by attaching the time scope. We propose textbfunderlineJoint textbfunderlineMulti textbfunderlineFacts textbfunderlineReasoning textbfunderlineNetwork (JMFRN)
arXiv Detail & Related papers (2024-01-04T11:34:39Z)
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge. We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z)
Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with Sparql parses and system answers correspond to execution results thereof. We present two different semantic parsing approaches and highlight the challenges of the task. Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning. It is based on Wikidata, which is the most frequently curated, openly available knowledge base.
arXiv Detail & Related papers (2022-01-15T08:49:09Z)
TempoQR: Temporal Question Reasoning over Knowledge Graphs [11.054877399064804]
This paper puts forth a comprehensive embedding-based framework for answering complex questions over Knowledge Graphs. Our method termed temporal question reasoning (TempoQR) exploits TKG embeddings to ground the question to the specific entities and time scope it refers to. Experiments show that TempoQR improves accuracy by 25--45 percentage points on complex temporal questions over state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-10T23:59:14Z)
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers. We call this dataset ConditionalQA. We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
Complex Temporal Question Answering on Knowledge Graphs [22.996399822102575]
This work presents EXAQT, the first end-to-end system for answering complex temporal questions. It answers natural language questions over knowledge graphs (KGs) in two stages, one geared towards high recall, the other towards precision at top ranks. We evaluate EXAQT on TimeQuestions, a large dataset of 16k temporal questions compiled from a variety of general purpose KG-QA benchmarks.
arXiv Detail & Related papers (2021-09-18T13:41:43Z)
QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.