TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
- URL: http://arxiv.org/abs/2511.01101v1
- Date: Sun, 02 Nov 2025 22:33:19 GMT
- Title: TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
- Authors: Marek Strong, Andreas Vlachos
- Abstract summary: We introduce TSVer, a new benchmark dataset for fact verification focusing on temporal and numerical reasoning with time-series evidence. TSVer contains 287 real-world claims sourced from 38 fact-checking organizations and a curated database of 400 time series covering diverse domains.
- Score: 8.095827820420839
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reasoning over temporal and numerical data, such as time series, is a crucial aspect of fact-checking. While many systems have recently been developed to handle this form of evidence, their evaluation remains limited by existing datasets, which often lack structured evidence, provide insufficient justifications for verdicts, or rely on synthetic claims. In this paper, we introduce TSVer, a new benchmark dataset for fact verification focusing on temporal and numerical reasoning with time-series evidence. TSVer contains 287 real-world claims sourced from 38 fact-checking organizations and a curated database of 400 time series covering diverse domains. Each claim is annotated with time frames across all pertinent time series, along with a verdict and justifications reflecting how the evidence is used to reach the verdict. Using an LLM-assisted multi-step annotation process, we improve the quality of our annotations and achieve an inter-annotator agreement of kappa=0.745 on verdicts. We also develop a baseline for verifying claims against time-series evidence and show that even state-of-the-art reasoning models such as Gemini-2.5-Pro are challenged by time series, achieving an accuracy of 63.37 on verdicts and an Ev2R score of 48.63 on verdict justifications.
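The kappa=0.745 agreement reported above is Cohen's kappa, which corrects raw annotator agreement for the agreement expected by chance. A minimal sketch of the computation for two annotators' verdict labels (the function name and label strings are illustrative, not from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

# Example: 4/5 raw agreement over verdict labels S/R/N
print(cohens_kappa(["S", "S", "R", "N", "S"],
                   ["S", "R", "R", "N", "S"]))  # 0.6875
```

A kappa of 0.745, as reported for TSVer's verdicts, is conventionally read as substantial agreement.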
Related papers
- A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition [7.910984819642885]
QuanTemp++ is a dataset consisting of natural numerical claims and an open-domain corpus, with the corresponding relevant evidence for each claim. We characterize the retrieval performance of key claim decomposition paradigms.
arXiv Detail & Related papers (2025-10-24T22:37:13Z)
- Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback [55.284574165467525]
Time-series Reasoning for Anomaly (Time-RA) transforms classical time series anomaly detection into a generative, reasoning-intensive task. We also introduce RATs40K, the first real-world multimodal benchmark dataset explicitly annotated for anomaly reasoning.
arXiv Detail & Related papers (2025-07-20T18:02:50Z)
- ChronoFact: Timeline-based Temporal Fact Verification [15.698391632115856]
Temporal claims, often riddled with inaccuracies, are a significant challenge in the digital misinformation landscape. We introduce a novel timeline-based fact verification framework that identifies events from both claims and evidence. We also introduce a new dataset of complex temporal claims involving timeline-based reasoning.
arXiv Detail & Related papers (2024-10-19T03:44:19Z)
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Evidence-Based Temporal Fact Verification [15.698391632115856]
We propose an end-to-end solution for temporal fact verification that considers the temporal information in claims to obtain relevant evidence sentences.
We learn time-sensitive representations that encapsulate not only the semantic relationships among the events, but also their chronological proximity.
Experiment results demonstrate that the proposed approach significantly enhances the accuracy of temporal claim verification.
arXiv Detail & Related papers (2024-07-21T23:13:05Z)
- AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web [20.576644330553744]
We introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations.
Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict.
arXiv Detail & Related papers (2023-05-22T15:17:18Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- Implicit Temporal Reasoning for Evidence-Based Fact-Checking [14.015789447347466]
Our study demonstrates that time positively influences the claim verification process of evidence-based fact-checking.
Our findings show that the presence of temporal information and the manner in which timelines are constructed greatly influence how fact-checking models determine the relevance and supporting or refuting character of evidence documents.
arXiv Detail & Related papers (2023-02-24T10:48:03Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models for predicting veracity handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
- Time-Aware Evidence Ranking for Fact-Checking [56.247512670779045]
We investigate the hypothesis that the timestamp of a Web page is crucial to how it should be ranked for a given claim.
Our study reveals that time-aware evidence ranking not only surpasses relevance assumptions based purely on semantic similarity or position in a search results list, but also improves veracity predictions of time-sensitive claims in particular.
arXiv Detail & Related papers (2020-09-10T13:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.