RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
- URL: http://arxiv.org/abs/2601.04350v2
- Date: Mon, 12 Jan 2026 01:07:13 GMT
- Title: RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
- Authors: Joseph James, Chenghao Xiao, Yucheng Li, Nafise Sadat Moosavi, Chenghua Lin
- Abstract summary: RIGOURATE retrieves supporting evidence from a paper's body and assigns each claim an overstatement score. The framework consists of a dataset of over 10K claim-evidence sets from ICLR and NeurIPS papers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific rigour tends to be sidelined in favour of bold statements, leading authors to overstate claims beyond what their results support. We present RIGOURATE, a two-stage multimodal framework that retrieves supporting evidence from a paper's body and assigns each claim an overstatement score. The framework is built on a dataset of over 10K claim-evidence sets from ICLR and NeurIPS papers, annotated using eight LLMs, with overstatement scores calibrated using peer-review comments and validated through human evaluation. It employs a fine-tuned reranker for evidence retrieval and a fine-tuned model to predict overstatement scores with justification. Compared to strong baselines, RIGOURATE enables improved evidence retrieval and overstatement detection. Overall, our work operationalises evidential proportionality and supports clearer, more transparent scientific communication.
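As a rough illustration of the two-stage design described above, consider the minimal sketch below. The reranker checkpoint (`cross-encoder/ms-marco-MiniLM-L-6-v2`) is a generic public model, not the paper's fine-tuned reranker, and `score_overstatement` is a hypothetical stub standing in for the fine-tuned scoring model.

```python
# Minimal sketch of a RIGOURATE-style two-stage pipeline.
# The checkpoint below is a generic public reranker, NOT the paper's
# fine-tuned model; score_overstatement is a hypothetical placeholder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_evidence(claim: str, passages: list[str], k: int = 3) -> list[str]:
    """Stage 1: rank passages from the paper body by relevance to the claim."""
    scores = reranker.predict([(claim, p) for p in passages])
    ranked = sorted(zip(scores, passages), key=lambda pair: -pair[0])
    return [p for _, p in ranked[:k]]

def score_overstatement(claim: str, evidence: list[str]) -> float:
    """Stage 2 (placeholder): a fine-tuned model would map the claim and its
    evidence to an overstatement score with a justification."""
    return 0.0  # stand-in; the paper uses a fine-tuned model here
```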
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z) - The Alignment Bottleneck in Decomposition-Based Claim Verification [17.197804072440665]
We introduce a new dataset of real-world complex claims featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition brings significant performance improvement only when evidence is granular and strictly aligned.
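To make the two alignment setups concrete, here is a hypothetical illustration; the claim, sub-claims, and evidence spans are invented for this sketch and are not from the dataset.

```python
# Invented example, purely to illustrate the two setups named above.
claim = "Drug X was approved in 2019 and reduced mortality by 30%."
subclaims = ["Drug X was approved in 2019.",
             "Drug X reduced mortality by 30%."]
spans = ["2019 approval notice for Drug X",
         "Trial report: 30% mortality reduction for Drug X"]

# SAE: each sub-claim is paired with its own annotated evidence span.
sae_pairs = list(zip(subclaims, spans))

# SRE: every sub-claim is paired with the same claim-level evidence pool.
sre_pairs = [(sc, " ".join(spans)) for sc in subclaims]
```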
arXiv Detail & Related papers (2026-02-11T00:02:16Z) - Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking [32.6738019397553]
We propose a Retrieve-Refine-Calibrate (RRC) framework based on large language models (LLMs). Specifically, the framework first identifies the entities mentioned in the claim and retrieves evidence relevant to them. Then, it refines the retrieved evidence based on the claim to reduce irrelevant information. Finally, it calibrates the verification process by re-evaluating low-confidence predictions.
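A control-flow sketch of those three stages is shown below. Here `llm.extract_entities`, `llm.refine`, and `llm.verify` are hypothetical stand-ins for the paper's LLM-backed components, and `search` is an assumed evidence-retrieval callable.

```python
# Illustrative Retrieve-Refine-Calibrate control flow (not the paper's code).
def rrc_verify(claim: str, llm, search, threshold: float = 0.7) -> dict:
    # Retrieve: identify entities mentioned in the claim, fetch evidence.
    entities = llm.extract_entities(claim)
    retrieved = [doc for ent in entities for doc in search(ent)]
    # Refine: drop retrieved evidence irrelevant to the claim.
    refined = llm.refine(claim, retrieved)
    # Verify, then Calibrate: re-evaluate low-confidence predictions.
    label, conf = llm.verify(claim, refined)
    if conf < threshold:
        label, conf = llm.verify(claim, retrieved)  # illustrative re-check
    return {"label": label, "confidence": conf}
```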
arXiv Detail & Related papers (2026-01-23T08:48:52Z) - OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment [63.662126457336534]
OpenNovelty is an agentic system for transparent, evidence-based novelty analysis. It grounds all assessments in retrieved real papers, ensuring verifiable judgments. OpenNovelty aims to empower the research community with a scalable tool that promotes fair, consistent, and evidence-backed peer review.
arXiv Detail & Related papers (2026-01-04T15:48:51Z) - Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning [55.232400251303794]
Look As You Think (LAT) is a reinforcement learning framework that trains models to produce verifiable reasoning paths with consistent attribution. LAT consistently improves over the vanilla model in both single- and multi-image settings, yielding average gains of 8.23% in soft exact match (EM) and 47.0% in IoU@0.5.
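For reference, IoU@0.5 for box-level evidence attribution is conventionally computed as below; this is a generic sketch, not LAT's evaluation code, and it assumes a one-to-one pairing of predicted and gold boxes given as `(x1, y1, x2, y2)` tuples.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_at_05(pred_boxes: list, gold_boxes: list) -> float:
    """Fraction of predictions whose IoU with the paired gold box is >= 0.5."""
    hits = sum(iou(p, g) >= 0.5 for p, g in zip(pred_boxes, gold_boxes))
    return hits / len(gold_boxes)
```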
arXiv Detail & Related papers (2025-11-15T02:50:23Z) - Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning [2.105564340986074]
OMRC-MR is a hierarchical framework that integrates QA-style OMRC summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-05T09:55:12Z) - MuSciClaims: Multimodal Scientific Claim Verification [13.598508835610474]
We introduce a new benchmark, MuSciClaims, accompanied by diagnostic tasks. We automatically extract supported claims from scientific articles and manually perturb them to produce contradicted claims. Our results show most vision-language models perform poorly (0.3-0.5 F1), with even the best model achieving only 0.72 F1.
arXiv Detail & Related papers (2025-06-05T02:59:51Z) - Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
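The abstract doesn't detail CFR's training objective; below is a generic in-batch contrastive (InfoNCE) loss of the kind commonly used to fine-tune such retrievers, sketched in PyTorch as an assumed stand-in.

```python
# Generic in-batch contrastive (InfoNCE) objective; illustrative only.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor, d_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """q_emb: (batch, dim) embeddings of claims/subquestions;
    d_emb: (batch, dim) embeddings of their gold evidence passages.
    Other passages in the batch serve as negatives."""
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    logits = q @ d.T / temperature                        # similarity matrix
    labels = torch.arange(q.size(0), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```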
arXiv Detail & Related papers (2024-10-07T00:09:50Z) - RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection [17.107961913114778]
We introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant.
RED-DOT achieves significant improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to 33.7%.
Our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3%.
arXiv Detail & Related papers (2023-11-16T14:43:45Z) - FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective considerations from tool-enhanced ChatGPT and a LoRA-tuned Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z) - From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification [118.03466985807331]
We argue that, rather than relevance, fact verification (FV) should focus on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER), which optimizes the evidence retrieval process by incorporating feedback from the claim verifier.
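One illustrative reading of utility-driven retrieval is sketched below: score each candidate passage by the change it induces in the verifier's confidence rather than by query-document similarity. `verifier` is a hypothetical callable returning a confidence score in [0, 1], not FER's actual interface.

```python
# Illustrative utility-driven evidence selection (assumed, not FER's code).
def select_by_utility(claim: str, candidates: list[str], verifier, k: int = 5) -> list[str]:
    """Rank candidate passages by the confidence gain they give the verifier,
    rather than by similarity to the claim."""
    base = verifier(claim, [])  # verifier confidence with no evidence
    gains = [(verifier(claim, [c]) - base, c) for c in candidates]
    gains.sort(key=lambda pair: -pair[0])
    return [c for _, c in gains[:k]]
```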
arXiv Detail & Related papers (2023-10-18T02:59:38Z) - SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables [68.76415918462418]
We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims.
Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models.
Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning.
arXiv Detail & Related papers (2023-05-22T16:13:50Z) - AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models that predict veracity while handling this ambiguity via soft labels.
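A minimal sketch of soft-label training of this kind (not AmbiFC's released code): the target is the annotator label distribution rather than a one-hot class, which PyTorch's `cross_entropy` supports directly.

```python
# Soft-label veracity loss; shapes and class names are illustrative.
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, label_dist: torch.Tensor) -> torch.Tensor:
    """logits: (batch, n_classes) model outputs; label_dist: (batch, n_classes)
    annotator distributions, e.g. [0.6, 0.3, 0.1] over support/refute/neutral."""
    return F.cross_entropy(logits, label_dist)  # soft targets, PyTorch >= 1.10
```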
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.