THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?
- URL: http://arxiv.org/abs/2506.21763v2
- Date: Mon, 21 Jul 2025 06:49:51 GMT
- Title: THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?
- Authors: Xin Wang, Jiyao Liu, Yulong Xiao, Junzhi Ning, Lihao Liu, Junjun He, Botian Shi, Kaicheng Yu,
- Abstract summary: We introduce THE-Tree (Technology History Evolution Tree), a computational framework that constructs such domain-specific evolution trees from scientific literature.
- Score: 16.91455372359864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are accelerating scientific idea generation, but rigorously evaluating these numerous, often superficial, AI-generated propositions for novelty and factual accuracy is a critical bottleneck; manual verification is too slow. Existing validation methods are inadequate: LLMs as standalone verifiers may hallucinate and lack domain knowledge (our findings show 60% unawareness of relevant papers in specific domains), while traditional citation networks lack explicit causality and narrative surveys are unstructured. This underscores a core challenge: the absence of structured, verifiable, and causally-linked historical data of scientific evolution. To address this, we introduce THE-Tree (Technology History Evolution Tree), a computational framework that constructs such domain-specific evolution trees from scientific literature. THE-Tree employs a search algorithm to explore evolutionary paths. During its node expansion, it utilizes a novel "Think-Verbalize-Cite-Verify" process: an LLM proposes potential advancements and cites supporting literature. Critically, each proposed evolutionary link is then validated for logical coherence and evidential support by a recovered natural language inference mechanism that interrogates the cited literature, ensuring that each step is grounded. We construct and validate 88 THE-Trees across diverse domains and release a benchmark dataset including up to 71k fact verifications covering 27k papers to foster further research. Experiments demonstrate that i) in graph completion, our THE-Tree improves hit@1 by 8% to 14% across multiple models compared to traditional citation networks; ii) for predicting future scientific developments, it improves hit@1 by nearly 10%; and iii) when combined with other methods, it boosts the performance of evaluating important scientific papers by almost 100%.
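The abstract describes node expansion only at a high level. As a minimal sketch, assuming hypothetical `propose` (the LLM that thinks, verbalizes, and cites) and `verify` (the NLI scorer) callables, the Think-Verbalize-Cite-Verify step can be read as a filter that keeps a proposed link only when a cited paper in a local corpus supports it. Breadth-first search with a depth limit stands in for the paper's unspecified search algorithm; the 0.5 threshold, the "one supporting citation suffices" rule, the word-overlap verifier, and the toy corpus are all illustrative assumptions. A `hit_at_1` helper shows the metric reported in the experiments: the fraction of queries whose top-ranked candidate is the gold answer.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    claim: str
    citations: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


# Hypothetical interfaces (assumptions, not the paper's API):
# propose(node) -> [(candidate_claim, [cited_paper_id, ...]), ...]   # Think + Verbalize + Cite
# verify(parent_claim, child_claim, evidence_text) -> support score in [0, 1]   # Verify
Proposer = Callable[[Node], list[tuple[str, list[str]]]]
Verifier = Callable[[str, str, str], float]


def expand(node: Node, propose: Proposer, verify: Verifier,
           corpus: dict[str, str], threshold: float = 0.5) -> list[Node]:
    """Attach only those proposed children whose cited evidence supports them."""
    for claim, cited in propose(node):
        evidence = [corpus[pid] for pid in cited if pid in corpus]
        if not evidence:
            continue  # no retrievable citation: the link is ungrounded, so reject it
        # Assumption: one supporting cited paper is enough to accept the link.
        score = max(verify(node.claim, claim, text) for text in evidence)
        if score >= threshold:
            node.children.append(Node(claim, cited))
    return node.children


def build_tree(root: Node, propose: Proposer, verify: Verifier,
               corpus: dict[str, str], max_depth: int = 2) -> Node:
    """Breadth-first stand-in for the paper's (unspecified) search algorithm."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for child in expand(node, propose, verify, corpus):
            queue.append((child, depth + 1))
    return root


def hit_at_1(ranked: list[list[str]], gold: list[str]) -> float:
    """Fraction of queries whose top-ranked prediction equals the gold answer."""
    hits = sum(1 for preds, g in zip(ranked, gold) if preds and preds[0] == g)
    return hits / len(gold)


# Toy usage with illustrative stubs (hypothetical data).
corpus = {"p1": "self-attention replaces recurrence",
          "p2": "unrelated text about weather"}


def toy_propose(node: Node) -> list[tuple[str, list[str]]]:
    if node.claim != "recurrent networks":
        return []
    return [("self-attention replaces recurrence", ["p1"]),  # supported by p1
            ("soil chemistry", ["p2"]),                    # cited paper does not support it
            ("ghost idea", ["p9"])]                        # cites a paper not in the corpus


def toy_verify(parent: str, child: str, evidence: str) -> float:
    # Word overlap as a crude stand-in for an NLI entailment score.
    words = set(child.split())
    return len(words & set(evidence.split())) / len(words)


tree = build_tree(Node("recurrent networks"), toy_propose, toy_verify, corpus)
print([c.claim for c in tree.children])  # ['self-attention replaces recurrence']
print(hit_at_1([["a", "b"], ["c"], ["d"]], ["a", "b", "d"]))  # 0.6666666666666666
```

Only the supported link survives: the second proposal fails verification and the third cites a paper that cannot be retrieved, which is the grounding property the abstract attributes to the Verify step.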
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z)
- FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases [10.00386797940562]
We present FlyBench to evaluate AI agents on end-to-end agentic curation from scientific literature. Given only a gene symbol, agents must search and read from a corpus of 16,898 full-text papers to produce structured annotations. The benchmark includes 7,397 expert-curated annotations across 100 genes drawn from FlyBase.
arXiv Detail & Related papers (2026-02-09T20:12:38Z)
- DeepEra: A Deep Evidence Reranking Agent for Scientific Retrieval-Augmented Generated Question Answering [28.427433335623217]
We propose a Deep Evidence Reranking Agent (DeepEra) that integrates step-by-step reasoning. This work is the first to comprehensively study and empirically validate non-negligible SSLI issues in two-stage RAG frameworks.
arXiv Detail & Related papers (2026-01-23T06:19:08Z)
- SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature [52.36039386997026]
We introduce SciRAG, an open-source framework for scientific literature exploration. We introduce three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution.
arXiv Detail & Related papers (2025-11-18T11:09:19Z)
- SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines [112.78540935201558]
We present a scientific reasoning foundation model that aligns natural language with heterogeneous scientific representations. The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence-text pairs, then aligned via SFT on 40M instructions. It supports four capability families, covering up to 103 tasks across: (i) faithful translation between text and scientific formats, (ii) text/knowledge extraction, (iii) property prediction, (iv) property classification, (v) unconditional and conditional sequence generation and design.
arXiv Detail & Related papers (2025-09-25T17:52:06Z)
- Mapping the Evolution of Research Contributions using KnoVo [0.0]
KnoVo is an intelligent framework designed for quantifying and analyzing the evolution of research novelty in the scientific literature. It determines a paper's novelty relative to both prior and subsequent work within its multilayered citation network.
arXiv Detail & Related papers (2025-06-20T23:17:11Z)
- Bayesian Epistemology with Weighted Authority: A Formal Architecture for Truth-Promoting Autonomous Scientific Reasoning [0.0]
This paper introduces Bayesian Epistemology with Weighted Authority (BEWA). BEWA operationalises belief as a dynamic, probabilistically coherent function over structured scientific claims. It supports graph-based claim propagation, authorial credibility modelling, cryptographic anchoring, and zero-knowledge audit verification.
arXiv Detail & Related papers (2025-06-19T04:22:35Z)
- Atomic Reasoning for Scientific Table Claim Verification [83.14588611859826]
Non-experts are susceptible to misleading claims based on scientific tables due to their high information density and perceived credibility. Existing table claim verification models, including state-of-the-art large language models (LLMs), often struggle with precise fine-grained reasoning. Inspired by Cognitive Load Theory, we propose that enhancing a model's ability to interpret table-based claims involves reducing cognitive load.
arXiv Detail & Related papers (2025-06-08T02:46:22Z)
- XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration [41.44785777328187]
XtraGPT is the first suite of open-source large language models (LLMs) designed to provide context-aware, instruction-guided writing assistance. We introduce a dataset of 7,040 research papers from top-tier venues annotated with over 140,000 instruction-response pairs. Experiments validate that XtraGPT significantly outperforms same-scale baselines and approaches the quality of proprietary systems.
arXiv Detail & Related papers (2025-05-16T15:02:19Z)
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling [63.98194996746229]
Large language models (LLMs) are prone to hallucination and producing factually incorrect information. We propose a novel framework, called Think&Cite, and formulate attributed text generation as a multi-step reasoning problem integrated with search.
arXiv Detail & Related papers (2024-12-19T13:55:48Z)
- Epidemiology-informed Network for Robust Rumor Detection [59.89351792706995]
We propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance. To adapt epidemiology theory to rumor detection, each user's stance toward the source information is expected to be annotated. Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.
arXiv Detail & Related papers (2024-11-20T00:43:32Z)
- Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
- Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables [22.18384189336634]
HeterFC is a word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information.
We perform information propagation via a relational graph neural network, modeling interactions between claims and evidence.
We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval.
arXiv Detail & Related papers (2024-02-20T14:10:40Z)
- Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections [5.8962650619804755]
Inferring semantic relations between related scientific concepts is a crucial step.
BERT-based pre-trained models have been popularly explored for automatic relation classification.
Existing methods are primarily evaluated on clean texts.
To address these limitations, we started by creating OCR-noisy texts.
arXiv Detail & Related papers (2023-05-03T17:32:16Z)
- RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification [4.052777228128475]
We propose a modular approach that sequentially carries out binary classification for every prediction subtask.
We carry out two-step stance predictions that first differentiate non-relevant rationales and then identify supporting or refuting rationales for a given claim.
Experimentally, our system RerrFact, with no fine-tuning, a simple design, and a fraction of the model parameters, fares competitively on the leaderboard.
arXiv Detail & Related papers (2022-02-05T21:52:45Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Fact or Fiction: Verifying Scientific Claims [53.29101835904273]
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
arXiv Detail & Related papers (2020-04-30T17:22:57Z)
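The RerrFact summary describes sequential binary classification for stance: first separate non-relevant rationales, then decide whether the remaining ones support or refute the claim, using the SUPPORTS/REFUTES labels of the SciFact task. A minimal sketch of that cascade, assuming hypothetical `is_relevant` and `supports` classifiers; the word-overlap and negation-cue stand-ins below are toy assumptions, not RerrFact's trained models, and claim-level aggregation is not shown.

```python
from typing import Callable

# Hypothetical binary classifiers over (claim, sentence); trained models in practice.
BinaryClassifier = Callable[[str, str], bool]


def two_step_stance(claim: str, sentences: list[str],
                    is_relevant: BinaryClassifier,
                    supports: BinaryClassifier) -> list[str]:
    """Step 1 filters non-relevant rationales; step 2 labels the rest SUPPORTS or REFUTES."""
    labels = []
    for sentence in sentences:
        if not is_relevant(claim, sentence):
            labels.append("NOT_RELEVANT")
        elif supports(claim, sentence):
            labels.append("SUPPORTS")
        else:
            labels.append("REFUTES")
    return labels


# Toy stand-ins (assumptions): word overlap for relevance, a negation cue for stance.
def toy_relevant(claim: str, sentence: str) -> bool:
    return len(set(claim.lower().split()) & set(sentence.lower().split())) >= 2


def toy_supports(claim: str, sentence: str) -> bool:
    return "not" not in sentence.lower().split()


claim = "vitamin D reduces infection risk"
sentences = ["Vitamin D lowered infection rates",
             "Vitamin D did not change infection rates",
             "The trial enrolled 200 adults"]
print(two_step_stance(claim, sentences, toy_relevant, toy_supports))
# ['SUPPORTS', 'REFUTES', 'NOT_RELEVANT']
```

Because the stance classifier only sees sentences that passed the relevance filter, each stage reduces to a simple binary decision, which is the modularity the summary highlights.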