Atomic Reasoning for Scientific Table Claim Verification
- URL: http://arxiv.org/abs/2506.06972v1
- Date: Sun, 08 Jun 2025 02:46:22 GMT
- Title: Atomic Reasoning for Scientific Table Claim Verification
- Authors: Yuji Zhang, Qingyun Wang, Cheng Qian, Jiateng Liu, Chenkai Sun, Denghui Zhang, Tarek Abdelzaher, Chengxiang Zhai, Preslav Nakov, Heng Ji,
- Abstract summary: Non-experts are susceptible to misleading claims based on scientific tables due to their high information density and perceived credibility.<n>Existing table claim verification models, including state-of-the-art large language models (LLMs), often struggle with precise fine-grained reasoning.<n>Inspired by Cognitive Load Theory, we propose that enhancing a model's ability to interpret table-based claims involves reducing cognitive load.
- Score: 83.14588611859826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific texts often convey authority due to their technical language and complex data. However, this complexity can sometimes lead to the spread of misinformation. Non-experts are particularly susceptible to misleading claims based on scientific tables due to their high information density and perceived credibility. Existing table claim verification models, including state-of-the-art large language models (LLMs), often struggle with precise fine-grained reasoning, resulting in errors and a lack of precision in verifying scientific claims. Inspired by Cognitive Load Theory, we propose that enhancing a model's ability to interpret table-based claims involves reducing cognitive load by developing modular, reusable reasoning components (i.e., atomic skills). We introduce a skill-chaining schema that dynamically composes these skills to facilitate more accurate and generalizable reasoning with a reduced cognitive load. To evaluate this, we create SciAtomicBench, a cross-domain benchmark with fine-grained reasoning annotations. With only 350 fine-tuning examples, our model trained by atomic reasoning outperforms GPT-4o's chain-of-thought method, achieving state-of-the-art results with far less training data.
Related papers
- Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification [18.20994425756688]
Atomic Fact Extraction and Verification (AFEV) is a novel framework that iteratively decomposes complex claims into atomic facts.<n>AFEV achieves state-of-the-art performance in both accuracy and interpretability.
arXiv Detail & Related papers (2025-06-09T05:49:43Z) - FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models [59.171510592986735]
We propose FactReasoner, a new factuality assessor that relies on probabilistic reasoning to assess the factuality of a long-form generated response.<n>Our experiments on labeled and unlabeled benchmark datasets demonstrate clearly that FactReasoner improves considerably over state-of-the-art prompt-based approaches.
arXiv Detail & Related papers (2025-02-25T19:01:48Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.<n>Models may behave unreliably due to poorly explored failure modes.<n> causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation theoretical model for fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim
Verification on Scientific Tables [68.76415918462418]
We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims.
Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models.
Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning.
arXiv Detail & Related papers (2023-05-22T16:13:50Z) - Comparing and extending the use of defeasible argumentation with
quantitative data in real-world contexts [5.482532589225552]
A non-monotonic formalism is one that allows the retraction of previous conclusions or claims, from premises, in light of new evidence.
This study contributes to the body of knowledge through the exploitation of defeasible argumentation and its comparison to similar approaches.
arXiv Detail & Related papers (2022-06-28T12:28:47Z) - RerrFact: Reduced Evidence Retrieval Representations for Scientific
Claim Verification [4.052777228128475]
We propose a modular approach that sequentially carries out binary classification for every prediction subtask.
We carry out two-step stance predictions that first differentiate non-relevant rationales and then identify supporting or refuting rationales for a given claim.
Experimentally, our system RerrFact with no fine-tuning, simple design, and a fraction of model parameters fairs competitively on the leaderboard.
arXiv Detail & Related papers (2022-02-05T21:52:45Z) - ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
We introduce a new Reading dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations.
In this paper, we propose to identify biased data points and separate them into EASY set and the rest as HARD set.
Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set.
However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models.
arXiv Detail & Related papers (2020-02-11T11:54:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.