SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim
Verification on Scientific Tables
- URL: http://arxiv.org/abs/2305.13186v3
- Date: Mon, 23 Oct 2023 07:19:30 GMT
- Title: SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim
Verification on Scientific Tables
- Authors: Xinyuan Lu, Liangming Pan, Qian Liu, Preslav Nakov, Min-Yen Kan
- Abstract summary: We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims.
Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models.
Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning.
- Score: 68.76415918462418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current scientific fact-checking benchmarks exhibit several shortcomings,
such as biases arising from crowd-sourced claims and an over-reliance on
text-based evidence. We present SCITAB, a challenging evaluation dataset
consisting of 1.2K expert-verified scientific claims that 1) originate from
authentic scientific publications and 2) require compositional reasoning for
verification. The claims are paired with evidence-containing scientific tables
annotated with labels. Through extensive evaluations, we demonstrate that
SCITAB poses a significant challenge to state-of-the-art models, including
table-based pretraining models and large language models. All models except
GPT-4 achieved performance barely above random guessing. Popular prompting
techniques, such as Chain-of-Thought, do not yield substantial performance
gains on SCITAB. Our analysis uncovers several unique challenges posed by
SCITAB, including table grounding, claim ambiguity, and compositional
reasoning. Our code and data are publicly available at
https://github.com/XinyuanLu00/SciTab.
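To make "barely above random guessing" concrete, the following is a minimal sketch of how a random-guess baseline could be estimated for a claim verification task. It is not taken from the paper or its repository. The three-way label set, the use of macro-F1 as the metric, the function names, and the synthetic gold labels are all assumptions made for illustration.

```python
import random

# Assumed label set for illustration; the abstract only says claims are "annotated with labels".
LABELS = ["supported", "refuted", "not enough info"]


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


def random_baseline(gold, labels, trials=200, seed=0):
    """Average macro-F1 of uniformly random predictions over several trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pred = [rng.choice(labels) for _ in gold]
        total += macro_f1(gold, pred, labels)
    return total / trials


if __name__ == "__main__":
    # Synthetic, balanced toy labels; this is NOT SCITAB data.
    gold = LABELS * 400
    print(f"random-guess macro-F1: {random_baseline(gold, LABELS):.3f}")
```

Under these assumptions, a model's score would be compared against the printed baseline; with a balanced three-way label set the baseline lands near one third, so a model scoring only slightly above that has learned little.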
Related papers
- Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning [0.0]
We show that differences in learned regularities across answer options are predictive of model preferences and mirror human test-taking strategies.
We introduce two novel methods: Counterfactual Prompting with Chain of Thought (CoT) and Counterfactual Prompting with Agnostically Primed CoT (APriCoT).
Our results suggest that mitigating bias requires a "System-2" like process and that CoT reasoning is susceptible to confirmation bias under some prompting methodologies.
arXiv Detail & Related papers (2024-08-16T10:34:50Z)
- Robust Claim Verification Through Fact Detection [17.29665711917281]
Our novel approach, FactDetect, leverages Large Language Models (LLMs) to generate concise factual statements from evidence.
The generated facts are then combined with the claim and evidence.
Our method demonstrates competitive results, improving the supervised claim verification model by 15% on the F1 score.
arXiv Detail & Related papers (2024-07-25T20:03:43Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- SciFact-Open: Towards open-domain scientific claim verification [61.288725621156864]
We present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems.
We collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models.
We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1.
arXiv Detail & Related papers (2022-10-25T05:45:00Z)
- Generating Scientific Claims for Zero-Shot Scientific Fact Checking [54.62086027306609]
Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of training data.
We propose scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences.
We also demonstrate its usefulness in zero-shot fact checking for biomedical claims.
arXiv Detail & Related papers (2022-03-24T11:29:20Z)
- RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification [4.052777228128475]
We propose a modular approach that sequentially carries out binary classification for every prediction subtask.
We carry out two-step stance predictions that first differentiate non-relevant rationales and then identify supporting or refuting rationales for a given claim.
Experimentally, our system RerrFact, with no fine-tuning, a simple design, and a fraction of the model parameters, fares competitively on the leaderboard.
arXiv Detail & Related papers (2022-02-05T21:52:45Z)
- A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
We introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor), extracted from standardized graduate admission examinations.
In this paper, we propose to identify biased data points and separate them into an EASY set, with the rest forming a HARD set.
Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset, with high accuracy on the EASY set.
However, they struggle on the HARD set, with performance near that of random guessing, indicating that more research is needed to enhance the logical reasoning ability of current models.
arXiv Detail & Related papers (2020-02-11T11:54:29Z)