BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory
Information
- URL: http://arxiv.org/abs/2306.07934v1
- Date: Tue, 13 Jun 2023 17:39:20 GMT
- Title: BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory
Information
- Authors: Mehran Kazemi, Quan Yuan, Deepti Bhatia, Najoung Kim, Xin Xu, Vaiva
Imbrasaite, Deepak Ramachandran
- Abstract summary: Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning.
This paper formulates the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning.
We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem.
- Score: 11.299785330182004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated reasoning with unstructured natural text is a key requirement for
many potential applications of NLP and for developing robust AI systems.
Recently, Language Models (LMs) have demonstrated complex reasoning capacities
even without any finetuning. However, existing evaluation for automated
reasoning assumes access to a consistent and coherent set of information over
which models reason. When reasoning in the real-world, the available
information is frequently inconsistent or contradictory, and therefore models
need to be equipped with a strategy to resolve such conflicts when they arise.
One widely-applicable way of resolving conflicts is to impose preferences over
information sources (e.g., based on source credibility or information recency)
and adopt the source with higher preference. In this paper, we formulate the
problem of reasoning with contradictory information guided by preferences over
sources as the classical problem of defeasible reasoning, and develop a dataset
called BoardgameQA for measuring the reasoning capacity of LMs in this setting.
BoardgameQA also incorporates reasoning with implicit background knowledge, to
better reflect reasoning problems in downstream applications. We benchmark
various LMs on BoardgameQA and the results reveal a significant gap in the
reasoning capacity of state-of-the-art LMs on this problem, showing that
reasoning with conflicting information does not surface out-of-the-box in LMs.
While performance can be improved with finetuning, it nevertheless remains
poor.
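To make the preference-based conflict-resolution setting concrete, the following is a minimal sketch in Python under an assumed toy rule/source representation: the source names, the example statement, and the preference ranking are illustrative only and are not taken from BoardgameQA (which poses its problems in natural language rather than code).

```python
# Minimal sketch (illustrative, not the BoardgameQA implementation): resolve
# contradictory conclusions by adopting the rule from the more preferred source.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    source: str      # where the rule comes from, e.g. "rulebook" or "hearsay"
    statement: str   # the statement the rule is about
    negated: bool    # True if the rule argues against the statement

# Hypothetical preference order: higher value = more credible / more recent source.
PREFERENCE = {"rulebook": 2, "player_note": 1, "hearsay": 0}

def resolve(statement: str, rules: list[Rule]) -> str:
    """Return 'proved', 'disproved', or 'unknown' for a statement, letting the
    most-preferred applicable rule defeat the conflicting ones."""
    applicable = [r for r in rules if r.statement == statement]
    if not applicable:
        return "unknown"
    best = max(applicable, key=lambda r: PREFERENCE[r.source])
    return "disproved" if best.negated else "proved"

rules = [
    Rule("hearsay", "the mouse attacks the bear", negated=False),
    Rule("rulebook", "the mouse attacks the bear", negated=True),
]
# The rulebook outranks hearsay, so the contradiction resolves to "disproved".
print(resolve("the mouse attacks the bear", rules))
```

The three-way answer space (proved / disproved / unknown) is used in the sketch simply to reflect that some statements may be neither supported nor contradicted by any applicable rule.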
Related papers
- Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems.
Existing benchmarks often rely on extensive human annotation or handcrafted templates.
We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
arXiv Detail & Related papers (2025-02-10T15:31:54Z)
- CausalEval: Towards Better Causal Reasoning in Language Models [16.55801836321059]
Causal reasoning (CR) is a crucial aspect of intelligence, essential for problem-solving, decision-making, and understanding the world.
While language models (LMs) can generate rationales for their outputs, their ability to reliably perform causal reasoning remains uncertain.
We introduce CausalEval, a review of research aimed at enhancing LMs for causal reasoning.
arXiv Detail & Related papers (2024-10-22T04:18:19Z)
- Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate ~30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
arXiv Detail & Related papers (2024-06-28T09:09:36Z)
- Large Language Models for Constrained-Based Causal Discovery [4.858756226945995]
Causality is essential for understanding complex systems, such as the economy, the brain, and the climate.
This work explores the capabilities of Large Language Models (LLMs) as an alternative to domain experts for causal graph generation.
arXiv Detail & Related papers (2024-06-11T15:45:24Z)
- Case-Based Reasoning Approach for Solving Financial Question Answering [5.10832476049103]
FinQA introduced a numerical reasoning dataset for financial documents.
We propose a novel approach to tackle numerical reasoning problems using case-based reasoning (CBR).
Our model retrieves relevant cases to address a given question, and then generates an answer based on the retrieved cases and contextual information.
arXiv Detail & Related papers (2024-05-18T10:06:55Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- Bayesian Preference Elicitation with Language Models [82.58230273253939]
We introduce OPEN, a framework that uses Bayesian optimal experimental design (BOED) to guide the choice of informative questions and an LM to extract features.
In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
arXiv Detail & Related papers (2024-03-08T18:57:52Z)
- Is Knowledge All Large Language Models Needed for Causal Reasoning? [11.476877330365664]
This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence.
We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios; a toy illustration of a do-style intervention is sketched after this list.
arXiv Detail & Related papers (2023-12-30T04:51:46Z)
- How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study [59.13867562744973]
This work systematically assesses LMs' capabilities for out-of-distribution (OOD) scenarios.
We find that the efficacy of learning paradigms such as in-context learning (ICL) and prompt-based fine-tuning varies with the type of OOD shift.
Specifically, ICL excels for domain shifts, while prompt-based fine-tuning is better suited to topic shifts.
arXiv Detail & Related papers (2023-09-15T11:15:47Z)
- A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets given the abundant commonsense knowledge.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
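As a companion to the causal-attribution entry above, here is a toy, self-contained sketch of a do-operator intervention on a hand-written structural causal model; the variables, probabilities, and graph structure are assumptions for illustration only and are not drawn from the cited paper.

```python
# Toy structural causal model: confounder -> treatment, {confounder, treatment} -> outcome.
# do(treatment = x) severs the confounder -> treatment edge, which is what
# distinguishes an intervention (counterfactual regime) from ordinary conditioning.
import random

def sample(do_treatment=None):
    """Draw one (treatment, outcome) pair; pass do_treatment to intervene."""
    confounder = random.random() < 0.5
    treatment = confounder if do_treatment is None else do_treatment
    p_outcome = 0.2 + 0.4 * treatment + 0.3 * confounder
    return treatment, random.random() < p_outcome

def estimate_outcome_rate(do_treatment, n=100_000):
    """Monte Carlo estimate of P(outcome | do(treatment))."""
    return sum(sample(do_treatment)[1] for _ in range(n)) / n

# Contrast the two interventional regimes to read off the causal effect of treatment.
print("P(outcome | do(T=1)) ~", round(estimate_outcome_rate(True), 3))
print("P(outcome | do(T=0)) ~", round(estimate_outcome_rate(False), 3))
```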
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.