CLadder: Assessing Causal Reasoning in Language Models
- URL: http://arxiv.org/abs/2312.04350v3
- Date: Wed, 17 Jan 2024 14:41:55 GMT
- Title: CLadder: Assessing Causal Reasoning in Language Models
- Authors: Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal,
Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner,
Mrinmaya Sachan, Bernhard Sch\"olkopf
- Abstract summary: We investigate whether large language models (LLMs) can coherently reason about causality.
We propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
- Score: 82.8719238178569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to perform causal reasoning is widely considered a core feature
of intelligence. In this work, we investigate whether large language models
(LLMs) can coherently reason about causality. Much of the existing work in
natural language processing (NLP) focuses on evaluating commonsense causal
reasoning in LLMs, thus failing to assess whether a model can perform causal
inference in accordance with a set of well-defined formal rules. To address
this, we propose a new NLP task, causal inference in natural language, inspired
by the "causal inference engine" postulated by Judea Pearl et al. We compose a
large dataset, CLadder, with 10K samples: based on a collection of causal
graphs and queries (associational, interventional, and counterfactual), we
obtain symbolic questions and ground-truth answers, through an oracle causal
inference engine. These are then translated into natural language. We evaluate
multiple LLMs on our dataset, and we introduce and evaluate a bespoke
chain-of-thought prompting strategy, CausalCoT. We show that our task is highly
challenging for LLMs, and we conduct an in-depth analysis to gain deeper
insights into the causal reasoning abilities of LLMs. Our data is open-sourced
at https://huggingface.co/datasets/causalNLP/cladder, and our code can be found
at https://github.com/causalNLP/cladder.
Related papers
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Is Knowledge All Large Language Models Needed for Causal Reasoning? [11.476877330365664]
This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence.
We propose a novel causal attribution model that utilizes do-operators" for constructing counterfactual scenarios.
arXiv Detail & Related papers (2023-12-30T04:51:46Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs)
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve almost close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Reliable Natural Language Understanding with Large Language Models and
Answer Set Programming [0.0]
Large language models (LLMs) are able to leverage patterns in the text to solve a variety of NLP tasks, but fall short in problems that require reasoning.
We propose STAR, a framework that combines LLMs with Answer Set Programming (ASP)
Goal-directed ASP is then employed to reliably reason over this knowledge.
arXiv Detail & Related papers (2023-02-07T22:37:21Z) - Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.