Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
- URL: http://arxiv.org/abs/2308.13067v1
- Date: Thu, 24 Aug 2023 20:23:13 GMT
- Title: Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
- Authors: Matej Zečević, Moritz Willig, Devendra Singh Dhami, and Kristian Kersting
- Abstract summary: We argue that large language models (LLMs) cannot be causal and explain why they may sometimes appear otherwise.
We conjecture that in the cases where an LLM succeeds at causal inference, a corresponding meta SCM underlies its training data.
If this hypothesis holds, LLMs are like parrots in that they simply recite the causal knowledge embedded in that data.
- Score: 24.025116931689606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some argue that scale is all that is needed to achieve AI, covering even causal
models. We make it clear that large language models (LLMs) cannot be causal and
explain why we might sometimes feel otherwise. To this end, we define and
exemplify a new subgroup of Structural Causal Models (SCMs) that we call meta
SCMs, which encode causal facts about other SCMs within their variables. We
conjecture that in the cases where LLMs succeed at causal inference, there was
an underlying meta SCM that exposed correlations between causal facts in the
natural language data on which the LLM was ultimately trained. If our
hypothesis holds, this would imply that LLMs are like parrots in that they
simply recite the causal knowledge embedded in the data. Our empirical analysis
provides evidence supporting the view that current LLMs are, at best, weak
'causal parrots.'
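
To make the meta-SCM idea concrete, here is a minimal sketch, not taken from the paper: the altitude-temperature mechanism, the variable names, and the toy corpus are all illustrative assumptions. It contrasts an object-level SCM, which answers interventional queries through its structural equations, with natural-language statements of causal facts about that SCM, from which correct-sounding causal talk can be recited using correlations in text alone.

```python
# Minimal sketch (illustrative only, not the paper's construction): an
# object-level SCM that answers do()-interventions via structural equations,
# contrasted with a corpus of natural-language causal facts about that SCM.
import random

def sample_scm(do_altitude=None):
    """Object-level SCM with an assumed toy mechanism: altitude -> temperature."""
    u_a, u_t = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    altitude = do_altitude if do_altitude is not None else 2000.0 + 800.0 * u_a
    temperature = 15.0 - 0.0065 * altitude + u_t  # lapse-rate-style equation
    return altitude, temperature

# Interventional query do(altitude = 4000): answered by the structural equation.
print(sample_scm(do_altitude=4000.0))

# "Meta" level: variables encode statements of causal facts about the SCM above.
# Text over such statements only exposes correlations between the facts.
causal_fact_corpus = [
    "Higher altitude causes lower temperature.",
    "Altitude influences temperature, not the other way around.",
]
recited = next(s for s in causal_fact_corpus if "caus" in s.lower())
print(recited)  # causal talk retrieved from text, with no do()-semantics involved
```
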
Related papers
- Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? [62.17959154852391]
Causal reasoning capability is critical in advancing large language models toward strong artificial intelligence.
We show that large language models (LLMs) are only capable of performing shallow (level-1) causal reasoning.
We propose G2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes.
arXiv Detail & Related papers (2025-06-26T13:11:01Z)
- ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning.
We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics.
Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
- Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the causal reasoning abilities of large language models (LLMs) through the representative problem of inferring causal relationships from narratives.
We find that even state-of-the-art language models rely on unreliable shortcuts, both in terms of the narrative presentation and their parametric knowledge.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
- Probing Causality Manipulation of Large Language Models [12.46951388060595]
Large language models (LLMs) have shown a range of abilities in natural language processing, including on problems involving causality.
This paper proposes a novel approach to probe causality manipulation hierarchically, by providing different shortcuts to models and observing their behaviors.
arXiv Detail & Related papers (2024-08-26T16:00:41Z)
- On the attribution of confidence to large language models [0.1478468781294373]
Credences are mental states corresponding to degrees of confidence in propositions.
The theoretical basis for credence attribution is unclear.
It is a distinct possibility that even if LLMs have credences, credence attributions are generally false.
arXiv Detail & Related papers (2024-07-11T10:51:06Z)
- LLMs Are Prone to Fallacies in Causal Inference [33.9881589703843]
Recent work shows that causal facts can be effectively extracted from LLMs through prompting.
This work investigates whether this success is limited to causal facts that are explicitly mentioned in the pretraining data and can simply be memorized by the model.
arXiv Detail & Related papers (2024-06-18T00:14:07Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means of assessing language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- CLadder: Assessing Causal Reasoning in Language Models [82.8719238178569]
We investigate whether large language models (LLMs) can coherently reason about causality.
We propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
arXiv Detail & Related papers (2023-12-07T15:12:12Z)
- Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z)
- Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task, Corr2Cause, which takes a set of correlational statements and asks for the causal relationship between the variables (a hypothetical instance is sketched after this list).
We show that these models achieve near-random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z)
- Statistical Knowledge Assessment for Large Language Models [79.07989821512128]
Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers?
We propose KaRR, a statistical approach to assess factual knowledge for LLMs.
Our results reveal that the knowledge in LLMs with the same backbone architecture adheres to the scaling law, while tuning on instruction-following data sometimes compromises the model's capability to generate factually correct text reliably.
arXiv Detail & Related papers (2023-05-17T18:54:37Z)
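
For the Corr2Cause entry above, the following is a hypothetical instance sketched under assumed field names and phrasing, not taken from the dataset: the premise states only correlational and conditional-independence facts, and the model must judge whether a causal hypothesis follows.

```python
# Hypothetical Corr2Cause-style instance (field names and phrasing assumed,
# not copied from the dataset): correlational premises vs. a causal hypothesis.
example = {
    "premise": (
        "Suppose there are three variables A, B, and C. "
        "A correlates with B. A correlates with C. "
        "B and C are independent given A."
    ),
    "hypothesis": "A directly causes B.",
    # Not entailed: the stated (in)dependencies are also consistent with
    # B -> A -> C or with a latent confounder of A and B, so correlation
    # alone cannot establish that A causes B.
    "label": "not entailed",
}
print(example["hypothesis"], "->", example["label"])
```
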