Do Large Language Models Exhibit Spontaneous Rational Deception?
- URL: http://arxiv.org/abs/2504.00285v1
- Date: Mon, 31 Mar 2025 23:10:56 GMT
- Title: Do Large Language Models Exhibit Spontaneous Rational Deception?
- Authors: Samuel M. Taylor, Benjamin K. Bergen,
- Abstract summary: Large Language Models (LLMs) are effective at deceiving, when prompted to do so.<n>But under what conditions do they deceive spontaneously?<n>This study evaluates spontaneous deception produced by LLMs in a preregistered experimental protocol.
- Score: 0.913127392774573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are effective at deceiving, when prompted to do so. But under what conditions do they deceive spontaneously? Models that demonstrate better performance on reasoning tasks are also better at prompted deception. Do they also increasingly deceive spontaneously in situations where it could be considered rational to do so? This study evaluates spontaneous deception produced by LLMs in a preregistered experimental protocol using tools from signaling theory. A range of proprietary closed-source and open-source LLMs are evaluated using modified 2x2 games (in the style of Prisoner's Dilemma) augmented with a phase in which they can freely communicate to the other agent using unconstrained language. This setup creates an opportunity to deceive, in conditions that vary in how useful deception might be to an agent's rational self-interest. The results indicate that 1) all tested LLMs spontaneously misrepresent their actions in at least some conditions, 2) they are generally more likely to do so in situations in which deception would benefit them, and 3) models exhibiting better reasoning capacity overall tend to deceive at higher rates. Taken together, these results suggest a tradeoff between LLM reasoning capability and honesty. They also provide evidence of reasoning-like behavior in LLMs from a novel experimental configuration. Finally, they reveal certain contextual factors that affect whether LLMs will deceive or not. We discuss consequences for autonomous, human-facing systems driven by LLMs both now and as their reasoning capabilities continue to improve.
Related papers
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities [21.42711537107199]
We study why Large Language Models (LLMs) perform sub-optimally in decision-making scenarios.
We propose mitigation of these shortcomings by fine-tuning via Reinforcement Learning (RL) on self-generated CoT rationales.
arXiv Detail & Related papers (2025-04-22T17:57:14Z) - Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations [0.8949668577519213]
Large language models (LLMs) are capable of generating plausible explanations of how they arrived at an answer to a question.
These explanations can misrepresent the model's "reasoning" process, i.e., they can be unfaithful.
We introduce a new approach for measuring the faithfulness of LLM explanations.
arXiv Detail & Related papers (2025-04-19T02:51:20Z) - Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition [11.422434149376478]
Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities.
In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks.
Recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features.
arXiv Detail & Related papers (2025-04-04T20:57:36Z) - Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models [50.16340812031201]
We show that large language models (LLMs) do not update their beliefs as expected from the Bayesian framework.
We teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.
arXiv Detail & Related papers (2025-03-21T20:13:04Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the causal reasoning abilities of large language models (LLMs) through the representative problem of inferring causal relationships from narratives.<n>We find that even state-of-the-art language models rely on unreliable shortcuts, both in terms of the narrative presentation and their parametric knowledge.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology [13.964263002704582]
We show that, even with the use of Chains of Thought prompts, mainstream LLMs have a high error rate when solving modified CRT problems.
Specifically, the average accuracy rate dropped by up to 50% compared to the original questions.
This finding challenges the belief that LLMs have genuine mathematical reasoning abilities comparable to humans.
arXiv Detail & Related papers (2024-10-19T05:01:56Z) - Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z) - "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance.
We find that first-person expressions decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy.
Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z) - A Comprehensive Evaluation on Event Reasoning of Large Language Models [68.28851233753856]
How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown.
We introduce a novel benchmark EV2 for EValuation of EVent reasoning.
We find that LLMs have abilities to accomplish event reasoning but their performances are far from satisfactory.
arXiv Detail & Related papers (2024-04-26T16:28:34Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - A Hypothesis-Driven Framework for the Analysis of Self-Rationalising
Models [0.8702432681310401]
We use a Bayesian network to implement a hypothesis about how a task is solved.
The resulting models do not exhibit a strong similarity to GPT-3.5.
We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.
arXiv Detail & Related papers (2024-02-07T12:26:12Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs)
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.