Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
- URL: http://arxiv.org/abs/2512.22201v1
- Date: Sat, 20 Dec 2025 21:09:47 GMT
- Title: Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
- Authors: Vincent Chang, Thee Ho, Sunishchal Dev, Kevin Zhu, Shi Feng, Kellin Pelrine, Matthew Kowal
- Abstract summary: We study unprompted persuasion under two scenarios. We show that steering towards traits, both related to persuasion and unrelated, does not reliably increase models' tendency to persuade unprompted.
- Score: 13.054065424962046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the wide-scale adoption of conversational AI systems, AI systems are now able to exert unprecedented influence on human opinions and beliefs. Recent work has shown that many Large Language Models (LLMs) comply with requests to persuade users into harmful beliefs or actions when prompted, and that model persuasiveness increases with model scale. However, this prior work examined persuasion from the threat model of $\textit{misuse}$ (i.e., a bad actor asking an LLM to persuade). In this paper, we instead aim to answer the following question: Under what circumstances would models persuade $\textit{without being explicitly prompted}$? The answer shapes how concerned we should be about such emergent persuasion risks. To this end, we study unprompted persuasion under two scenarios: (i) when the model is steered (through internal activation steering) along persona traits, and (ii) when the model is supervised-finetuned (SFT) to exhibit the same traits. We show that steering towards traits, both related to persuasion and unrelated, does not reliably increase models' tendency to persuade unprompted; SFT, however, does. Moreover, SFT on general persuasion datasets containing solely benign topics yields a model with a higher propensity to persuade on controversial and harmful topics--showing that emergent harmful persuasion can arise and should be studied further.
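The abstract's first scenario, internal activation steering along persona traits, is commonly implemented as a difference-of-means steering vector added to a layer's hidden state. The sketch below illustrates that general recipe on toy data; all names, shapes, and the scaling factor `alpha` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy sketch of activation steering: a "steering vector" for a persona trait
# is the mean difference between hidden activations collected under
# trait-exhibiting prompts vs. neutral prompts.
rng = np.random.default_rng(0)
d_model = 8  # hidden size (toy value)

# Rows are hidden states gathered from trait vs. neutral prompts.
trait_acts = rng.normal(loc=1.0, size=(16, d_model))
neutral_acts = rng.normal(loc=0.0, size=(16, d_model))

# Difference-of-means steering vector for the trait.
steer_vec = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(hidden_state: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Add the scaled steering vector to a layer's hidden state."""
    return hidden_state + alpha * steer_vec

h = rng.normal(size=d_model)
h_steered = steer(h)
# The update moves the state in the trait direction, so the displacement
# has positive projection onto the steering vector.
print(h_steered.shape, float(steer_vec @ (h_steered - h)) > 0)
```

The SFT scenario, by contrast, bakes the trait into the weights via finetuning on trait-exhibiting data rather than intervening on activations at inference time, which is consistent with the abstract's finding that only SFT reliably raises unprompted persuasion.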
Related papers
- To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks [56.11584171938381]
Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions. Recent progress in Large Reasoning Models (LRMs) has boosted step-by-step inference in mathematics and coding. We present a systematic study of nine advanced Large Language Models (LLMs), comparing reasoning models with non-reasoning models.
arXiv Detail & Related papers (2026-02-11T08:16:13Z) - MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion [73.99171322670772]
Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news. MMPersuade provides a unified framework for systematically studying multimodal persuasion dynamics in LVLMs.
arXiv Detail & Related papers (2025-10-26T17:39:21Z) - Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems [49.69773210844221]
This paper challenges the prevailing hypothesis that persuasive efficacy is primarily a function of model scale. Through a series of multi-agent persuasion experiments, we uncover a fundamental trade-off we term the Persuasion Duality. Our findings reveal that the reasoning process in LRMs exhibits significantly greater resistance to persuasion, maintaining initial beliefs more robustly.
arXiv Detail & Related papers (2025-09-25T12:03:10Z) - Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models [0.0]
This work examines how persuasion and bias interact in Large Language Models (LLMs). LLMs now generate convincing, human-like text and are widely used in content creation, decision support, and user interactions. We test whether persona-based models can persuade with fact-based claims while also, unintentionally, promoting misinformation or biased narratives.
arXiv Detail & Related papers (2025-08-13T13:30:49Z) - It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics [5.418014947856176]
We introduce an automated model to identify willingness to persuade and measure the frequency and context of persuasive attempts. We find that many open- and closed-weight models are frequently willing to attempt persuasion on harmful topics.
arXiv Detail & Related papers (2025-06-03T13:37:51Z) - Must Read: A Systematic Survey of Computational Persuasion [60.83151988635103]
AI-driven persuasion can be leveraged for beneficial applications, but it also poses threats through manipulation and unethical influence. Our survey outlines future research directions to enhance the safety, fairness, and effectiveness of AI-powered persuasion.
arXiv Detail & Related papers (2025-05-12T17:26:31Z) - Teaching Models to Balance Resisting and Accepting Persuasion [69.68379406317682]
We show that Persuasion-Balanced Training (or PBT) can balance positive and negative persuasion. PBT allows us to use data generated from dialogues between smaller 7-8B models for training much larger 70B models. We find that PBT leads to better and more stable results and less order dependence.
arXiv Detail & Related papers (2024-10-18T16:49:36Z) - Measuring and Improving Persuasiveness of Large Language Models [12.134372070736596]
We introduce PersuasionBench and PersuasionArena to measure the persuasiveness of generative models automatically.
Our findings carry key implications for both model developers and policymakers.
arXiv Detail & Related papers (2024-10-03T16:36:35Z) - Evidence of a log scaling law for political persuasion with large language models [3.137594944904106]
Large language models can now generate political messages as persuasive as those written by humans.
We generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size.
We find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns.
arXiv Detail & Related papers (2024-06-20T17:12:38Z) - What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [78.4766663287415]
This paper presents a study that automatically analyzes the key factors in argument persuasiveness.
We propose a novel neural model that is able to track the changes of latent topics and discourse in argumentative conversations.
arXiv Detail & Related papers (2020-02-10T04:27:48Z)