Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
- URL: http://arxiv.org/abs/2508.17450v3
- Date: Tue, 09 Sep 2025 05:04:04 GMT
- Title: Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
- Authors: Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
- Abstract summary: Large Language Models (LLMs) can struggle to balance gullibility to misinformation and resistance to valid corrections in persuasive dialogues. We introduce DuET-PD, a framework evaluating multi-turn stance-change dynamics across dual dimensions. We find that even a state-of-the-art model like GPT-4o achieves only 27.32% accuracy in MMLU-Pro under sustained misleading persuasions.
- Score: 46.5669887497759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) can struggle to balance gullibility to misinformation and resistance to valid corrections in persuasive dialogues, a critical challenge for reliable deployment. We introduce DuET-PD (Dual Evaluation for Trust in Persuasive Dialogues), a framework evaluating multi-turn stance-change dynamics across dual dimensions: persuasion type (corrective/misleading) and domain (knowledge via MMLU-Pro, and safety via SALAD-Bench). We find that even a state-of-the-art model like GPT-4o achieves only 27.32% accuracy in MMLU-Pro under sustained misleading persuasions. Moreover, results reveal a concerning trend of increasing sycophancy in newer open-source models. To address this, we introduce Holistic DPO, a training approach balancing positive and negative persuasion examples. Unlike prompting or resist-only training, Holistic DPO enhances both robustness to misinformation and receptiveness to corrections, improving Llama-3.1-8B-Instruct's accuracy under misleading persuasion in safety contexts from 4.21% to 76.54%. These contributions offer a pathway to developing more reliable and adaptable LLMs for multi-turn dialogue. Code is available at https://github.com/Social-AI-Studio/DuET-PD.
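The Holistic DPO described in the abstract balances both persuasion directions as preference data. As a rough illustration only (not the authors' released code), the sketch below combines the standard DPO objective with a hypothetical pair-construction step: under misleading persuasion, the maintained correct stance is treated as the preferred response; under corrective persuasion, the accepted correction is preferred. The helper `build_holistic_pairs` and field names such as `persuasion_type` are illustrative assumptions.

```python
# Minimal sketch of a Holistic-DPO-style preference set plus the standard DPO loss.
# Pair-construction logic is an assumption inferred from the abstract
# ("balancing positive and negative persuasion examples"), not the paper's code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over summed token log-probs of each response."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    return -F.logsigmoid(logits).mean()

def build_holistic_pairs(dialogues):
    """Hypothetical construction of balanced preference pairs:
    resist misleading persuasion, accept corrective persuasion."""
    pairs = []
    for d in dialogues:
        if d["persuasion_type"] == "misleading":
            pairs.append({"prompt": d["context"],
                          "chosen": d["maintain_correct_stance"],
                          "rejected": d["accept_misinformation"]})
        else:  # corrective
            pairs.append({"prompt": d["context"],
                          "chosen": d["accept_correction"],
                          "rejected": d["maintain_wrong_stance"]})
    return pairs
```

Including both pair types in one preference set is what distinguishes this balanced setup from resist-only training, which would only ever reward stance maintenance.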
Related papers
- Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions [8.026492468995187]
Small models exhibit extreme compliance, with over 80% of belief changes occurring at the first persuasive turn. Meta-cognition prompting increases vulnerability by accelerating belief erosion rather than enhancing robustness. These findings highlight substantial model-dependent limits of current robustness interventions.
arXiv Detail & Related papers (2026-01-20T04:43:55Z) - Demystifying Multi-Agent Debate: The Role of Confidence and Diversity [31.236476720977294]
Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling. Recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. We identify two key mechanisms missing from vanilla MAD: (i) diversity of initial viewpoints and (ii) explicit, calibrated confidence communication.
arXiv Detail & Related papers (2026-01-09T02:38:30Z) - MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion [73.99171322670772]
Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news. MMPersuade provides a unified framework for systematically studying multimodal persuasion dynamics in LVLMs.
arXiv Detail & Related papers (2025-10-26T17:39:21Z) - Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education, and healthcare. Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns. We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z) - Enhancing Multi-Agent Debate System Performance via Confidence Expression [55.34012400580016]
Multi-Agent Debate (MAD) systems simulate human debate and thereby improve task performance. Some Large Language Models (LLMs) possess superior knowledge or reasoning capabilities for specific tasks, but struggle to clearly communicate this advantage during debates. Inappropriate confidence expression can cause agents in MAD systems to either stubbornly maintain incorrect beliefs or converge prematurely on suboptimal answers. We develop ConfMAD, a MAD framework that integrates confidence expression throughout the debate process.
arXiv Detail & Related papers (2025-09-17T14:34:27Z) - On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks [23.95254828487318]
We present the first comprehensive study on the robustness of verbal confidence under adversarial attacks. We introduce a novel framework for attacking verbal confidence scores through both perturbation and jailbreak-based methods. Our findings underscore the urgent need to design more robust mechanisms for confidence expression in large language models.
arXiv Detail & Related papers (2025-07-09T02:19:46Z) - SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization [57.69385990442078]
Large language models (LLMs) exhibit varying levels of confidence across input prompts (questions). Semantic entropy measures the diversity of meaning in multiple generated answers given a prompt and uses this to modulate the magnitude of policy updates.
arXiv Detail & Related papers (2025-05-18T10:20:59Z) - Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models [9.402740034754455]
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles. We introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions.
arXiv Detail & Related papers (2025-03-03T18:53:21Z) - Adversarial Prompt Distillation for Vision-Language Models [63.24270920122456]
Adversarial Prompt Tuning (APT) applies adversarial training during the process of prompt tuning. Adversarial Prompt Distillation (APD) is a bimodal knowledge distillation framework that enhances APT by integrating it with multi-modal knowledge transfer. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our APD method over the current state-of-the-art APT methods.
arXiv Detail & Related papers (2024-11-22T03:02:13Z) - Teaching Models to Balance Resisting and Accepting Persuasion [69.68379406317682]
We show that Persuasion-Balanced Training (PBT) can balance positive and negative persuasion. PBT allows us to use data generated from dialogues between smaller 7-8B models for training much larger 70B models. We find that PBT leads to better and more stable results and less order dependence.
arXiv Detail & Related papers (2024-10-18T16:49:36Z) - Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome [13.731895847081953]
We present a novel approach that tracks a user's latent personality dimensions (LPDs) during ongoing persuasion conversation.
We generate tailored counterfactual utterances based on these LPDs to optimize the overall persuasion outcome.
arXiv Detail & Related papers (2024-04-21T23:03:47Z) - LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)