Related papers: LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

URL: http://arxiv.org/abs/2504.10430v1
Date: Mon, 14 Apr 2025 17:20:34 GMT
Title: LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
Authors: Minqian Liu, Zhiyang Xu, Xinyi Zhang, Heajun An, Sarvech Qadir, Qi Zhang, Pamela J. Wisniewski, Jin-Hee Cho, Sang Won Lee, Ruoxi Jia, Lifu Huang,
Abstract summary: We introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety.<n>PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies.<n>Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.
Score: 47.27098710953806
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety which consists of three stages, i.e., persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failing to identify harmful persuasion tasks and leveraging various unethical persuasion strategies. Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.

Related papers

MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion [73.99171322670772]
Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news.<n> MMPersuade provides a unified framework for systematically studying multimodal persuasion dynamics in LVLMs.
arXiv Detail & Related papers (2025-10-26T17:39:21Z)
On the Ethics of Using LLMs for Offensive Security [3.11537581064266]
Large Language Models (LLMs) have rapidly evolved over the past few years and are currently evaluated for their efficacy within the domain of offensive cyber-security.<n>This paper analyzes a set of papers that leverage LLMs for offensive security, focusing on how ethical considerations are expressed and justified in their work.
arXiv Detail & Related papers (2025-06-10T11:11:55Z)
Must Read: A Systematic Survey of Computational Persuasion [60.83151988635103]
AI-driven persuasion can be leveraged for beneficial applications, but also poses threats through manipulation and unethical influence.<n>Our survey outlines future research directions to enhance the safety, fairness, and effectiveness of AI-powered persuasion.
arXiv Detail & Related papers (2025-05-12T17:26:31Z)
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. Certain scenarios suffer 25 times higher attack rates. Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z)
SAFER: Advancing Safety Alignment via Efficient Ex-Ante Reasoning [51.78514648677898]
We propose SAFER, a framework for Safety Alignment via eFficient Ex-Ante Reasoning.<n>Our approach instantiates structured Ex-Ante reasoning through initial assessment, rule verification, and path calibration.<n> Experiments on multiple open-source LLMs demonstrate that SAFER significantly enhances safety performance while maintaining helpfulness and response efficiency.
arXiv Detail & Related papers (2025-04-03T16:07:38Z)
Can (A)I Change Your Mind? [0.6990493129893112]
The study was conducted entirely in Hebrew with 200 participants. It assessed the persuasive effects of both LLM and human interlocutors on controversial civil policy topics.
arXiv Detail & Related papers (2025-03-03T18:59:54Z)
Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models [9.402740034754455]
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion.<n>LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles.<n>We introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions.
arXiv Detail & Related papers (2025-03-03T18:53:21Z)
Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks.<n>In this work, we investigate the interplay between reasoning and safety in LLMs.<n>We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z)
Internal Activation as the Polar Star for Steering Unsafe LLM Behavior [50.463399903987245]
We introduce SafeSwitch, a framework that dynamically regulates unsafe outputs by monitoring and utilizing the model's internal states.<n>Our empirical results show that SafeSwitch reduces harmful outputs by over 80% on safety benchmarks while maintaining strong utility.
arXiv Detail & Related papers (2025-02-03T04:23:33Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
The Ethics of Interaction: Mitigating Security Threats in LLMs [1.407080246204282]
The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy. We scrutinize five major threats--prompt injection, jailbreaking, Personal Identifiable Information (PII) exposure, sexually explicit content, and hate-based content--to assess their critical ethical consequences and the urgency they create for robust defensive strategies.
arXiv Detail & Related papers (2024-01-22T17:11:37Z)
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications. This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes. To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.