LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
- URL: http://arxiv.org/abs/2504.10430v1
- Date: Mon, 14 Apr 2025 17:20:34 GMT
- Title: LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
- Authors: Minqian Liu, Zhiyang Xu, Xinyi Zhang, Heajun An, Sarvech Qadir, Qi Zhang, Pamela J. Wisniewski, Jin-Hee Cho, Sang Won Lee, Ruoxi Jia, Lifu Huang,
- Abstract summary: We introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety.<n>PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies.<n>Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.
- Score: 47.27098710953806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for the assessment of persuasion safety which consists of three stages, i.e., persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failing to identify harmful persuasion tasks and leveraging various unethical persuasion strategies. Our study calls for more attention to improve safety alignment in progressive and goal-driven conversations such as persuasion.
Related papers
- SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment.
Certain scenarios suffer 25 times higher attack rates.
Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z) - Can (A)I Change Your Mind? [0.6990493129893112]
The study was conducted entirely in Hebrew with 200 participants.
It assessed the persuasive effects of both LLM and human interlocutors on controversial civil policy topics.
arXiv Detail & Related papers (2025-03-03T18:59:54Z) - Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models [9.402740034754455]
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion.<n>LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles.<n>We introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions.
arXiv Detail & Related papers (2025-03-03T18:53:21Z) - Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning [40.55486479495965]
Large Language Models (LLMs) have demonstrated remarkable success across various NLP benchmarks.<n>In this work, we investigate the interplay between reasoning and safety in LLMs.<n>We highlight the latent safety risks that arise as reasoning capabilities improve, shedding light on previously overlooked vulnerabilities.
arXiv Detail & Related papers (2025-02-13T06:37:28Z) - Internal Activation as the Polar Star for Steering Unsafe LLM Behavior [50.463399903987245]
We introduce SafeSwitch, a framework that dynamically regulates unsafe outputs by monitoring and utilizing the model's internal states.<n>Our empirical results show that SafeSwitch reduces harmful outputs by over 80% on safety benchmarks while maintaining strong utility.
arXiv Detail & Related papers (2025-02-03T04:23:33Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - The Ethics of Interaction: Mitigating Security Threats in LLMs [1.407080246204282]
The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy.
We scrutinize five major threats--prompt injection, jailbreaking, Personal Identifiable Information (PII) exposure, sexually explicit content, and hate-based content--to assess their critical ethical consequences and the urgency they create for robust defensive strategies.
arXiv Detail & Related papers (2024-01-22T17:11:37Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.