Related papers: The Company You Keep: How LLMs Respond to Dark Triad Traits

The Company You Keep: How LLMs Respond to Dark Triad Traits

URL: http://arxiv.org/abs/2603.04299v1
Date: Wed, 04 Mar 2026 17:19:22 GMT
Title: The Company You Keep: How LLMs Respond to Dark Triad Traits
Authors: Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov,
Abstract summary: Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI-sycophancy.<n>This study examines how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy) using a curated dataset.<n>Our findings raise implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.
Score: 7.65192155348112
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI-sycophancy. Although this behavior is encouraged, it may become problematic when interacting with user prompts that reflect negative social tendencies. Such responses risk amplifying harmful behavior rather than mitigating it. In this study, we examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy) using a curated dataset. Our analysis reveals differences across models, whereby all models predominantly exhibit corrective behavior, while showing reinforcing output in certain cases. Model behavior also depends on the severity level and differs in the sentiment of the response. Our findings raise implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.

Related papers

Do LLMs Benefit From Their Own Words? [56.73014497206615]
We find that removing prior assistant responses does not affect response quality on a large fraction of turns.<n>Omitting assistant-side context can reduce cumulative context lengths by up to 10x.<n>Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
arXiv Detail & Related papers (2026-02-27T18:58:26Z)
Do Retrieval Augmented Language Models Know When They Don't Know? [55.72375712577378]
We ask the fundamental question: Do RALMs know when they don't know?<n>Contrary to expectations, we find that LLMs exhibit significant textbfover-refusal behavior.<n>We develop a simple yet effective refusal method for refusal post-trained models to improve their overall answer quality.
arXiv Detail & Related papers (2025-09-01T13:44:15Z)
Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? [81.49470136653665]
We evaluate the robustness and expressiveness of value representations across three widely used probing strategies.<n>We show that the demographic context has little effect on the free-text generation, and the models' values only weakly correlate with their preference for value-based actions.
arXiv Detail & Related papers (2025-07-17T18:56:41Z)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations [60.63340688538124]
Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs)<n>Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations.<n>In this work, we introduce a psychological taxonomy, categorizing VLMs' cognitive biases that lead to hallucinations, including sycophancy, logical inconsistency, and a newly identified VLMs behaviour: appeal to authority.
arXiv Detail & Related papers (2025-07-03T19:03:16Z)
Compromising Honesty and Harmlessness in Language Models via Deception Attacks [0.04499833362998487]
Large language models (LLMs) can understand and employ deceptive behavior, even without explicit prompting.<n>We introduce "deception attacks" that undermine these traits, revealing a vulnerability that, if exploited, could have serious real-world consequences.<n>We show that such targeted deception is effective even in high-stakes domains or ideologically charged subjects.
arXiv Detail & Related papers (2025-02-12T11:02:59Z)
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena [0.0]
We show that ethical refusals yield significantly lower win rates than both technical refusals and standard responses.<n>Our findings underscore a core tension in LLM design: safety-aligned behaviors may conflict with user expectations.
arXiv Detail & Related papers (2025-01-04T06:36:44Z)
MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? [70.77691645678804]
Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. We identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive.
arXiv Detail & Related papers (2024-06-22T23:26:07Z)
Large Language Models Show Human-like Social Desirability Biases in Survey Responses [12.767606361552684]
We show that Large Language Models (LLMs) skew their scores towards the desirable ends of trait dimensions when personality evaluation is inferred. This bias exists in all tested models, including GPT-4/3.5, Claude 3, Llama 3, and PaLM-2. reverse-coding all the questions decreases bias levels but does not eliminate them, suggesting that this effect cannot be attributed to acquiescence bias.
arXiv Detail & Related papers (2024-05-09T19:02:53Z)
When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour [0.8133739801185272]
We study the suggestibility of Large Language Models to sycophantic behaviour.<n>This behaviour is known as sycophancy and depicts the tendency of LLMs to generate misleading responses.
arXiv Detail & Related papers (2023-11-15T22:18:33Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.