Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models
- URL: http://arxiv.org/abs/2601.07885v1
- Date: Mon, 12 Jan 2026 05:34:18 GMT
- Title: Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models
- Authors: Weipeng Jiang, Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen, Yang Liu
- Abstract summary: Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain largely unexplored. We identify emoticon semantic confusion, a vulnerability where LLMs misinterpret ASCII-based emoticons to perform unintended and even destructive actions.
- Score: 38.25786549326184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain largely unexplored. In this paper, we identify emoticon semantic confusion, a vulnerability where LLMs misinterpret ASCII-based emoticons to perform unintended and even destructive actions. To systematically study this phenomenon, we develop an automated data generation pipeline and construct a dataset of 3,757 code-oriented test cases spanning 21 meta-scenarios, four programming languages, and varying contextual complexities. Our study of six LLMs reveals that emoticon semantic confusion is pervasive, with an average confusion ratio exceeding 38%. More critically, over 90% of confused responses are 'silent failures': outputs that are syntactically valid but deviate from user intent, potentially leading to destructive security consequences. Furthermore, we observe that this vulnerability readily transfers to popular agent frameworks, while existing prompt-based mitigations remain largely ineffective. We call on the community to recognize this emerging vulnerability and to develop effective mitigations that uphold the safety and reliability of LLM systems.
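As a concrete illustration of the failure mode and the kind of code-oriented test case the abstract describes, the sketch below embeds ASCII emoticons in a code-generation request and flags responses that parse cleanly yet fail to reproduce the emoticon verbatim, i.e., candidate 'silent failures'. This is a minimal reconstruction, not the paper's released pipeline; `query_llm`, the prompt wording, and the emoticon list are all assumptions.
```python
# Minimal sketch of an emoticon semantic-confusion probe. NOT the paper's
# pipeline: query_llm, the prompt, and EMOTICONS are illustrative assumptions.
import ast
from typing import Callable

EMOTICONS = [":-)", ":-(", ">:(", "(^_^)", "o_O"]

PROMPT_TEMPLATE = (
    "Write a Python function log_status(ok) that prints 'done {emo}' when ok "
    "is true and 'failed {emo}' otherwise. Reproduce the emoticon verbatim."
)

def probe(query_llm: Callable[[str], str]) -> list[dict]:
    """Query the model once per emoticon and classify each response."""
    results = []
    for emo in EMOTICONS:
        code = query_llm(PROMPT_TEMPLATE.format(emo=emo))
        try:
            ast.parse(code)  # is the output syntactically valid Python?
            valid = True
        except SyntaxError:
            valid = False
        results.append({
            "emoticon": emo,
            "syntactically_valid": valid,
            # parses fine but drops or garbles the emoticon: a 'silent failure'
            "silent_failure": valid and emo not in code,
        })
    return results
```
In practice one would also strip markdown fences from the model's reply and use a semantics-aware intent check rather than this crude substring match.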
Related papers
- Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography [77.44136793431893]
We propose a novel jailbreak paradigm that introduces dual steganography to covertly embed malicious queries into benign-looking images. Odysseus successfully jailbreaks several pioneering and realistic MLLM-integrated systems, achieving up to a 99% attack success rate.
arXiv Detail & Related papers (2025-12-23T08:53:36Z)
- DREAM: Dynamic Red-teaming across Environments for AI Models [28.267208528754082]
We introduce DREAM, a framework for evaluating Large Language Models (LLMs) against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain a stateful, cross-domain understanding of vulnerabilities. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models.
arXiv Detail & Related papers (2025-12-22T04:11:57Z)
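The summary above names the CE-AKG but does not describe its structure. Purely as a reading aid, one can picture it as a graph whose nodes are (environment, vulnerability) pairs and whose edges record attack-chain transitions observed to succeed; the speculative sketch below invents all class, field, and method names.
```python
# Speculative reading of a stateful cross-environment vulnerability graph,
# loosely inspired by the CE-AKG idea; every name here is an assumption.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class CrossEnvGraph:
    # edges: (env, vuln) -> set of (env, vuln) reached by a successful step
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def record(self, src: tuple[str, str], dst: tuple[str, str]) -> None:
        """Remember that attack step dst succeeded after src."""
        self.edges[src].add(dst)

    def next_steps(self, src: tuple[str, str]) -> set[tuple[str, str]]:
        """Candidate follow-up steps to try from the current state."""
        return self.edges[src]

# Hypothetical usage: an email prompt injection enabling browser exfiltration.
g = CrossEnvGraph()
g.record(("email", "prompt-injection"), ("browser", "data-exfiltration"))
print(g.next_steps(("email", "prompt-injection")))
```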
- EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations [57.97838850473147]
Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI. Our study unveils a critical, overlooked vulnerability: their susceptibility to subtle symbolic perturbations. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts.
arXiv Detail & Related papers (2025-12-01T06:53:49Z)
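A back-of-the-envelope way to observe the effect EmoRAG reports is to compare top-k retrieval results with and without an injected emoticon. This is a hedged sketch, not EmoRAG's code; `retrieve`, the default emoticon, and k are assumptions supplied by the caller.
```python
# Hedged sketch: measure how far top-k retrieval drifts when one emoticon is
# injected into the query. `retrieve` is a hypothetical caller-supplied
# function returning ranked document IDs for (query, k).
from typing import Callable, Sequence

def emoticon_shift(query: str,
                   retrieve: Callable[[str, int], Sequence[str]],
                   emoticon: str = "(^_^)",
                   k: int = 10) -> float:
    """Jaccard overlap of the two top-k result sets; values near 0 mean the
    perturbation derailed retrieval almost completely."""
    clean = set(retrieve(query, k))
    perturbed = set(retrieve(f"{query} {emoticon}", k))
    union = clean | perturbed
    return len(clean & perturbed) / len(union) if union else 1.0
```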
- Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models [0.0]
Camouflaged jailbreaking embeds malicious intent within seemingly benign language to evade existing safety mechanisms. This paper investigates the construction and impact of camouflaged jailbreak prompts, emphasizing their deceptive characteristics and the limitations of traditional keyword-based detection methods.
arXiv Detail & Related papers (2025-09-05T19:57:38Z)
- MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models. It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines. We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z)
- EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z) - Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z) - Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context [45.821481786228226]
We show that situation-driven adversarial full-prompts that leverage situational context are effective but much harder to detect. We developed attacks that use movie scripts as situational contextual frameworks. We enhanced the AdvPrompter framework with p-nucleus sampling to generate diverse human-readable adversarial texts.
arXiv Detail & Related papers (2024-12-20T21:43:52Z) - Jailbreaking Large Language Models with Symbolic Mathematics [6.31180501514722]
Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation.
This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms.
arXiv Detail & Related papers (2024-09-17T03:39:45Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.