Related papers: When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

URL: http://arxiv.org/abs/2509.11141v1
Date: Sun, 14 Sep 2025 07:21:44 GMT
Title: When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Authors: Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang,
Abstract summary: Emojis are globally used non-verbal cues in digital communication.<n>It is observed that emojis may trigger toxic content generation in large language models.
Score: 83.94875431097908
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Emojis are globally used non-verbal cues in digital communication, and extensive research has examined how large language models (LLMs) understand and utilize emojis across contexts. While usually associated with friendliness or playfulness, it is observed that emojis may trigger toxic content generation in LLMs. Motivated by such a observation, we aim to investigate: (1) whether emojis can clearly enhance the toxicity generation in LLMs and (2) how to interpret this phenomenon. We begin with a comprehensive exploration of emoji-triggered LLM toxicity generation by automating the construction of prompts with emojis to subtly express toxic intent. Experiments across 5 mainstream languages on 7 famous LLMs along with jailbreak tasks demonstrate that prompts with emojis could easily induce toxicity generation. To understand this phenomenon, we conduct model-level interpretations spanning semantic cognition, sequence generation and tokenization, suggesting that emojis can act as a heterogeneous semantic channel to bypass the safety mechanisms. To pursue deeper insights, we further probe the pre-training corpus and uncover potential correlation between the emoji-related data polution with the toxicity generation behaviors. Supplementary materials provide our implementation code and data. (Warning: This paper contains potentially sensitive contents)

Related papers

The Prosody of Emojis [73.70220975424597]
This study examines how emojis influence prosodic realisation in speech and how listeners interpret prosodic cues to recover emoji meanings.<n>Unlike previous work, we directly link prosody and emoji by analysing actual human speech data, collected through structured but open-ended production and perception tasks.<n>Results show that speakers adapt their prosody based on emoji cues, listeners can often identify the intended emoji from prosodic variation alone, and greater semantic differences between emojis correspond to increased prosodic divergence.
arXiv Detail & Related papers (2025-08-01T11:24:12Z)
The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation [13.409540662525995]
Emojis, despite being rarely offensive in isolation, can acquire harmful meanings through symbolic associations, sarcasm, and contextual misuse.<n>We propose an LLM-powered, multi-step moderation pipeline that replaces harmful emojis while preserving the tweet's semantic intent.<n>Our analysis also reveals heterogeneous effects across offense types, offering nuanced insights for online communication and emoji moderation.
arXiv Detail & Related papers (2025-05-31T14:39:08Z)
Unlocking Cross-Lingual Sentiment Analysis through Emoji Interpretation: A Multimodal Generative AI Approach [8.762679920056486]
Emojis have become ubiquitous in online communication, serving as a universal medium to convey emotions.<n>Our study aims to investigate the capacity of emojis to serve as reliable sentiment markers through large language models (LLMs)<n>Our analysis reveals that the accuracy of LLM-based emoji-conveyed sentiment is 81.43%, underscoring emojis' significant potential to serve as a universal sentiment marker.
arXiv Detail & Related papers (2024-12-23T03:57:45Z)
Semantics Preserving Emoji Recommendation with Large Language Models [47.94761630160614]
Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. We propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain the semantic consistency with the user's text.
arXiv Detail & Related papers (2024-09-16T22:27:46Z)
EmojiLM: Modeling the New Emoji Language [44.23076273155259]
We develop a text-emoji parallel corpus, Text2Emoji, from a large language model. Based on the parallel corpus, we distill a sequence-to-sequence model, EmojiLM, which is specialized in the text-emoji bidirectional translation. Our proposed model outperforms strong baselines and the parallel corpus benefits emoji-related downstream tasks.
arXiv Detail & Related papers (2023-11-03T07:06:51Z)
Emojich -- zero-shot emoji generation using Russian language: a technical report [52.77024349608834]
"Emojich" is a text-to-image neural network that generates emojis using captions in Russian language as a condition. We aim to keep the generalization ability of a pretrained big model ruDALL-E Malevich (XL) 1.3B parameters at the fine-tuning stage.
arXiv Detail & Related papers (2021-12-04T23:37:32Z)
Emoji-aware Co-attention Network with EmoGraph2vec Model for Sentiment Anaylsis [9.447106020795292]
We propose a method to learn emoji representations called EmoGraph2vec and design an emoji-aware co-attention network. Our model designs a co-attention mechanism to incorporate the text and emojis, and integrates a squeeze-and-excitation block into a convolutional neural network. Experimental results show that the proposed model can outperform several baselines for sentiment analysis on benchmark datasets.
arXiv Detail & Related papers (2021-10-27T08:01:10Z)
Black or White but never neutral: How readers perceive identity from yellow or skin-toned emoji [90.14874935843544]
Recent work established a connection between expression of identity and emoji usage on social media. This work asks if, as with language, readers are sensitive to such acts of self-expression and use them to understand the identity of authors.
arXiv Detail & Related papers (2021-05-12T18:23:51Z)
Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018 [66.28665205489845]
We offer the first longitudinal study of how emoji semantics changes over time, applying techniques from computational linguistics to six years of Twitter data. We identify five patterns in emoji semantic development and find evidence that the less abstract an emoji is, the more likely it is to undergo semantic change. To aid future work on emoji and semantics, we make our data publicly available along with a web-based interface that anyone can use to explore semantic change in emoji.
arXiv Detail & Related papers (2021-05-03T13:35:10Z)
Are Emojis Emotional? A Study to Understand the Association between Emojis and Emotions [37.86739837901986]
We seek to explore the connection between emojis and emotions by means of a new dataset consisting of human-solicited association ratings. We additionally conduct experiments to assess to what extent such associations can be inferred from existing data, such as similar associations can be predicted for a larger set of emojis.
arXiv Detail & Related papers (2020-05-02T04:04:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.