Related papers: EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models

EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models

URL: http://arxiv.org/abs/2511.07193v1
Date: Mon, 10 Nov 2025 15:24:01 GMT
Title: EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models
Authors: Jiacheng Huang, Ning Yu, Xiaoyin Yi,
Abstract summary: Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored.<n>We present EMODIS, a new benchmark for evaluating LLMs' capacity to interpret ambiguous emoji expressions under minimal but contrastive textual contexts.
Score: 6.145223237741804
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored. In this work, we present EMODIS, a new benchmark for evaluating LLMs' capacity to interpret ambiguous emoji expressions under minimal but contrastive textual contexts. Each instance in EMODIS comprises an ambiguous sentence containing an emoji, two distinct disambiguating contexts that lead to divergent interpretations, and a specific question that requires contextual reasoning. We evaluate both open-source and API-based LLMs, and find that even the strongest models frequently fail to distinguish meanings when only subtle contextual cues are present. Further analysis reveals systematic biases toward dominant interpretations and limited sensitivity to pragmatic contrast. EMODIS provides a rigorous testbed for assessing contextual disambiguation, and highlights the gap in semantic reasoning between humans and LLMs.

Related papers

Bridging Lexical Ambiguity and Vision: A Mini Review on Visual Word Sense Disambiguation [0.0]
Visual Word Sense Disambiguation helps tackle lexical ambiguity in vision-language tasks.<n>VWSD uses visual cues to find the right meaning of ambiguous words with minimal text input.<n>Studies from 2016 to 2025 are examined to show the growth of VWSD through feature-based, graph-based, and contrastive embedding techniques.
arXiv Detail & Related papers (2026-02-01T12:36:01Z)
E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present ELLM2-EEG-to-Emotion Large Language Model, first MLLM framework for interpretable emotion analysis from EEG.<n>ELLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline.<n>Experiments on the dataset across seven emotion categories demonstrate that ELLM2-EEG-to-Emotion Large Language Model achieves excellent performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z)
SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation [55.26111461168754]
We introduce SMILE: Semantic Metric Integrating Lexical Exactness, a novel approach that combines sentence-level semantic understanding with keyword-level semantic understanding and easy keyword matching.<n>It is highly correlated with human judgments and computationally lightweight, bridging the gap between lexical and semantic evaluation.
arXiv Detail & Related papers (2025-11-21T17:30:18Z)
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth [21.092167028989632]
Drivelology is a linguistic phenomenon characterised by "nonsense with depth"<n>We construct a benchmark dataset of over 1,200+ meticulously curated and diverse examples across English, Mandarin, Spanish, French, Japanese, and Korean.<n>We find that current large language models (LLMs) consistently fail to grasp the layered semantics of Drivelological text.
arXiv Detail & Related papers (2025-09-04T03:58:55Z)
Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity [16.065963688326242]
We study the trustworthiness of large language models (LLMs) when encountering ambiguous narrative text in Chinese.<n>We created a benchmark dataset by collecting and generating ambiguous sentences with context and their corresponding disambiguated pairs.<n>We discovered significant fragility in LLMs when handling ambiguity, revealing behavior that differs substantially from humans.
arXiv Detail & Related papers (2025-07-30T21:50:19Z)
A Controllable Examination for Long-Context Language Models [62.845852724511964]
This study introduces $textbfLongBioBench, a benchmark for evaluating long-context language models.<n>We show that most models still exhibit deficiencies in semantic understanding and elementary reasoning over retrieved results.<n>Our further analysis indicates some design choices employed by existing synthetic benchmarks, such as contextual non-coherence.
arXiv Detail & Related papers (2025-06-03T14:23:06Z)
Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey [54.90240495777929]
Ambiguity remains a fundamental challenge in Natural Language Processing (NLP)<n>With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications.<n>This paper explores the definition, forms, and implications of ambiguity for language driven systems.
arXiv Detail & Related papers (2025-05-18T20:53:41Z)
Comparing LLM Text Annotation Skills: A Study on Human Rights Violations in Social Media Data [2.812898346527047]
This study investigates the capabilities of large language models (LLMs) for zero-shot and few-shot annotation of social media posts in Russian and Ukrainian.<n>To evaluate the effectiveness of these models, their annotations are compared against a gold standard set of human double-annotated labels.<n>The study explores the unique patterns of errors and disagreements exhibited by each model, offering insights into their strengths, limitations, and cross-linguistic adaptability.
arXiv Detail & Related papers (2025-05-15T13:10:47Z)
Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning)<n>We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.<n>We found: (1) Psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) Direct probability measurement may not accurately assess linguistic competence; and (3) Instruction tuning won't change much competence but improve performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. The enigmatic black-box'' nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments [24.133419857271505]
Single gold-standard interpretations rarely exist, challenging conventional assumptions in natural language processing. This work introduces the interpretation modeling (IM) task which involves modeling several interpretations of a sentence's underlying semantics. A first-of-its-kind IM dataset is curated to support experiments and analyses.
arXiv Detail & Related papers (2023-11-27T07:50:55Z)
We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding. We characterize ambiguity in a sentence by its effect on entailment relations with another sentence. We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
arXiv Detail & Related papers (2023-04-27T17:57:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.