Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication
- URL: http://arxiv.org/abs/2401.08212v2
- Date: Mon, 15 Apr 2024 12:08:41 GMT
- Title: Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication
- Authors: Hanjia Lyu, Weihong Qi, Zhongyu Wei, Jiebo Luo
- Abstract summary: This study examines the behavior of GPT-4V in replicating human-like use of emojis.
The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation.
- Score: 68.40865217231695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging Large Multimodal Models (LMMs) to simulate human behaviors when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications. Emojis, as one of the most distinctive aspects of digital communication, are pivotal in enriching and often clarifying the emotional and tonal dimensions of messages. Yet, there is a notable gap in understanding how these advanced models, such as GPT-4V, interpret and employ emojis in the nuanced context of online interaction. This study intends to bridge this gap by examining the behavior of GPT-4V in replicating human-like use of emojis. The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation and the limitations of GPT-4V's English-centric training, suggesting cultural biases and inadequate representation of non-English cultures.
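To make the kind of comparison described in the abstract concrete, here is a minimal sketch, assuming access to an OpenAI-compatible chat API: a chat model is asked which emojis it would append to a social-media post, and its choices are compared against the emojis the human author actually used. The model name, prompt wording, example post, and overlap metric are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch (not the paper's protocol): ask a chat model which emojis it
# would add to a post, then measure overlap with the human author's emojis.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def model_emojis(post_text: str, model: str = "gpt-4o") -> set[str]:
    """Ask the model to suggest emojis for a post and return them as a set."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a social media user. Reply with only the emojis "
                        "you would append to the post, separated by spaces."},
            {"role": "user", "content": post_text},
        ],
    )
    return set(response.choices[0].message.content.split())


def overlap(predicted: set[str], human: set[str]) -> float:
    """Jaccard overlap between model-suggested and human-used emoji sets."""
    if not predicted and not human:
        return 1.0
    return len(predicted & human) / len(predicted | human)


# Hypothetical post and the emojis its human author used.
post = "Finally finished my thesis after three years of work!"
human_used = {"🎉", "😭"}
print(overlap(model_emojis(post), human_used))
```

A fuller study would of course aggregate this kind of per-post comparison over many posts and annotators; the sketch only shows the shape of a single model-vs-human comparison.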
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs [12.199629860735195]
We compare GPT-4 with supervised models and/or humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training.
We find that common metrics that use aggregated human annotations as ground truth can underestimate the performance of GPT-4.
arXiv Detail & Related papers (2024-08-30T05:50:15Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models [92.60282074937305]
We introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images.
We conduct experiments to assess the performance of 14 foundation models and establish a human performance baseline.
We observe a significant performance gap of 30.8% between GPT-4V and human performance.
arXiv Detail & Related papers (2024-01-24T09:07:11Z)
- GPT-4V(ision) as A Social Media Analysis Engine [77.23394183063238]
This paper explores GPT-4V's capabilities for social multimedia analysis.
We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection.
GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge.
arXiv Detail & Related papers (2023-11-13T18:36:50Z)
- Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges [54.42256219010956]
This benchmark is designed to evaluate and shed light on two common types of hallucinations in visual language models: bias and interference.
Bias refers to the model's tendency to hallucinate certain types of responses, possibly due to an imbalance in its training data.
Interference pertains to scenarios where the judgment of GPT-4V(ision) can be disrupted by how the text prompt is phrased or how the input image is presented.
arXiv Detail & Related papers (2023-11-06T17:26:59Z)
- Fine-grained Affective Processing Capabilities Emerging from Large Language Models [7.17010996725842]
We explore ChatGPT's zero-shot ability to perform affective computing tasks using prompting alone.
We show that ChatGPT a) performs meaningful sentiment analysis in the Valence, Arousal and Dominance dimensions, b) has meaningful emotion representations in terms of emotion categories, and c) can perform basic appraisal-based emotion elicitation of situations.
arXiv Detail & Related papers (2023-09-04T15:32:47Z)
- Does Conceptual Representation Require Embodiment? Insights From Large Language Models [9.390117546307042]
We compare representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4).
We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability.
arXiv Detail & Related papers (2023-05-30T15:06:28Z)
- Large language models predict human sensory judgments across six modalities [12.914521751805658]
We show that state-of-the-art large language models can unlock new insights into the problem of recovering the perceptual world from language.
We elicit pairwise similarity judgments from GPT models across six psychophysical datasets.
We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral.
arXiv Detail & Related papers (2023-02-02T18:32:46Z)
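As a rough illustration of the pairwise similarity elicitation described in the entry above, the sketch below prompts a chat model for a 0-to-1 similarity rating between pairs of color terms and correlates the ratings with human judgments. The model name, prompt wording, and the placeholder human ratings are assumptions for illustration, not the paper's materials.

```python
# Minimal sketch (assumed setup, not the paper's materials): elicit pairwise
# similarity ratings from a chat model and correlate them with human ratings.
from itertools import combinations
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rate_similarity(a: str, b: str, model: str = "gpt-4o") -> float:
    """Ask the model for a 0-1 similarity rating between two color terms."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"On a scale from 0 (completely different) to 1 "
                       f"(identical), how similar are the colors '{a}' and "
                       f"'{b}'? Reply with a single number only.",
        }],
    )
    return float(response.choices[0].message.content.strip())


colors = ["red", "orange", "yellow", "green", "blue", "purple"]
pairs = list(combinations(colors, 2))  # 15 pairs
model_ratings = [rate_similarity(a, b) for a, b in pairs]

# Hypothetical human similarity ratings for the same 15 pairs (placeholder values).
human_ratings = [0.8, 0.4, 0.2, 0.1, 0.3, 0.7, 0.3, 0.1, 0.2,
                 0.6, 0.3, 0.2, 0.5, 0.3, 0.6]

rho, p_value = spearmanr(model_ratings, human_ratings)
print(f"Spearman correlation with human judgments: {rho:.2f} (p={p_value:.3f})")
```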