Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?
- URL: http://arxiv.org/abs/2602.15842v1
- Date: Wed, 21 Jan 2026 07:33:44 GMT
- Title: Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?
- Authors: Ryosuke Kohita, Seiichiro Yoshioka
- Abstract summary: We introduce the Meme Reply Selection task and present MaMe-Re, a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators). Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; and (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates.
- Score: 0.7834991119179472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memes are a popular element of modern web communication, used not only as static artifacts but also as interactive replies within conversations. While computational research has focused on analyzing the intrinsic properties of memes, the dynamic and contextual use of memes to create humor remains an understudied area of web science. To address this gap, we introduce the Meme Reply Selection task and present MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators) consisting of openly licensed Japanese manga panels and social media posts. Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration, moving beyond surface-level semantic matching; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates. These findings suggest that selecting contextually humorous replies remains an open challenge for current models.
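The benchmark's evaluation code and data schema are not reproduced here, so the following is only a minimal sketch of how a Meme Reply Selection evaluation could be posed to a text-only LLM. Every field name, the `ask_llm` stub, and the demo example are invented for illustration and may not match MaMe-Re's actual format.

```python
"""Minimal sketch of a meme-reply selection evaluation loop.

All data fields and the `ask_llm` stub are hypothetical; the MaMe-Re
benchmark's actual schema and evaluation protocol may differ.
"""
from dataclasses import dataclass


@dataclass
class Example:
    post: str              # the social media post to reply to
    candidates: list[str]  # textual descriptions of candidate manga panels
    human_choice: int      # index of the reply annotators preferred


def ask_llm(prompt: str) -> str:
    """Placeholder for a call to an actual LLM; returns a candidate index as text."""
    return "0"  # stub: always pick the first candidate


def select_reply(example: Example) -> int:
    """Ask the model to pick the funniest, most fitting reply panel."""
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(example.candidates))
    prompt = (
        "You are choosing a manga panel to send as a humorous reply.\n"
        f"Post: {example.post}\n"
        f"Candidate panels:\n{options}\n"
        "Answer with the number of the funniest, most fitting reply."
    )
    answer = ask_llm(prompt)
    digits = "".join(ch for ch in answer if ch.isdigit())
    idx = int(digits) if digits else 0
    return idx if idx < len(example.candidates) else 0


def accuracy(dataset: list[Example]) -> float:
    """Fraction of items where the model agrees with the human-preferred reply."""
    correct = sum(select_reply(ex) == ex.human_choice for ex in dataset)
    return correct / len(dataset)


if __name__ == "__main__":
    demo = [Example(
        post="I said I'd only watch one episode. It is now 4 a.m.",
        candidates=[
            "A character staring blankly at the sunrise through a window.",
            "A character triumphantly finishing a marathon.",
        ],
        human_choice=0,
    )]
    print(f"Agreement with human choice: {accuracy(demo):.2f}")
```

Since the abstract reports that adding visual information did not improve performance, a text-only framing like this is a plausible starting point for reasoning about the task, though the benchmark itself pairs posts with actual manga panel images.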
Related papers
- AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking [59.15472057710525]
AVMeme Exam is a human-curated benchmark of over one thousand iconic Internet sounds and videos spanning speech, songs, music, and sound effects. Each meme is paired with a unique Q&A assessing levels of understanding from surface content to context and emotion to usage and world knowledge. We systematically evaluate state-of-the-art multimodal large language models (MLLMs) alongside human participants using this benchmark.
arXiv Detail & Related papers (2026-01-25T01:40:15Z) - MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models [50.2355423914562]
We introduce MemeReaCon, a novel benchmark designed to evaluate how Large Vision Language Models (LVLMs) understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking communicative purpose.
arXiv Detail & Related papers (2025-05-23T03:27:23Z) - Meme Similarity and Emotion Detection using Multimodal Analysis [0.0]
This study employs a multimodal methodological approach, analyzing both the visual and textual elements of memes. We extract low-level visual features and high-level semantic features to identify similar meme pairs. Results indicate that anger and joy are the dominant emotions in memes, with motivational memes eliciting stronger emotional responses.
arXiv Detail & Related papers (2025-03-21T19:07:16Z) - Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification [0.0]
"meme template" is a layout or format that is used to create memes.
Despite extensive research on meme virality, the task of automatically identifying meme templates remains a challenge.
This paper presents a comprehensive comparison and evaluation of existing meme template identification methods.
arXiv Detail & Related papers (2024-08-15T12:52:06Z) - MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z) - PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z) - What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes [42.357272117919464]
We introduce a novel task - EXCLAIM, generating explanations for visual semantic role labeling in memes.
To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities.
We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
arXiv Detail & Related papers (2022-12-01T18:21:36Z) - Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z) - DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation [54.84137342837465]
Face-to-face conversations account for the vast majority of daily conversations.
Most existing methods have focused on single-person talking-head generation.
We propose a novel unified framework based on neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-03-15T14:16:49Z) - Exercise? I thought you said 'Extra Fries': Leveraging Sentence Demarcations and Multi-hop Attention for Meme Affect Analysis [18.23523076710257]
We propose a multi-hop attention-based deep neural network framework, called MHA-MEME.
Its prime objective is to leverage the spatial-domain correspondence between the visual modality (an image) and various textual segments to extract fine-grained feature representations for classification.
We evaluate MHA-MEME on the 'Memotion Analysis' dataset for all three sub-tasks - sentiment classification, affect classification, and affect class quantification.
arXiv Detail & Related papers (2021-03-23T08:21:37Z) - Dialogue History Matters! Personalized Response Selectionin Multi-turn
Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
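To make the retrieval-based selection setting from the PHMN entry above concrete, here is a minimal, hypothetical sketch of context-response matching with a user-history term. It is a bag-of-words toy, not the personalized hybrid matching network itself; the scoring functions, blending weight, and example data are all assumptions.

```python
"""Minimal, hypothetical sketch of retrieval-based response selection.

This is NOT the PHMN architecture from the paper above; it only
illustrates the task setup: score candidate responses against the
current context, optionally boosted by the user's own dialogue history.
"""
from collections import Counter
import math


def bow(text: str) -> Counter:
    """Lowercased bag-of-words representation."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score(context: str, response: str, user_history: list[str], alpha: float = 0.3) -> float:
    """Blend context-response similarity with a user-history term (alpha is an assumed weight)."""
    ctx_sim = cosine(bow(context), bow(response))
    hist_sim = max((cosine(bow(h), bow(response)) for h in user_history), default=0.0)
    return (1 - alpha) * ctx_sim + alpha * hist_sim


def select(context: str, candidates: list[str], user_history: list[str]) -> str:
    """Return the candidate response with the highest blended score."""
    return max(candidates, key=lambda r: score(context, r, user_history))


if __name__ == "__main__":
    history = ["gotta grab some fries after the gym", "leg day destroyed me"]
    context = "Anyone up for exercise at the gym this weekend?"
    candidates = ["Sure, see you at the gym.", "The meeting is at 3 pm."]
    print(select(context, candidates, history))
```

A neural matcher would replace the bag-of-words cosine with learned representations of context, history, and response, but the basic loop of scoring candidates against context plus personal history stays the same.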
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.