Related papers: MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection

URL: http://arxiv.org/abs/2510.23727v1
Date: Mon, 27 Oct 2025 18:03:11 GMT
Title: MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection
Authors: Anisha Saha, Varsha Suresh, Timothy Hospedales, Vera Demberg
Abstract summary: VideoLMs struggle with complex tasks like sarcasm detection. MUStReason is a diagnostic benchmark enriched with annotations of modality-specific relevant cues. We propose PragCoT, a framework that steers VideoLMs to focus on implied intentions over literal meaning.
Score: 16.725936163763684
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Sarcasm is a specific type of irony which involves discerning what is said from what is meant. Detecting sarcasm depends not only on the literal content of an utterance but also on non-verbal cues such as the speaker's tonality, facial expressions and conversational context. However, current multimodal models struggle with complex tasks like sarcasm detection, which require identifying relevant cues across modalities and pragmatically reasoning over them to infer the speaker's intention. To explore these limitations in VideoLMs, we introduce MUStReason, a diagnostic benchmark enriched with annotations of modality-specific relevant cues and underlying reasoning steps to identify sarcastic intent. In addition to benchmarking sarcasm classification performance in VideoLMs, using MUStReason we quantitatively and qualitatively evaluate the generated reasoning by disentangling the problem into perception and reasoning. We also propose PragCoT, a framework that steers VideoLMs to focus on implied intentions over literal meaning, a property core to detecting sarcasm.

Related papers

IRONIC: Coherence-Aware Reasoning Chains for Multi-Modal Sarcasm Detection [7.739824004160998]
We present IRONIC, an in-context learning framework that leverages Multi-modal Coherence Relations to analyze referential, analogical and pragmatic image-text linkages. Our experiments show that IRONIC achieves state-of-the-art performance on zero-shot Multi-modal Sarcasm Detection.
arXiv Detail & Related papers (2025-05-22T05:49:01Z)
Detecting Emotional Incongruity of Sarcasm by Commonsense Reasoning [32.5690489394632]
This paper focuses on sarcasm detection, which aims to identify whether given statements convey criticism, mockery, or other negative sentiment opposite to the literal meaning. Existing methods lack commonsense inferential ability when they face complex real-world scenarios, leading to unsatisfactory performance. We propose a novel framework for sarcasm detection, which conducts incongruity reasoning based on commonsense augmentation, called EICR.
arXiv Detail & Related papers (2024-12-17T11:25:55Z)
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales. We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [63.32199372362483]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where an utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
arXiv Detail & Related papers (2024-02-06T03:14:46Z)
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations. We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z)
Sarcasm Detection Framework Using Emotion and Sentiment Features [62.997667081978825]
We propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm. Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.
arXiv Detail & Related papers (2022-11-23T15:14:44Z)
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image. CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities. We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation.
arXiv Detail & Related papers (2022-11-20T14:38:24Z)
When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues [27.884015521888458]
We study the discourse structure of sarcastic conversations and propose a novel task, Sarcasm Explanation in Dialogue (SED). SED aims to generate natural language explanations of satirical conversations. We propose MAF, a multimodal context-aware attention and global information fusion module to capture multimodality and use it to benchmark WITS.
arXiv Detail & Related papers (2022-03-12T12:16:07Z)
Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance. Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
"Laughing at you or with you": The Role of Sarcasm in Shaping the Disagreement Space [10.73235256149378]
We present a corpus annotated with both argumentative moves (agree/disagree) and sarcasm. We exploit joint modeling in terms of (a) applying discrete features that are useful in detecting sarcasm to the task of argumentative relation classification. We demonstrate that modeling sarcasm improves the argumentative relation classification task (agree/disagree/none) in all setups.
arXiv Detail & Related papers (2021-01-26T17:19:18Z)
$R^3$: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge [51.70688120849654]
We propose an unsupervised approach for sarcasm generation based on a non-sarcastic input sentence. Our method employs a retrieve-and-edit framework to instantiate two major characteristics of sarcasm.
arXiv Detail & Related papers (2020-04-28T02:30:09Z)