MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
- URL: http://arxiv.org/abs/2601.20451v1
- Date: Wed, 28 Jan 2026 10:19:42 GMT
- Title: MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
- Authors: Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
- Abstract summary: Sarcasm analysis requires Multimodal Sarcasm Detection (MSD) and Multimodal Sarcasm Explanation (MuSE). We propose MuVaC, a variational causal inference framework that mimics human cognitive mechanisms for understanding sarcasm.
- Score: 21.146757458620105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prevalence of sarcasm in multimodal dialogues on social platforms makes understanding the true intent behind online content a crucial yet challenging task. Comprehensive sarcasm analysis requires two key aspects: Multimodal Sarcasm Detection (MSD) and Multimodal Sarcasm Explanation (MuSE). Intuitively, the act of detection is the result of the reasoning process that explains the sarcasm. Current research predominantly focuses on addressing either MSD or MuSE as a single task. Even though some recent work has attempted to integrate these tasks, their inherent causal dependency is often overlooked. To bridge this gap, we propose MuVaC, a variational causal inference framework that mimics human cognitive mechanisms for understanding sarcasm, enabling robust multimodal feature learning to jointly optimize MSD and MuSE. Specifically, we first model MSD and MuSE from the perspective of structural causal models, establishing variational causal pathways to define the objectives for joint optimization. Next, we design an alignment-then-fusion approach to integrate multimodal features, providing robust fused representations for sarcasm detection and explanation generation. Finally, we enhance the trustworthiness of the reasoning by ensuring consistency between detection results and explanations. Experimental results demonstrate the superiority of MuVaC on public datasets, offering a new perspective for understanding multimodal sarcasm.
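To make the joint objective more concrete, here is a minimal sketch, assuming a PyTorch setup, of how a variational latent cause can jointly drive sarcasm detection (MSD) and explanation generation (MuSE). All module names, dimensions, and the single-step explanation head are illustrative assumptions, not MuVaC's actual architecture.

```python
# Hypothetical sketch: a latent cause z, inferred from fused multimodal
# features, jointly drives the explanation (MuSE) and the detection label
# (MSD), with a KL term regularizing the variational pathway.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalSarcasmModel(nn.Module):
    def __init__(self, feat_dim=768, latent_dim=128, vocab_size=30522):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)        # q(z|x) mean
        self.to_logvar = nn.Linear(feat_dim, latent_dim)    # q(z|x) log-variance
        self.detector = nn.Linear(latent_dim, 2)            # p(y|z): sarcastic or literal
        self.explainer = nn.Linear(latent_dim, vocab_size)  # single-step stand-in for a decoder

    def forward(self, fused):  # fused: (B, feat_dim) multimodal representation
        mu, logvar = self.to_mu(fused), self.to_logvar(fused)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.detector(z), self.explainer(z), mu, logvar

def joint_elbo_loss(det_logits, expl_logits, labels, expl_tokens, mu, logvar, beta=0.1):
    msd = F.cross_entropy(det_logits, labels)         # detection (MSD) term
    muse = F.cross_entropy(expl_logits, expl_tokens)  # explanation (MuSE) term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return msd + muse + beta * kl
```

Sharing z between the two heads loosely mirrors the causal dependency the abstract describes: the detection verdict is conditioned on the same latent cause that generates the explanation.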
Related papers
- Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems [49.69773210844221]
This paper challenges the prevailing hypothesis that persuasive efficacy is primarily a function of model scale. Through a series of multi-agent persuasion experiments, we uncover a fundamental trade-off we term the Persuasion Duality. Our findings reveal that the reasoning process in LRMs exhibits significantly greater resistance to persuasion, with these models maintaining their initial beliefs more robustly.
arXiv Detail & Related papers (2025-09-25T12:03:10Z)
- Can Large Vision-Language Models Understand Multimodal Sarcasm? [14.863320201956963]
Sarcasm is a complex linguistic phenomenon that involves a disparity between literal and intended meanings. We evaluate Large Vision-Language Models (LVLMs) on Multimodal Sarcasm Analysis (MSA) tasks. We propose a training-free framework that integrates in-depth object extraction and external conceptual knowledge.
arXiv Detail & Related papers (2025-08-05T17:05:11Z)
- Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models [10.47267683821842]
We propose an innovative multi-modal Commander-GPT framework for sarcasm detection. Inspired by military strategy, we first decompose the sarcasm detection task into six distinct sub-tasks. A central commander (decision-maker) then assigns the best-suited large language model to address each specific sub-task. Our approach achieves state-of-the-art performance, with a 19.3% improvement in F1 score.
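A hedged sketch of this commander-style decomposition, in Python: the task is split into sub-tasks, each dispatched to an assigned model, and the verdicts are aggregated. The sub-task names, the model registry, and the majority-vote aggregation are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch: sub-task names, the model registry, and majority
# voting are illustrative assumptions standing in for the actual framework.
from collections import Counter
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # takes an utterance, returns "sarcastic" or "literal"

def commander(assignments: Dict[str, ModelFn], utterance: str) -> str:
    """Dispatch each sub-task to its assigned model, then aggregate verdicts."""
    verdicts = [model(utterance) for model in assignments.values()]
    return Counter(verdicts).most_common(1)[0][0]

# Stub models standing in for the assigned LLMs:
route = {
    "sentiment shift": lambda s: "sarcastic",
    "context check":   lambda s: "sarcastic",
    "literal reading": lambda s: "literal",
}
print(commander(route, "Oh great, another Monday."))  # -> "sarcastic"
```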
arXiv Detail & Related papers (2025-03-24T13:53:00Z)
- Multi-View Incongruity Learning for Multimodal Sarcasm Detection [40.10921890527881]
Multimodal sarcasm detection (MSD) is essential for various downstream tasks. Existing MSD methods tend to rely on spurious correlations. This paper proposes a novel method that integrates Multimodal Incongruities via Contrastive Learning (MICL) for multimodal sarcasm detection.
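To picture the contrastive component, here is a minimal symmetric InfoNCE-style sketch in PyTorch; the in-batch pairing scheme and temperature are generic assumptions, not MICL's exact objective.

```python
# Hypothetical sketch: matched text-image pairs are pulled together while
# in-batch mismatches are pushed apart.
import torch
import torch.nn.functional as F

def info_nce(text_emb, img_emb, temperature=0.07):
    """text_emb, img_emb: (B, D) embeddings of matched text-image pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = text_emb @ img_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0))       # diagonal entries are the true pairs
    # Symmetric over text-to-image and image-to-text directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```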
arXiv Detail & Related papers (2024-12-01T10:29:36Z)
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models [14.453131020178564]
This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge.
Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection.
We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by noise inherent in LMMs.
arXiv Detail & Related papers (2024-05-01T08:44:44Z)
- Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [63.32199372362483]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where an utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
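A minimal sketch of joint cross attention in the spirit of the JCA-SI module just described: audio and video features attend to each other before a joint sentiment head. The shapes, mean pooling, and three-way head are illustrative assumptions, not the EDGE implementation.

```python
# Hypothetical sketch: each modality queries the other, and the pooled
# cross-attended contexts feed a joint sentiment classifier per clip.
import torch
import torch.nn as nn

class JointCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4, n_sentiments=3):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, n_sentiments)  # e.g. negative / neutral / positive

    def forward(self, audio, video):  # (B, Ta, dim), (B, Tv, dim)
        a_ctx, _ = self.a2v(audio, video, video)  # audio queries attend to video
        v_ctx, _ = self.v2a(video, audio, audio)  # video queries attend to audio
        joint = torch.cat([a_ctx.mean(dim=1), v_ctx.mean(dim=1)], dim=-1)
        return self.head(joint)                   # joint sentiment logits per clip
```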
arXiv Detail & Related papers (2024-02-06T03:14:46Z)
- MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System [57.650338588086186]
We introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD.
We present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives.
arXiv Detail & Related papers (2023-07-14T03:22:51Z)
- Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence and optimal transport to capture cross-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
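The cross-modal side of this idea can be sketched with a minimal Sinkhorn routine that computes a soft alignment between token features of two modalities. The cosine cost, uniform marginals, and fixed iteration count are generic assumptions, not MuLOT's exact formulation.

```python
# Hypothetical sketch: entropic optimal transport via Sinkhorn iterations,
# yielding a soft alignment (transport plan) between two feature sets.
import torch
import torch.nn.functional as F

def sinkhorn_alignment(x, y, eps=0.1, iters=50):
    """x: (n, d), y: (m, d) token features; returns an (n, m) transport plan."""
    cost = 1 - F.cosine_similarity(x.unsqueeze(1), y.unsqueeze(0), dim=-1)  # (n, m)
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    u = torch.full((x.size(0),), 1.0 / x.size(0))  # uniform source marginal
    v = torch.full((y.size(0),), 1.0 / y.size(0))  # uniform target marginal
    a, b = torch.ones_like(u), torch.ones_like(v)
    for _ in range(iters):                         # alternating marginal projections
        a = u / (K @ b)
        b = v / (K.t() @ a)
    return a.unsqueeze(1) * K * b.unsqueeze(0)     # soft cross-modal alignment
```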
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
- Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance.
Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
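One plausible reading of inter-modality contrastive attention, sketched below: a modality is reconstructed by attending over another, and the residual is kept as the contrastive (incongruity) feature. This interpretation is an assumption for illustration only, not the ConAttSD design.

```python
# Hypothetical sketch: the part of one modality that the other modality
# cannot explain is retained as the incongruity signal.
import torch.nn as nn

class ContrastiveAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_mod, context_mod):  # (B, Tq, dim), (B, Tc, dim)
        recon, _ = self.attn(query_mod, context_mod, context_mod)
        # Residual between original and cross-attended reconstruction.
        return query_mod - recon
```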
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
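A compact sketch of the two-subspace idea: each modality is projected into a shared (modality-invariant) and a private (modality-specific) space, with a similarity loss pulling shared codes together and an orthogonality penalty separating private codes from shared ones. The MSE similarity term is a simplified stand-in for the paper's distributional similarity loss, and the loss weight is an assumption.

```python
# Hypothetical sketch: per-modality projections into shared and private
# subspaces, with simplified similarity and difference losses.
import torch.nn as nn
import torch.nn.functional as F

class TwoSubspaceProjector(nn.Module):
    def __init__(self, feat_dim=768, sub_dim=128):
        super().__init__()
        self.shared = nn.Linear(feat_dim, sub_dim)   # modality-invariant subspace
        self.private = nn.Linear(feat_dim, sub_dim)  # modality-specific subspace

    def forward(self, x):  # x: (B, feat_dim) features of one modality
        return self.shared(x), self.private(x)

def misa_style_losses(shared_codes, private_codes):
    """Both arguments: lists of (B, sub_dim) tensors, one entry per modality."""
    # Similarity: shared codes of different modalities should agree.
    sim = sum(F.mse_loss(shared_codes[i], shared_codes[j])
              for i in range(len(shared_codes))
              for j in range(i + 1, len(shared_codes)))
    # Difference: each private code should be orthogonal to its shared code.
    diff = sum((F.normalize(s, dim=-1) * F.normalize(p, dim=-1)).sum(-1).pow(2).mean()
               for s, p in zip(shared_codes, private_codes))
    return sim + 0.3 * diff
```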
arXiv Detail & Related papers (2020-05-07T15:13:23Z)