Related papers: MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

URL: http://arxiv.org/abs/2307.07135v1
Date: Fri, 14 Jul 2023 03:22:51 GMT
Title: MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
Authors: Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che and Ruifeng Xu
Abstract summary: We introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD. We present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives.
Score: 57.650338588086186
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) There are some spurious cues in MMSD, leading to biased model learning; (2) The negative samples in MMSD are not always reasonable. To solve the aforementioned issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD, by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., text, image, and text-image interaction view) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and multi-view CLIP can significantly outperform the previous best baselines.

Related papers

Dual Modality-Aware Gated Prompt Tuning for Few-Shot Multimodal Sarcasm Detection [1.515687944002438]
We introduce DMDP (Deep Modality-Disentangled Prompt Tuning), a novel framework for few-shot multimodal sarcasm detection. DMDP employs gated, modality-specific deep prompts for text and visual encoders. We incorporate a prompt-sharing mechanism across layers, allowing the model to aggregate both low-level and high-level semantic cues.
arXiv Detail & Related papers (2025-07-06T17:16:34Z)
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models [10.47267683821842]
We propose an innovative multi-modal Commander-GPT framework for sarcasm detection. Inspired by military strategy, we first decompose the sarcasm detection task into six distinct sub-tasks. A central commander (decision-maker) then assigns the best-suited large language model to address each specific sub-task. Our approach achieves state-of-the-art performance, with a 19.3% improvement in F1 score.
arXiv Detail & Related papers (2025-03-24T13:53:00Z)
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models [18.15726815994039]
We introduce an analytical framework using systematically designed prompts on existing multimodal sarcasm datasets. Our findings reveal notable discrepancies -- across LVLMs and within the same model under varied prompts. These results challenge binary labeling paradigms by highlighting sarcasm's subjectivity.
arXiv Detail & Related papers (2025-03-15T14:10:25Z)
Multi-View Incongruity Learning for Multimodal Sarcasm Detection [40.10921890527881]
Multimodal sarcasm detection (MSD) is essential for various downstream tasks. Existing MSD methods tend to rely on spurious correlations. This paper proposes a novel method that integrates Multimodal Incongruities via Contrastive Learning (MICL) for multimodal sarcasm detection.
arXiv Detail & Related papers (2024-12-01T10:29:36Z)
RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR. Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer. Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection [10.736718868448175]
Existing multi-modal sarcasm detection methods have been proven to overestimate performance. We propose InterCLIP-MEP, a novel framework for multi-modal sarcasm detection. InterCLIP-MEP achieves state-of-the-art performance on the MMSD2.0 benchmark.
arXiv Detail & Related papers (2024-06-24T09:13:42Z)
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models [14.453131020178564]
This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherent in LMMs.
arXiv Detail & Related papers (2024-05-01T08:44:44Z)
An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders [3.1093882314734285]
Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs. This paper introduces a simple and universal Multi-Modal Sequential Recommendation (MMSR) framework.
arXiv Detail & Related papers (2024-03-26T04:16:57Z)
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy [47.382793714455445]
Machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination issues. It is challenging to distinguish MGTs and human-written texts because the distributional discrepancy between them is often very subtle. We propose a novel multi-population aware optimization method for MMD called MMD-MP.
arXiv Detail & Related papers (2024-02-25T09:44:56Z)
Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs). The performance of existing RS-CD methods is attributed to training on large annotated datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs. We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence. We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance. Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning [54.446686397551275]
We present RL-MMR, which unifies advanced neural SDS methods and statistical measures used in classical MDS. RL-MMR casts MMR guidance on fewer promising candidates, which restrains the search space and thus leads to better representation learning.
arXiv Detail & Related papers (2020-09-30T21:50:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.