MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
- URL: http://arxiv.org/abs/2307.07135v1
- Date: Fri, 14 Jul 2023 03:22:51 GMT
- Title: MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
- Authors: Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che and Ruifeng Xu
- Abstract summary: We introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD.
We present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives.
- Score: 57.650338588086186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal sarcasm detection has attracted much recent attention.
Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder
the development of reliable multi-modal sarcasm detection systems: (1) There are
some spurious cues in MMSD, which lead to biased model learning; (2) The
negative samples in MMSD are not always reasonable. To solve the aforementioned
issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings
of MMSD, by removing the spurious cues and re-annotating the unreasonable
samples. Meanwhile, we present a novel framework called multi-view CLIP that is
capable of leveraging multi-grained cues from multiple perspectives (i.e.,
text, image, and text-image interaction view) for multi-modal sarcasm
detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for
building reliable multi-modal sarcasm detection systems and multi-view CLIP can
significantly outperform the previous best baselines.
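As a rough illustration of the multi-view idea, the sketch below combines class scores from a text view, an image view, and a text-image interaction view. The abstract does not say how the views are combined, so the late-fusion averaging here, and the names `softmax`, `fuse_views`, and `predict`, are assumptions made for illustration only and not the authors' method.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(text_logits, image_logits, interaction_logits):
    """Average per-view class probabilities (assumed late fusion).

    Each argument is a list of class scores, e.g. [not_sarcastic, sarcastic].
    """
    views = [softmax(v) for v in (text_logits, image_logits, interaction_logits)]
    num_classes = len(views[0])
    return [sum(p[i] for p in views) / len(views) for i in range(num_classes)]


def predict(text_logits, image_logits, interaction_logits):
    """Return the index of the class with the highest fused probability."""
    probs = fuse_views(text_logits, image_logits, interaction_logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, `predict([0.0, 2.0], [0.0, 1.0], [1.0, 0.0])` selects class 1, because two of the three views favor it strongly enough to outweigh the third.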
Related papers
- Multi-View Incongruity Learning for Multimodal Sarcasm Detection [40.10921890527881]
Multimodal sarcasm detection (MSD) is essential for various downstream tasks.
Existing MSD methods tend to rely on spurious correlations.
This paper proposes a novel method that integrates Multimodal Incongruities via Contrastive Learning (MICL) for multimodal sarcasm detection.
arXiv Detail & Related papers (2024-12-01T10:29:36Z)
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection [17.55808303452098]
Sarcasm in social media, often expressed through text-image combinations, poses challenges for sentiment analysis and intention mining.
We propose InterCLIP-MEP, which introduces Interactive CLIP with an efficient training strategy to extract enriched text-image representations.
We show that InterCLIP-MEP achieves state-of-the-art performance, with significant accuracy and F1 score improvements on MMSD and MMSD2.0.
arXiv Detail & Related papers (2024-06-24T09:13:42Z)
- CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models [14.453131020178564]
This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge.
Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection.
We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus able to uncover the intricate targets within multimodal sarcasm and to mitigate the negative impact of potential noise inherent in LMMs.
arXiv Detail & Related papers (2024-05-01T08:44:44Z)
- Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy [47.382793714455445]
Machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination issues.
It is challenging to distinguish MGTs and human-written texts because the distributional discrepancy between them is often very subtle.
We propose a novel multi-population aware optimization method for MMD called MMD-MP.
arXiv Detail & Related papers (2024-02-25T09:44:56Z)
- Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs).
The performance of existing RS-CD methods is attributed to training on large annotated datasets.
This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
- Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
- Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance.
Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
- Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning [54.446686397551275]
We present RL-MMR, which unifies advanced neural SDS methods and statistical measures used in classical MDS.
RL-MMR casts MMR guidance on fewer promising candidates, which restrains the search space and thus leads to better representation learning.
arXiv Detail & Related papers (2020-09-30T21:50:46Z)