MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
- URL: http://arxiv.org/abs/2307.07135v1
- Date: Fri, 14 Jul 2023 03:22:51 GMT
- Title: MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
- Authors: Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin
Liang, Wanxiang Che and Ruifeng Xu
- Abstract summary: We introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD.
We present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives.
- Score: 57.650338588086186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal sarcasm detection has attracted much recent attention.
Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder
the development of reliable multi-modal sarcasm detection systems: (1) MMSD
contains spurious cues that lead models to learn biases; (2) the
negative samples in MMSD are not always reasonable. To solve the aforementioned
issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings
of MMSD, by removing the spurious cues and re-annotating the unreasonable
samples. Meanwhile, we present a novel framework called multi-view CLIP that is
capable of leveraging multi-grained cues from multiple perspectives (i.e.,
text, image, and text-image interaction view) for multi-modal sarcasm
detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for
building reliable multi-modal sarcasm detection systems and multi-view CLIP can
significantly outperform the previous best baselines.
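As a rough illustration of the multi-view idea, the sketch below scores sarcasm from a text view, an image view, and a concatenation-based text-image interaction view on top of a CLIP backbone, then averages the per-view logits. The head names, fusion rule, and dimensions are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a multi-view CLIP classifier: per-view heads over
# CLIP's text/image embeddings plus a simple interaction view.
# Module names and the averaging fusion are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import CLIPModel

class MultiViewSarcasmClassifier(nn.Module):
    def __init__(self, clip_name="openai/clip-vit-base-patch32", dim=512):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        self.text_head = nn.Linear(dim, 2)        # text view
        self.image_head = nn.Linear(dim, 2)       # image view
        self.inter_head = nn.Linear(dim * 2, 2)   # text-image interaction view

    def forward(self, input_ids, attention_mask, pixel_values):
        # Projected CLIP embeddings (512-d for ViT-B/32)
        t = self.clip.get_text_features(input_ids=input_ids,
                                        attention_mask=attention_mask)
        v = self.clip.get_image_features(pixel_values=pixel_values)
        fused = torch.cat([t, v], dim=-1)
        # Average per-view logits; one simple fusion choice among many
        return (self.text_head(t) + self.image_head(v)
                + self.inter_head(fused)) / 3.0
```

Inputs would come from a matching CLIPProcessor applied to each text-image pair.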
Related papers
- A Survey of Multimodal Sarcasm Detection [32.659528422756416]
Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance.
We present the first comprehensive survey on multimodal sarcasm detection to date.
arXiv Detail & Related papers (2024-10-24T16:17:47Z)
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection [10.736718868448175]
Existing multi-modal sarcasm detection methods have been shown to overestimate performance.
We propose InterCLIP-MEP, a novel framework for multi-modal sarcasm detection.
InterCLIP-MEP achieves state-of-the-art performance on the MMSD2.0 benchmark.
arXiv Detail & Related papers (2024-06-24T09:13:42Z)
- CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models [14.453131020178564]
This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge.
Inspired by the powerful capability of Large Multimodal Models (LMMs) in multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection.
We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact of noise inherent in LMMs.
arXiv Detail & Related papers (2024-05-01T08:44:44Z)
- Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy [47.382793714455445]
Machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination issues.
It is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle.
We propose a novel multi-population aware optimization method for MMD, called MMD-MP.
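For context, the plain (biased) two-sample MMD statistic that MMD-MP builds on is MMD^2(X, Y) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]. The sketch below computes it with a Gaussian kernel; the multi-population optimization itself is not reproduced, and the feature dimensions are placeholders.

```python
# Plain biased MMD^2 estimate with a Gaussian kernel; the baseline
# statistic behind MMD-MP (its multi-population optimization not shown).
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

# Toy example: features of human-written vs. machine-generated texts
human = torch.randn(64, 768)
machine = torch.randn(64, 768) + 0.05  # subtly shifted distribution
print(mmd2(human, machine).item())
```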
arXiv Detail & Related papers (2024-02-25T09:44:56Z)
- Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs).
The performance of existing RS-CD methods is largely attributed to training on large annotated datasets.
This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
- Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence and optimal transport to capture cross-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
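MuLOT's cross-modal alignment rests on optimal transport; the generic entropic-regularized Sinkhorn iteration below couples two toy feature sets under uniform marginals. Cost, regularization, and iteration count are illustrative assumptions, not the paper's exact formulation.

```python
# Generic Sinkhorn iteration for entropic optimal transport between two
# feature sets; a sketch of the alignment idea, not MuLOT's exact method.
import torch

def sinkhorn(cost, eps=0.1, iters=50):
    # cost: (n, m) pairwise costs; returns a transport plan P with
    # uniform row/column marginals (P @ 1 = a, P.T @ 1 = b).
    n, m = cost.shape
    K = torch.exp(-cost / eps)
    a, b = torch.full((n,), 1.0 / n), torch.full((m,), 1.0 / m)
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.t() @ u)
    return u[:, None] * K * v[None, :]

text_feats = torch.randn(10, 256)   # e.g. token features
image_feats = torch.randn(49, 256)  # e.g. image patch features
plan = sinkhorn(torch.cdist(text_feats, image_feats))
print(plan.sum().item())  # ~1.0: a valid joint coupling
```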
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
- Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance.
Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
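One hypothetical reading of inter-modality contrastive attention is to attend from one modality to another and keep the residual between each feature and its cross-modal reconstruction as an incongruity signal. The sketch below illustrates that reading only; it is not taken from the paper.

```python
# Hypothetical inter-modality "contrastive" attention: the residual
# between a feature and its cross-modal attention reconstruction serves
# as an incongruity cue. An illustration, not ConAttSD's exact mechanism.
import torch
import torch.nn.functional as F

def contrastive_attention(q_feats, kv_feats):
    # q_feats: (n, d) one modality; kv_feats: (m, d) another modality
    scale = q_feats.size(-1) ** 0.5
    attn = F.softmax(q_feats @ kv_feats.t() / scale, dim=-1)  # (n, m)
    recon = attn @ kv_feats             # cross-modal reconstruction
    return q_feats - recon              # large residual = incongruity

text = torch.randn(20, 128)
video = torch.randn(30, 128)
print(contrastive_attention(text, video).shape)  # torch.Size([20, 128])
```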
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
- Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning [54.446686397551275]
We present RL-MMR, which unifies advanced neural single-document summarization (SDS) methods and statistical measures used in classical multi-document summarization (MDS).
RL-MMR casts MMR guidance on fewer promising candidates, which restrains the search space and thus leads to better representation learning.
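The classical MMR criterion behind this guidance greedily picks the candidate s maximizing lambda * Sim(s, query) - (1 - lambda) * max over selected s' of Sim(s, s'). A toy sketch follows; the embeddings and lambda are placeholders, not RL-MMR's learned components.

```python
# Classical Maximal Marginal Relevance selection: relevant to the query,
# non-redundant with already-selected sentences. A toy sketch of the
# statistical measure RL-MMR injects into neural summarization.
import numpy as np

def mmr_select(sent_vecs, query_vec, k=3, lam=0.7):
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    selected, remaining = [], list(range(len(sent_vecs)))
    while remaining and len(selected) < k:
        def mmr(i):
            relevance = cos(sent_vecs[i], query_vec)
            redundancy = max((cos(sent_vecs[i], sent_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

sents = np.random.randn(8, 50)   # toy sentence embeddings
query = np.random.randn(50)      # toy document/query embedding
print(mmr_select(sents, query))  # indices of chosen sentences
```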
arXiv Detail & Related papers (2020-09-30T21:50:46Z)