RCLMuFN: Relational Context Learning and Multiplex Fusion Network for Multimodal Sarcasm Detection
- URL: http://arxiv.org/abs/2412.13008v1
- Date: Tue, 17 Dec 2024 15:29:31 GMT
- Title: RCLMuFN: Relational Context Learning and Multiplex Fusion Network for Multimodal Sarcasm Detection
- Authors: Tongguan Wang, Junkai Li, Guixin Su, Yongcheng Zhang, Dongyu Su, Yuxue Hu, Ying Sha
- Abstract summary: We propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection.
Firstly, we employ four feature extractors to comprehensively extract features from raw text and images.
Secondly, we utilize the relational context learning module to learn the contextual information of text and images.
- Score: 1.023096557577223
- Abstract: Sarcasm typically conveys emotions of contempt or criticism by expressing a meaning that is contrary to the speaker's true intent. Accurate detection of sarcasm aids in identifying and filtering undesirable information on the Internet, thereby reducing malicious defamation and rumor-mongering. Nonetheless, automatic sarcasm detection remains highly challenging for machines, as it critically depends on intricate factors such as relational context. Most existing multimodal sarcasm detection methods focus on introducing graph structures to establish entity relationships between text and images while neglecting to learn the relational context between text and images, which is crucial evidence for understanding the meaning of sarcasm. In addition, the meaning of sarcasm shifts as context evolves, but existing methods may not model such dynamic changes accurately, limiting the generalization ability of the models. To address these issues, we propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection. First, we employ four feature extractors to comprehensively extract features from raw text and images, aiming to uncover latent features that may previously have been overlooked. Second, we use a relational context learning module to learn the contextual information of text and images and to capture dynamic properties through shallow and deep interactions. Finally, we employ a multiplex feature fusion module to improve generalization by thoroughly integrating multimodal features derived from the various interaction contexts. Extensive experiments on two multimodal sarcasm detection datasets show that our proposed method achieves state-of-the-art performance.
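The abstract describes a three-stage pipeline (feature extraction, relational context learning with shallow and deep interactions, multiplex fusion), but no reference code is included on this page. Below is a minimal PyTorch-style sketch of how such a pipeline could be wired together; the module names, dimensions, layer choices, and mean-pooling fusion are illustrative assumptions, not the authors' implementation, and the paper's four feature extractors are abstracted away as pre-computed token/region features.

```python
# Minimal sketch of an RCLMuFN-style pipeline: pre-extracted text/image features
# -> relational context learning (shallow + deep interaction) -> multiplex fusion.
# All names, dimensions, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn


class RelationalContextLearning(nn.Module):
    """Shallow cross-modal attention plus a deeper joint re-encoding."""

    def __init__(self, dim: int = 512, heads: int = 8, depth: int = 2):
        super().__init__()
        # Shallow interaction: one cross-attention block (weights shared across
        # both directions here purely for brevity).
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Deep interaction: a small Transformer over the concatenated sequence.
        self.deep = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), depth
        )

    def forward(self, txt: torch.Tensor, img: torch.Tensor):
        txt_ctx, _ = self.cross(txt, img, img)   # text attends to image regions
        img_ctx, _ = self.cross(img, txt, txt)   # image attends to text tokens
        joint = self.deep(torch.cat([txt_ctx, img_ctx], dim=1))
        return txt_ctx, img_ctx, joint


class MultiplexFusion(nn.Module):
    """Fuse pooled features from the different interaction contexts and classify."""

    def __init__(self, dim: int = 512, num_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dim * 3, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, txt_ctx, img_ctx, joint):
        pooled = torch.cat(
            [txt_ctx.mean(dim=1), img_ctx.mean(dim=1), joint.mean(dim=1)], dim=-1
        )
        return self.head(pooled)


# Dummy usage: batch of 4, 32 text tokens and 49 image regions, already projected
# to a shared 512-d space (standing in for the paper's four feature extractors).
rcl, fusion = RelationalContextLearning(), MultiplexFusion()
txt = torch.randn(4, 32, 512)
img = torch.randn(4, 49, 512)
logits = fusion(*rcl(txt, img))
print(logits.shape)  # torch.Size([4, 2])
```

Note that this sketch collapses the "multiplex" fusion to mean-pooling and concatenation; the actual module described in the paper is presumably richer.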
Related papers
- Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection [12.744170917349287]
This study presents a novel framework for multimodal sarcasm detection that can process input triplets.
The proposed model achieves best accuracies of 92.89% and 64.48% on the Twitter multimodal sarcasm and MultiBully datasets, respectively.
arXiv Detail & Related papers (2024-08-05T16:07:31Z)
- VyAnG-Net: A Novel Multi-Modal Sarcasm Recognition Model by Uncovering Visual, Acoustic and Glossary Features [13.922091192207718]
Sarcasm recognition aims to identify hidden sarcastic, criticizing, and metaphorical information embedded in everyday dialogue.
We propose a novel approach that combines a lightweight depth attention module with a self-regulated ConvNet to concentrate on the most crucial features of visual data.
We have also conducted a cross-dataset analysis to test the adaptability of VyAnG-Net on unseen samples from another dataset, MUStARD++.
arXiv Detail & Related papers (2024-08-05T15:36:52Z)
- Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [63.32199372362483]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE.
In particular, we first propose a lexicon-guided utterance sentiment inference module, where an utterance sentiment refinement strategy is devised.
We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
arXiv Detail & Related papers (2024-02-06T03:14:46Z)
- Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection [52.04083398850383]
We create a multimodal hyperbole detection dataset from Weibo (a Chinese social media platform).
We treat the text and image of a Weibo post as two modalities and explore the role of each for hyperbole detection.
Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance.
arXiv Detail & Related papers (2023-07-01T03:23:56Z)
- Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation [53.97962603641629]
We propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM.
TEAM extracts the object-level semantic meta-data instead of the traditional global visual features from the input image.
TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source semantic relations.
arXiv Detail & Related papers (2023-06-29T03:26:10Z)
- Sarcasm Detection Framework Using Emotion and Sentiment Features [62.997667081978825]
We propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm.
Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.
arXiv Detail & Related papers (2022-11-23T15:14:44Z)
- How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities.
We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation.
arXiv Detail & Related papers (2022-11-20T14:38:24Z)
- Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement [31.97249246223621]
Sarcasm is a linguistic phenomenon indicating a discrepancy between literal meanings and implied intentions.
Most existing techniques model only the atomic-level inconsistencies between the text input and its accompanying image.
We propose a novel hierarchical framework for sarcasm detection that explores both atomic-level congruity, based on a multi-head cross-attention mechanism, and composition-level congruity, based on graph neural networks (a minimal cross-attention sketch appears after this list).
arXiv Detail & Related papers (2022-10-07T12:44:33Z)
- Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
- Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism [7.194040730138362]
We construct a Contrastive-Attention-based Sarcasm Detection (ConAttSD) model, which uses an inter-modality contrastive attention mechanism to extract contrastive features for an utterance.
Our experiments on MUStARD, a benchmark multi-modal sarcasm dataset, demonstrate the effectiveness of the proposed ConAttSD model.
arXiv Detail & Related papers (2021-09-30T14:17:51Z)
- Interpretable Multi-Head Self-Attention model for Sarcasm Detection in social media [0.0]
Inherent ambiguity in sarcastic expressions makes sarcasm detection very difficult.
We develop an interpretable deep learning model using multi-head self-attention and gated recurrent units.
We show the effectiveness of our approach by achieving state-of-the-art results on multiple datasets.
arXiv Detail & Related papers (2021-01-14T21:39:35Z)
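Several entries above rely on multi-head cross-attention between text tokens and image regions, such as the atomic-level congruity in the hierarchical-congruity paper referenced earlier in this list. The following is a minimal, self-contained PyTorch sketch of that mechanism; the 512-dimensional features, random inputs, and cosine-based congruity score are assumptions for illustration, not any paper's released code.

```python
# Minimal sketch of multi-head cross-attention scoring text-image congruity,
# as referenced in the hierarchical-congruity entry above. Dimensions and the
# cosine-based congruity score are illustrative assumptions.
import torch
import torch.nn as nn

dim, heads = 512, 8
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

text = torch.randn(2, 32, dim)   # text token features (batch, tokens, dim)
image = torch.randn(2, 49, dim)  # image region features (batch, regions, dim)

# Each text token queries the image regions; the output is an image-grounded
# representation of the text.
grounded_text, attn_weights = cross_attn(query=text, key=image, value=image)

# A simple atomic-level congruity signal: token-wise cosine similarity between
# the original and image-grounded text representations.
congruity = torch.cosine_similarity(text, grounded_text, dim=-1)  # (batch, tokens)
print(congruity.mean(dim=1))  # one scalar congruity score per example
```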