Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
- URL: http://arxiv.org/abs/2402.03658v2
- Date: Mon, 06 Jan 2025 06:01:17 GMT
- Title: Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue
- Authors: Kun Ouyang, Liqiang Jing, Xuemeng Song, Meng Liu, Yupeng Hu, Liqiang Nie
- Abstract summary: We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE.
In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised.
We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
- Score: 63.32199372362483
- Abstract: Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utterance, video and audio, which play important roles in reflecting sarcasm that essentially involves subtle sentiment contrasts. Nevertheless, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) diverse effects of utterance tokens on sentiments; 2) gap between video-audio sentiment signals and the embedding space of BART; and 3) various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, to facilitate sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.
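To make the first module concrete, below is a minimal, hypothetical sketch of lexicon-guided utterance sentiment inference: token-level scores come from a sentiment lexicon and are aggregated into an utterance-level label. The toy lexicon, averaging rule, and neutral band are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of lexicon-guided utterance sentiment inference,
# in the spirit of EDGE's first module. The lexicon, scoring rule,
# and thresholds are hypothetical stand-ins, not the paper's.

TOY_LEXICON = {"great": 1.0, "love": 0.8, "fine": 0.3,
               "boring": -0.7, "terrible": -1.0}

def utterance_sentiment(tokens, neutral_band=0.2):
    """Aggregate token-level lexicon scores into an utterance label.

    Tokens absent from the lexicon contribute nothing, reflecting the
    observation that tokens affect utterance sentiment unevenly.
    """
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    if not scores:
        return "neutral"
    mean = sum(scores) / len(scores)
    if mean > neutral_band:
        return "positive"
    if mean < -neutral_band:
        return "negative"
    return "neutral"

print(utterance_sentiment("oh great another boring meeting".split()))
```

Note how "great" and "boring" pull this utterance toward neutral: that kind of subtle sentiment contrast is exactly what the abstract identifies as the signature of sarcasm.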
Related papers
- Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data [33.85748258158527]
Empathetic dialogue is crucial for natural human-computer interaction.
Large language models (LLMs) have revolutionized dialogue generation by harnessing their powerful capabilities.
We propose a novel approach that circumvents the need for question-answering data.
arXiv Detail & Related papers (2025-01-19T04:10:53Z)
- Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data.
Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge.
We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks [3.570593982494095]
We look at speech emotion understanding as a perception task, which is a more realistic setting.
We leverage the rich ComParE dataset of multilingual speakers and a multi-label regression target of 'emotion share', i.e., the perceived presence of each emotion.
Our results show that HuBERT-Large with a self-attention-based lightweight sequence model provides a 4.6% improvement over the reported baseline (a hypothetical sketch of such a head follows this entry).
arXiv Detail & Related papers (2023-08-28T07:11:27Z)
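As a rough illustration of the setup above, the sketch below pools frame-level features with a small self-attention encoder and regresses multi-label "emotion share" scores. The feature dimension matches HuBERT-Large (1024), but the random tensors, layer sizes, and eight-emotion output are stand-in assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

# Lightweight self-attention head over (stand-in) HuBERT-Large features.
# Real features would come from a pretrained HuBERT-Large model; here a
# random tensor of the same shape keeps the sketch self-contained.

class EmotionShareHead(nn.Module):
    def __init__(self, feat_dim=1024, n_emotions=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, dim_feedforward=2048,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.out = nn.Linear(feat_dim, n_emotions)

    def forward(self, feats):                   # (batch, frames, feat_dim)
        h = self.encoder(feats)                 # self-attention over frames
        pooled = h.mean(dim=1)                  # average-pool the sequence
        return torch.sigmoid(self.out(pooled))  # per-emotion share in [0, 1]

feats = torch.randn(2, 50, 1024)        # 2 clips x 50 frames of fake features
print(EmotionShareHead()(feats).shape)  # torch.Size([2, 8])
```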
- SAIDS: A Novel Approach for Sentiment Analysis Informed of Dialect and Sarcasm [0.0]
This paper introduces a novel system (SAIDS) that predicts the sentiment, sarcasm, and dialect of Arabic tweets.
By training all tasks together, SAIDS achieves scores of 75.98 FPN, 59.09 F1, and 71.13 F1 for sentiment analysis, sarcasm detection, and dialect identification, respectively.
arXiv Detail & Related papers (2023-01-06T14:19:46Z)
- Explaining (Sarcastic) Utterances to Enhance Affect Understanding in Multimodal Dialogues [40.80696210030204]
We propose MOSES, a deep neural network, which takes a multimodal (sarcastic) dialogue instance as an input and generates a natural language sentence as its explanation.
We leverage the generated explanation for various natural language understanding tasks in a conversational dialogue setup, such as sarcasm detection, humour identification, and emotion recognition.
Our evaluation shows that MOSES outperforms the state-of-the-art system for SED by an average of 2% across different evaluation metrics.
arXiv Detail & Related papers (2022-11-20T18:05:43Z)
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of a successful external language model (ELM), integrating its abundant linguistic knowledge into the caption model (a distillation-style sketch follows this entry).
arXiv Detail & Related papers (2020-02-26T15:34:52Z)
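Teacher-recommended learning is described only at a high level above; one plausible reading is a knowledge-distillation-style objective, where the caption model learns from both the ground-truth words and the ELM's soft word distribution. The function below is a hedged sketch of that reading; the loss weighting, temperature, and names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a teacher-recommended-learning-style loss:
# the caption model (student) is trained on ground-truth words plus
# soft word distributions "recommended" by an external language model.
# The temperature and weighting are illustrative, not the paper's values.

def trl_loss(student_logits, gt_ids, teacher_logits, alpha=0.5, tau=2.0):
    # Standard cross-entropy against ground-truth caption tokens.
    ce = F.cross_entropy(student_logits, gt_ids)
    # KL divergence toward the teacher's softened word distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * ce + alpha * kd

# Toy usage over a vocabulary of 10 words and a batch of 4 time steps.
s = torch.randn(4, 10)                 # student logits
t = torch.randn(4, 10)                 # teacher (ELM) logits
y = torch.randint(0, 10, (4,))         # ground-truth word ids
print(trl_loss(s, y, t).item())
```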
- A Deep Neural Framework for Contextual Affect Detection [51.378225388679425]
A short, simple text that carries no emotion on its own can convey strong emotions when read along with its context.
We propose a Contextual Affect Detection framework which learns the inter-dependence of words in a sentence.
arXiv Detail & Related papers (2020-01-28T05:03:15Z)