Explaining (Sarcastic) Utterances to Enhance Affect Understanding in
Multimodal Dialogues
- URL: http://arxiv.org/abs/2211.11049v2
- Date: Tue, 22 Nov 2022 13:01:53 GMT
- Title: Explaining (Sarcastic) Utterances to Enhance Affect Understanding in
Multimodal Dialogues
- Authors: Shivani Kumar, Ishani Mondal, Md Shad Akhtar, Tanmoy Chakraborty
- Abstract summary: We propose MOSES, a deep neural network, which takes a multimodal (sarcastic) dialogue instance as an input and generates a natural language sentence as its explanation.
We leverage the generated explanation for various natural language understanding tasks in a conversational dialogue setup, such as sarcasm detection, humour identification, and emotion recognition.
Our evaluation shows that MOSES outperforms the state-of-the-art system for Sarcasm Explanation in Dialogues (SED) by an average of ~2% on different evaluation metrics.
- Score: 40.80696210030204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversations are the primary medium for exchanging ideas and
concepts. From the listener's perspective, identifying various affective
qualities, such as sarcasm, humour, and emotions, is paramount for
comprehending the true connotation of the emitted utterance. However, one of
the major hurdles faced in learning these affect dimensions is the presence of
figurative language, viz. irony, metaphor, or sarcasm. We hypothesize that
providing a detection system with an exhaustive and explicit explanation of the
emitted utterance would improve the overall comprehension of the dialogue. To
this end, we explore the task of Sarcasm Explanation in Dialogues (SED), which
aims to unfold the hidden irony behind sarcastic utterances. We propose MOSES, a
deep neural network, which takes a multimodal (sarcastic) dialogue instance as
an input and generates a natural language sentence as its explanation.
Subsequently, we leverage the generated explanation for various natural
language understanding tasks in a conversational dialogue setup, such as
sarcasm detection, humour identification, and emotion recognition. Our
evaluation shows that MOSES outperforms the state-of-the-art system for SED by
an average of ~2% on different evaluation metrics, such as ROUGE, BLEU, and
METEOR. Further, we observe that leveraging the generated explanation advances
three downstream tasks for affect classification - an average improvement of
~14% F1-score in the sarcasm detection task and ~2% in the humour
identification and emotion recognition tasks. We also perform extensive analyses
to assess the quality of the results.
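The abstract reports reference-based scores (ROUGE, BLEU, METEOR) for the generated explanations. As an illustration only, the following is a minimal sketch of how such scores can be computed with the Hugging Face evaluate library; the example predictions and references are invented, and the snippet is not part of the paper's released code.

```python
# Minimal sketch (not the paper's code): scoring generated sarcasm explanations
# against reference explanations with ROUGE, BLEU, and METEOR.
# Requires: pip install evaluate rouge_score nltk
import evaluate

# Hypothetical model outputs and gold references for two dialogue instances.
predictions = [
    "The speaker mocks the plan because it has already failed twice.",
    "She praises the food ironically to point out that it is cold.",
]
references = [
    "The speaker is mocking the plan, which has already failed twice.",
    "She ironically compliments the food to highlight that it arrived cold.",
]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")

scores = {
    **rouge.compute(predictions=predictions, references=references),
    "bleu": bleu.compute(predictions=predictions,
                         references=[[r] for r in references])["bleu"],
    "meteor": meteor.compute(predictions=predictions,
                             references=references)["meteor"],
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Corpus-level averages of these metrics are what the ~2% improvement over the prior SED system refers to; the exact evaluation protocol follows the paper, not this sketch.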
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA) task.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations [13.586958232275501]
We present a generalizable classification approach that leverages Large Language Models (LLMs)
We design a multi-faceted prompt to extract a textual explanation that connects visible cues to underlying social meanings.
Our findings hold true for in-domain classification, zero-shot, and few-shot domain transfer for two different social meaning detection tasks.
arXiv Detail & Related papers (2024-06-27T21:47:42Z)
- Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
- Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [67.09698638709065]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE.
In particular, we first propose a lexicon-guided utterance sentiment inference module, where an utterance sentiment refinement strategy is devised.
We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
arXiv Detail & Related papers (2024-02-06T03:14:46Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations.
We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representations from individual utterances.
Second, an attentive bidirectional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies (a minimal illustrative sketch follows this list).
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues [27.884015521888458]
We study the discourse structure of sarcastic conversations and propose a novel task - Sarcasm Explanation in Dialogue (SED).
SED aims to generate natural language explanations of satirical conversations.
We propose MAF, a multimodal context-aware attention and global information fusion module to capture multimodality and use it to benchmark WITS.
arXiv Detail & Related papers (2022-03-12T12:16:07Z)
- Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations [14.852199996061287]
We develop a Hindi-English code-mixed dataset, MaSaC, for multi-modal sarcasm detection and humor classification in conversational dialog.
We propose MSH-COMICS, a novel attention-rich neural architecture for the utterance classification.
arXiv Detail & Related papers (2021-05-20T18:33:55Z)
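For the segment-level speech emotion recognition entry above, the following is a small illustrative PyTorch sketch, not the authors' code: it assumes 128-dimensional VGGish-style segment embeddings have already been extracted, and the class name, hidden size, and number of emotion classes are arbitrary placeholders.

```python
# Illustrative sketch only (not the paper's implementation): a bidirectional
# GRU with additive attention over pre-extracted VGGish-style segment
# embeddings (128-d per audio segment), followed by an emotion classifier.
import torch
import torch.nn as nn


class AttentiveBiGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, num_emotions=7):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # additive attention scorer
        self.classifier = nn.Linear(2 * hidden, num_emotions)

    def forward(self, segments):                    # segments: (B, T, 128)
        states, _ = self.gru(segments)              # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)  # (B, T, 1)
        utterance = (weights * states).sum(dim=1)   # attention-pooled utterance vector
        return self.classifier(utterance)           # emotion logits


# Usage with dummy data: a batch of 4 utterances, 10 segments each.
model = AttentiveBiGRU()
dummy = torch.randn(4, 10, 128)                     # stand-in for VGGish embeddings
print(model(dummy).shape)                           # torch.Size([4, 7])
```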
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.