CKERC : Joint Large Language Models with Commonsense Knowledge for
Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2403.07260v1
- Date: Tue, 12 Mar 2024 02:37:11 GMT
- Title: CKERC : Joint Large Language Models with Commonsense Knowledge for
Emotion Recognition in Conversation
- Authors: Yumeng Fu
- Abstract summary: Emotion recognition in conversation (ERC) is the task of predicting the emotion of an utterance in the context of a conversation.
We propose CKERC, a novel framework that joins large language models with commonsense knowledge for emotion recognition in conversation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion recognition in conversation (ERC) is the task of predicting the emotion of an utterance in the context of a conversation. It depends closely on the dialogue context, speaker identity information, the multiparty dialogue scenario, and so on. However, the state-of-the-art method (InstructERC) only identifies the speaker and ignores the commonsense knowledge behind the speakers in a conversation (e.g., the listeners' reactions and the speaker's intention), which can deeply mine speaker information. To this end, we propose CKERC, a novel framework that joins large language models with commonsense knowledge for emotion recognition in conversation. We design prompts that have a large language model generate the interlocutors' commonsense from the historical utterances, and we use an interlocutor-commonsense-identification task during LLM pre-training to fine-tune implicit speaker cues. By addressing the above challenges, our method achieves state-of-the-art results. Extensive experiments on three widely used datasets (IEMOCAP, MELD, and EmoryNLP) demonstrate the superiority of our method. We also conduct an in-depth analysis that further demonstrates the effectiveness of commonsense knowledge for the ERC task with large language models.
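As a concrete illustration of the prompting step, the minimal Python sketch below assembles a prompt asking an LLM to infer a speaker's intention and the listeners' likely reaction from the dialogue history. The template wording and the helper name build_commonsense_prompt are assumptions for illustration; the paper's exact prompt is not given in the abstract.

```python
def build_commonsense_prompt(history, speaker):
    """Assemble a prompt (assumed wording) asking an LLM to infer the
    speaker's intention and the listeners' likely reaction from the
    dialogue history."""
    context = "\n".join(f"{s}: {u}" for s, u in history)
    return (
        "Given the conversation so far:\n"
        f"{context}\n"
        f"Infer, in one sentence each: (1) the intention of {speaker} "
        "in the last utterance, and (2) the likely reaction of the listeners."
    )

# Invented example turns for illustration only.
history = [("Ross", "I can't believe she said that."),
           ("Rachel", "Well, maybe she didn't mean it.")]
print(build_commonsense_prompt(history, "Rachel"))
```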
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
- Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations [4.297070083645049]
This paper presents a multi-scale conversational context learning approach for speech emotion recognition.
We investigated this approach on both speech transcriptions and acoustic segments.
According to our tests, the context derived from previous tokens has a more significant influence on accurate prediction than the following tokens.
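A minimal sketch of gathering such multi-scale context from preceding turns only, in line with the reported finding; the window sizes 1, 3, and 5 are assumptions, not the paper's configuration.

```python
def multiscale_contexts(utterances, index, scales=(1, 3, 5)):
    """Pair the target utterance (at position index) with left contexts
    of several sizes; only preceding utterances are gathered."""
    return {n: utterances[max(0, index - n):index] for n in scales}

# Example: contexts of the 4th utterance at scales 1, 3, and 5.
dialog = ["hello", "I need help", "what happened?", "my car broke down"]
print(multiscale_contexts(dialog, 3))
```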
arXiv Detail & Related papers (2023-08-28T20:31:45Z)
- Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations [1.8126187844654875]
We approach Emotion Recognition in Conversations leveraging the conversational context.
We propose context-dependent embedding representations of each utterance.
The effectiveness of our approach is validated on the open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset.
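One plausible reading of "context-dependent embedding representations" is to encode each utterance jointly with its context and pool the hidden states. A hedged sketch using the Hugging Face transformers library; the model choice roberta-base and mean pooling are assumptions, not the paper's recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
enc = AutoModel.from_pretrained("roberta-base").eval()

def contextual_utterance_embedding(context: str, utterance: str) -> torch.Tensor:
    """Encode the utterance jointly with its context and mean-pool the
    hidden states into one context-dependent vector."""
    batch = tok(context, utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)          # (dim,)
```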
arXiv Detail & Related papers (2023-04-17T12:37:57Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
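A minimal PyTorch sketch of that second stage: a bi-directional GRU with additive attention pooling over segment features. The sizes are assumptions, and the pretrained VGGish front end from the first stage is omitted.

```python
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    """Bi-directional GRU over segment features with additive attention
    pooling; dimensions are illustrative assumptions."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)

    def forward(self, segments):          # (batch, time, feat_dim)
        h, _ = self.gru(segments)         # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)
        return (w * h).sum(dim=1)         # attention-pooled utterance vector
```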
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- KPT: Keyword-guided Pre-training for Grounded Dialog Generation [82.68787152707455]
We propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation.
Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords.
We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages.
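A hedged sketch of the keyword-extraction idea: score each token by its negative log-likelihood under a pretrained causal LM and keep the most uncertain ones. The gpt2 checkpoint and the scoring details are assumptions; KPT's own procedure may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def most_uncertain_tokens(text: str, k: int = 5):
    """Return the k hardest-to-predict tokens as candidate keywords."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits                        # (1, seq, vocab)
    # Token t is predicted from the logits at position t-1.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="none")
    top = nll.topk(min(k, nll.numel())).indices + 1    # undo the shift
    return [tok.decode([int(ids[0, i])]) for i in top]
```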
arXiv Detail & Related papers (2022-12-04T04:05:01Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions over audio recordings, and to explore the plausibility of supplying systems with additional cues from different modalities during information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Multi-turn Dialogue Reading Comprehension with Pivot Turns and Knowledge [43.352833140317486]
Multi-turn dialogue reading comprehension aims to teach machines to read dialogue contexts and solve tasks such as response selection and answering questions.
This work makes the first attempt to tackle the above two challenges by extracting substantially important turns as pivot utterances.
We propose a pivot-oriented deep selection model (PoDS) on top of the Transformer-based language models for dialogue comprehension.
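The abstract does not detail how pivot turns are scored; as a simplified stand-in, the sketch below ranks turns by cosine similarity between precomputed turn embeddings and the question embedding and keeps the top-k as pivots.

```python
import torch

def select_pivots(turn_embs: torch.Tensor, question_emb: torch.Tensor, k: int = 2):
    """Rank dialogue turns against the question and keep the top-k indices
    (a simplified stand-in for PoDS's pivot selection)."""
    sims = torch.nn.functional.cosine_similarity(
        turn_embs, question_emb.unsqueeze(0), dim=1)   # (num_turns,)
    return sims.topk(min(k, len(sims))).indices.tolist()
```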
arXiv Detail & Related papers (2021-02-10T15:00:12Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as encoders represent the dialogue only coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
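One plausible way to make token representations utterance- and speaker-aware, sketched in PyTorch, is to add learned speaker-role and utterance-index embeddings to the token embeddings before the encoder; all dimensions here are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpeakerAwareEmbedding(nn.Module):
    """Sum token, speaker-role, and utterance-index embeddings
    (dimensions are illustrative assumptions)."""
    def __init__(self, vocab=30522, dim=256, n_speakers=2, max_utts=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.spk = nn.Embedding(n_speakers, dim)
        self.utt = nn.Embedding(max_utts, dim)

    def forward(self, token_ids, speaker_ids, utt_ids):  # each (batch, seq)
        return self.tok(token_ids) + self.spk(speaker_ids) + self.utt(utt_ids)
```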
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
- TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling.
To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
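A minimal sketch of the role-token idea using the transformers tokenizer: mark each turn with a user or system token before masked-language-model pre-training. The [USR]/[SYS] spellings are illustrative, and the dialogue shown is invented.

```python
from transformers import BertTokenizerFast

# Register role tokens so each turn can be marked before MLM pre-training.
tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
tok.add_special_tokens({"additional_special_tokens": ["[USR]", "[SYS]"]})

dialog = [("usr", "i need a cheap hotel"),          # invented example turns
          ("sys", "how about the cityroomz?")]
flat = " ".join(("[USR] " if role == "usr" else "[SYS] ") + text
                for role, text in dialog)
ids = tok(flat).input_ids
# The model's embedding matrix must grow to cover the new tokens, e.g.:
# model.resize_token_embeddings(len(tok))
```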
arXiv Detail & Related papers (2020-04-15T04:09:05Z)
- Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition [32.439818455554885]
We exploit speaker identification (SI) as an auxiliary task to enhance the utterance representation in conversations.
By this method, we can learn better speaker-aware contextual representations from the additional SI corpus.
Experiments on two benchmark datasets demonstrate that the proposed architecture is highly effective for conversational emotion recognition (CER).
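A hedged PyTorch sketch of the multi-task setup: a shared encoder feeds both an emotion head and an auxiliary speaker-identification head, trained with a weighted sum of cross-entropy losses. The sizes and the weight lam are assumptions.

```python
import torch
import torch.nn as nn

class EmotionWithSpeakerID(nn.Module):
    """Shared encoder with emotion and auxiliary speaker-ID heads
    (dimensions are illustrative assumptions)."""
    def __init__(self, dim=256, n_emotions=7, n_speakers=10):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.emo_head = nn.Linear(dim, n_emotions)
        self.spk_head = nn.Linear(dim, n_speakers)

    def forward(self, x):                 # (batch, seq, dim)
        h, _ = self.encoder(x)
        pooled = h[:, -1]                  # last hidden state
        return self.emo_head(pooled), self.spk_head(pooled)

def joint_loss(emo_logits, spk_logits, emo_y, spk_y, lam=0.5):
    """Weighted sum of the main and auxiliary cross-entropy losses."""
    ce = nn.functional.cross_entropy
    return ce(emo_logits, emo_y) + lam * ce(spk_logits, spk_y)
```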
arXiv Detail & Related papers (2020-03-03T12:25:03Z)