InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models
- URL: http://arxiv.org/abs/2309.11911v6
- Date: Thu, 29 Aug 2024 05:14:36 GMT
- Title: InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models
- Authors: Shanglin Lei, Guanting Dong, Xiaoping Wang, Keheng Wang, Runqi Qiao, Sirui Wang,
- Abstract summary: We propose a novel approach, InstructERC, to reformulate the emotion recognition task from a discriminative framework to a generative framework based on Large Language Models (LLMs).
InstructERC makes three significant contributions: (1) it introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information; (2) it introduces two additional emotion alignment tasks, speaker identification and emotion prediction, to implicitly model the dialogue role relationships and future emotional tendencies in conversations; and (3) it unifies emotion labels across benchmarks through the feeling wheel to fit real application scenarios.
- Score: 9.611864685207056
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The field of emotion recognition in conversation (ERC) has focused on separating sentence feature encoding from context modeling, with little exploration of generative paradigms based on unified designs. In this study, we propose a novel approach, InstructERC, to reformulate the ERC task from a discriminative framework to a generative framework based on Large Language Models (LLMs). InstructERC makes three significant contributions: (1) It introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information. (2) We introduce two additional emotion alignment tasks, namely speaker identification and emotion prediction, to implicitly model the dialogue role relationships and future emotional tendencies in conversations. (3) We unify emotion labels across benchmarks through the feeling wheel to fit real application scenarios. InstructERC still performs impressively on this unified dataset. Our LLM-based plugin framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provides empirical guidance for applying it in practical scenarios.
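The generative reformulation described in the abstract can be pictured as turning each utterance into a (prompt, answer) text pair: one for the main ERC task, optionally carrying a retrieved demonstration, and one each for the two auxiliary tasks. The following is only a minimal sketch under stated assumptions, not the authors' template: the prompt wording, the `LABELS` set, the `Utterance` type, and all function names are hypothetical, and the retrieval step itself is assumed to have already produced `demo`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str


# Hypothetical label set, for illustration only (not taken from the paper).
LABELS = ["neutral", "joy", "sadness", "anger"]


def render(utterances: List[Utterance]) -> str:
    """Render a dialogue as one 'speaker: text' line per utterance."""
    return "\n".join(f"{u.speaker}: {u.text}" for u in utterances)


def erc_example(history: List[Utterance], target: Utterance,
                demo: Optional[Utterance] = None) -> Tuple[str, str]:
    """Main task: the answer is the emotion label of the target utterance."""
    parts = ["Task: recognize the emotion of the last utterance."]
    if demo is not None:
        # Retrieved demonstration: a labeled utterance assumed to be similar.
        parts.append(f'Demonstration: "{demo.text}" -> {demo.emotion}')
    parts.append(render(history + [target]))
    parts.append("Options: " + ", ".join(LABELS))
    return "\n".join(parts), target.emotion


def speaker_example(history: List[Utterance],
                    target: Utterance) -> Tuple[str, str]:
    """Auxiliary task: the speaker of the last utterance is hidden."""
    speakers = sorted({u.speaker for u in history + [target]})
    prompt = "\n".join([
        "Task: identify who says the last utterance.",
        render(history),
        f"?: {target.text}",
        "Options: " + ", ".join(speakers),
    ])
    return prompt, target.speaker


def next_emotion_example(history: List[Utterance],
                         upcoming: Utterance) -> Tuple[str, str]:
    """Auxiliary task: predict the emotion of an utterance not yet shown."""
    prompt = "\n".join([
        "Task: predict the emotion of the next, unseen utterance.",
        render(history),
        "Options: " + ", ".join(LABELS),
    ])
    return prompt, upcoming.emotion


# Made-up sample data for the usage example.
dialogue = [
    Utterance("A", "I lost my keys again.", "sadness"),
    Utterance("B", "Found them in your coat!", "joy"),
]
demo = Utterance("C", "I finally passed the exam!", "joy")
erc_prompt, erc_answer = erc_example(dialogue[:1], dialogue[1], demo)
spk_prompt, spk_answer = speaker_example(dialogue[:1], dialogue[1])
nxt_prompt, nxt_answer = next_emotion_example(dialogue[:1], dialogue[1])
```

Each pair could then serve as one training instance for a text-to-text model; how the pairs are mixed across tasks and how demonstrations are retrieved are left open here.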
Related papers
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges [65.03196674816772]
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee.
Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarification Exchanges (CE): a Clarification Request (CR) and a response.
Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models.
arXiv Detail & Related papers (2023-07-28T13:44:33Z)
- BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation [19.663265448700002]
Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm.
We propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step.
We develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance in three aspects.
arXiv Detail & Related papers (2023-01-17T08:03:32Z)
- Deep Emotion Recognition in Textual Conversations: A Survey [0.8602553195689513]
New applications and implementation scenarios present novel challenges and opportunities.
These range from leveraging the conversational context, speaker, and emotion dynamics modelling, to interpreting common sense expressions.
This survey emphasizes the advantage of leveraging techniques to address unbalanced data.
arXiv Detail & Related papers (2022-11-16T19:42:31Z)
- Learning Interpretable Latent Dialogue Actions With Less Supervision [3.42658286826597]
We present a novel architecture for explainable modeling of task-oriented dialogues with discrete latent variables.
Our model is based on variational recurrent neural networks (VRNN) and requires no explicit annotation of semantic information.
arXiv Detail & Related papers (2022-09-22T16:14:06Z)
- DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation [80.45816053153722]
DialogVED introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses.
We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation.
arXiv Detail & Related papers (2022-04-27T16:18:15Z)
- Dialogue Meaning Representation for Task-Oriented Dialogue Systems [51.91615150842267]
We propose Dialogue Meaning Representation (DMR), a flexible and easily extendable representation for task-oriented dialogue.
Our representation contains a set of nodes and edges with inheritance hierarchy to represent rich semantics for compositional semantics and task-specific concepts.
We propose two evaluation tasks to evaluate different machine learning based dialogue models, and further propose a novel coreference resolution model GNNCoref for the graph-based coreference resolution task.
arXiv Detail & Related papers (2022-04-23T04:17:55Z)
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [77.59549450705384]
In dialog systems, dialog act recognition and sentiment classification are two correlative tasks.
Most of the existing systems either treat them as separate tasks or just jointly model the two tasks.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.