In-Context Examples Matter: Improving Emotion Recognition in Conversation with Instruction Tuning
- URL: http://arxiv.org/abs/2508.11889v1
- Date: Sat, 16 Aug 2025 03:23:48 GMT
- Title: In-Context Examples Matter: Improving Emotion Recognition in Conversation with Instruction Tuning
- Authors: Hui Ma, Bo Zhang, Jinpeng Hu, Zenglin Shi
- Abstract summary: Emotion recognition in conversation (ERC) aims to identify the emotion of each utterance in a conversation. We propose InitERC, a simple yet effective one-stage in-context instruction tuning framework for ERC. InitERC adapts LLMs to learn speaker-context-emotion alignment from context examples via in-context instruction tuning.
- Score: 15.153136138757887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion recognition in conversation (ERC) aims to identify the emotion of each utterance in a conversation, playing a vital role in empathetic artificial intelligence. With the rise of large language models (LLMs), instruction tuning has emerged as a critical paradigm for ERC. Existing studies mainly focus on multi-stage instruction tuning, which first endows LLMs with speaker characteristics, and then conducts context-aware instruction tuning to comprehend emotional states. However, these methods inherently constrain the capacity to jointly capture the dynamic interaction between speaker characteristics and conversational context, resulting in weak alignment among speaker identity, contextual cues, and emotional states within a unified framework. In this paper, we propose InitERC, a simple yet effective one-stage in-context instruction tuning framework for ERC. InitERC adapts LLMs to learn speaker-context-emotion alignment from context examples via in-context instruction tuning. Specifically, InitERC comprises four components: demonstration pool construction, in-context example selection, prompt template design, and in-context instruction tuning. To explore the impact of in-context examples, we conduct a comprehensive study on three key factors: retrieval strategy, example ordering, and the number of examples. Extensive experiments on three widely used datasets demonstrate that our proposed InitERC achieves substantial improvements over the state-of-the-art baselines.
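The abstract's pipeline (build a demonstration pool, retrieve similar examples, order them, and format a prompt for instruction tuning) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function names, the plain string-similarity retriever, and the prompt wording are all hypothetical stand-ins (a real system would likely use dense-embedding retrieval).

```python
# Hypothetical sketch of in-context example selection and prompt
# construction for ERC-style instruction tuning. String similarity
# stands in for a learned retriever to keep the example self-contained.
from difflib import SequenceMatcher

def retrieve_examples(query, pool, k=3):
    """Pick the k demonstrations most similar to the query utterance."""
    scored = sorted(
        pool,
        key=lambda ex: SequenceMatcher(None, query, ex["utterance"]).ratio(),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, speaker, examples):
    """Place (speaker, utterance, emotion) demonstrations ahead of the
    target utterance; example ordering is itself a factor the paper studies."""
    lines = ["Identify the emotion of the final utterance."]
    for ex in examples:
        lines.append(f'{ex["speaker"]}: "{ex["utterance"]}" -> {ex["emotion"]}')
    lines.append(f'{speaker}: "{query}" ->')
    return "\n".join(lines)

# Toy demonstration pool (fabricated data for illustration only).
pool = [
    {"speaker": "A", "utterance": "I can't believe we won!", "emotion": "joy"},
    {"speaker": "B", "utterance": "Leave me alone.", "emotion": "anger"},
    {"speaker": "A", "utterance": "We actually won the game!", "emotion": "joy"},
]
prompt = build_prompt(
    "We won the match!", "C", retrieve_examples("We won the match!", pool, k=2)
)
print(prompt)
```

During tuning, the completed prompt (with the gold emotion appended) would serve as a training instance, so the model learns the speaker-context-emotion alignment in a single stage.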
Related papers
- Covo-Audio Technical Report [61.09708870154148]
Covo-Audio, a 7B end-to-end LALM, directly processes continuous audio inputs and generates audio outputs within a single unified architecture. Covo-Audio-Chat, a dialogue-oriented variant, demonstrates strong semantic spoken conversational abilities.
arXiv Detail & Related papers (2026-02-10T14:31:11Z) - Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning [16.195689085967004]
Emotion Recognition in Conversation (ERC) is a crucial task for understanding human emotions and enabling natural human-computer interaction. We propose a novel ERC training framework, PRC-Emo, which integrates Prompt engineering, demonstration Retrieval, and Curriculum learning. We show that our method achieves new state-of-the-art (SOTA) performance, demonstrating the effectiveness and generalizability of our approach.
arXiv Detail & Related papers (2025-11-10T12:52:11Z) - Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio [52.859261069569165]
We propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation. We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or better than state-of-the-art models specialized for individual tasks.
arXiv Detail & Related papers (2025-08-28T06:51:42Z) - Advancing Multi-Party Dialogue Framework with Speaker-ware Contrastive Learning [10.678477576849579]
We propose CMR, a Contrastive learning-based Multi-party dialogue Response generation framework. CMR employs a two-stage self-supervised contrastive learning framework. Experimental results demonstrate that CMR not only significantly outperforms state-of-the-art models, but also generalizes well to large pre-trained language models.
arXiv Detail & Related papers (2025-01-20T06:28:22Z) - Unsupervised Mutual Learning of Discourse Parsing and Topic Segmentation in Dialogue [37.618612723025784]
In dialogue systems, discourse plays a crucial role in managing conversational focus and coordinating interactions. It consists of two key structures: rhetorical structure and topic structure. We introduce a unified representation that integrates rhetorical and topic structures, ensuring semantic consistency between them. We propose an unsupervised mutual learning framework (UMLF) that jointly models rhetorical and topic structures, allowing them to mutually reinforce each other without requiring additional annotations.
arXiv Detail & Related papers (2024-05-30T08:10:50Z) - Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z) - Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z) - Speaker-Oriented Latent Structures for Dialogue-Based Relation
Extraction [10.381257436462116]
We introduce SOLS, a novel model which can explicitly induce speaker-oriented latent structures for better DiaRE.
Specifically, we learn latent structures to capture the relationships among tokens beyond the utterance boundaries.
During the learning process, our speaker-specific regularization method progressively highlights speaker-related key clues and erases the irrelevant ones.
arXiv Detail & Related papers (2021-09-11T04:24:51Z) - Speaker-Oriented Latent Structures for Dialogue-Based Relation Extraction [10.381257436462116]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z) - Structural Pre-training for Dialogue Comprehension [51.215629336320305]
Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, dialogue summarization etc.
While dialogue corpora are abundantly available, labeled data, for specific learning tasks, can be highly scarce and expensive.
In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks.
arXiv Detail & Related papers (2020-02-27T04:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.