STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
- URL: http://arxiv.org/abs/2402.01679v2
- Date: Fri, 16 Feb 2024 11:27:14 GMT
- Title: STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
- Authors: Yiqun Zhang, Fanheng Kong, Peidong Wang, Shuang Sun, Lingshuai Wang,
Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song
- Abstract summary: We introduce the Agent for STICKERCONV (Agent4SC), which uses collaborative agent interactions to realistically simulate human behavior with sticker usage.
We develop a multimodal empathetic dialogue dataset, STICKERCONV, comprising 12.9K dialogue sessions, 5.8K unique stickers, and 2K diverse conversational scenarios.
To advance further, we propose PErceive and Generate Stickers (PEGS), a multimodal empathetic response generation framework.
- Score: 23.733723334721695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stickers, while widely recognized for enhancing empathetic communication in
online interactions, remain underexplored in current empathetic dialogue
research, notably due to the lack of comprehensive datasets. In
this paper, we introduce the Agent for STICKERCONV (Agent4SC), which uses
collaborative agent interactions to realistically simulate human behavior with
sticker usage, thereby enhancing multimodal empathetic communication. Building
on this foundation, we develop a multimodal empathetic dialogue dataset,
STICKERCONV, comprising 12.9K dialogue sessions, 5.8K unique stickers, and 2K
diverse conversational scenarios. This dataset serves as a benchmark for
multimodal empathetic generation. To advance further, we propose PErceive and
Generate Stickers (PEGS), a multimodal empathetic response generation
framework, complemented by a comprehensive set of empathy evaluation metrics
based on LLMs. Our experiments demonstrate PEGS's effectiveness in generating
contextually relevant and emotionally resonant multimodal empathetic responses,
contributing to the advancement of more nuanced and engaging empathetic
dialogue systems.
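The abstract does not detail how the LLM-based empathy metrics are computed; a minimal sketch of LLM-judged empathy scoring, under assumed rubric dimensions, prompt wording, and judge model, might look like this:

```python
# Hypothetical sketch of LLM-based empathy scoring in the spirit of the
# paper's evaluation metrics. The rubric dimensions, prompt wording, and
# judge model are illustrative assumptions, not the authors' definitions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Rate the assistant response on a 1-5 scale for each dimension: "
    "emotional_understanding, contextual_relevance, expressiveness. "
    "Reply with a JSON object mapping each dimension to an integer."
)

def empathy_scores(context: str, response: str) -> dict:
    """Ask an LLM judge to score one (possibly multimodal) response for empathy."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Dialogue context:\n{context}\n\nResponse:\n{response}"},
        ],
    )
    return json.loads(completion.choices[0].message.content)

# Stickers can be scored alongside text by describing them in the transcript:
print(empathy_scores(
    "User: I failed my exam again...",
    "That sounds really discouraging. Want to talk it through? "
    "[sticker: a cat offering a comforting hug]",
))
```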
Related papers
- DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue [17.397151329196955]
We propose DialogueAgents, a novel hybrid agent-based speech synthesis framework.
We contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset.
arXiv Detail & Related papers (2025-04-20T04:14:30Z)
- REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation [51.97224538045096]
We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues.
We compare emotional intelligence (EI) attributes and persona consistency to understand the challenges posed by real-world dialogues.
Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
arXiv Detail & Related papers (2025-02-18T20:29:01Z)
- SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent [27.301608019492043]
Large Language Models (LLMs) have demonstrated promising potential in providing empathetic support during interactions.
We propose an innovative strategy-enhanced role-playing framework, designed to simulate authentic emotional support conversations.
Within this framework, we develop the ServeForEmo dataset, comprising 3.7K+ multi-turn dialogues and 62.8K+ utterances.
arXiv Detail & Related papers (2024-12-11T13:56:04Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavioral modalities into a "multimodal transcript".
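As a rough illustration of the multimodal-transcript idea (the cue names, fields, and thresholds below are invented for this sketch, not taken from the paper), nonverbal signals can be verbalized and interleaved with the spoken turns so that a text-only LLM can reason over all modalities:

```python
# Illustrative sketch of a "multimodal transcript": nonverbal behavior
# streams are rendered as bracketed text annotations alongside the speech.
# The Turn fields and thresholds are assumptions made for this sketch.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str
    gaze_on_partner: float   # fraction of the turn spent looking at the partner
    smile_intensity: float   # 0..1, e.g. from a facial-expression model

def verbalize(turn: Turn) -> str:
    """Render one turn as text, appending bracketed nonverbal annotations."""
    cues = []
    if turn.gaze_on_partner < 0.3:
        cues.append("gaze mostly averted")
    if turn.smile_intensity > 0.5:
        cues.append("smiling")
    suffix = f" [{'; '.join(cues)}]" if cues else ""
    return f"{turn.speaker}: {turn.text}{suffix}"

turns = [
    Turn("A", "So, how was the conference?", 0.8, 0.6),
    Turn("B", "It was fine, I guess.", 0.1, 0.0),
]
multimodal_transcript = "\n".join(verbalize(t) for t in turns)
print(multimodal_transcript)  # this string is what an LLM would consume
```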
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Empathy Through Multimodality in Conversational Interfaces [1.360649555639909]
Conversational Health Agents (CHAs) are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence.
This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support.
It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses.
arXiv Detail & Related papers (2024-05-08T02:48:29Z)
- AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations [39.79734528362605]
The Multimodal Attention Network captures cross-modal interactions at various levels of spatial abstraction.
The AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
arXiv Detail & Related papers (2024-01-26T19:17:05Z)
- AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
- Harnessing Large Language Models' Empathetic Response Generation Capabilities for Online Mental Health Counselling Support [1.9336815376402723]
Large Language Models (LLMs) have demonstrated remarkable performance across various information-seeking and reasoning tasks.
This study sought to examine LLMs' capability to generate empathetic responses in conversations that emulate those in a mental health counselling setting.
We selected five LLMs: versions 3.5 and 4 of the Generative Pre-trained Transformer (GPT), Vicuna FastChat-T5, Pathways Language Model (PaLM) version 2, and Falcon-7B-Instruct.
arXiv Detail & Related papers (2023-10-12T03:33:06Z)
- Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation [9.817888267356716]
Multimodal Emotion Recognition in Conversation (ERC) faces two problems.
Deep emotion cues are extracted from modalities with strong representation ability.
Feature filters are designed as multimodal prompt information for modalities with weak representation ability.
MPT embeds multimodal fusion information into each attention layer of the Transformer.
arXiv Detail & Related papers (2023-10-04T13:54:46Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
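The abstract does not specify PICK's scorers; a hedged sketch of generation re-scoring in this style, with toy word-overlap functions standing in for the real faithfulness and relevance models, could look like:

```python
# Hedged sketch of generation re-scoring: sample several candidates, score
# each for faithfulness to the knowledge and relevance to the dialogue
# history, and keep the best. The weighting and the lexical-overlap scorers
# below are placeholders, not PICK's actual scoring models.
def rescore(candidates, knowledge, history, faithfulness, relevance, alpha=0.5):
    """Return the candidate maximizing a weighted faithfulness/relevance mix."""
    def score(c):
        return alpha * faithfulness(c, knowledge) + (1 - alpha) * relevance(c, history)
    return max(candidates, key=score)

# Example with a trivial word-overlap scorer standing in for both signals:
overlap = lambda a, b: len(set(a.lower().split()) & set(b.lower().split()))
best = rescore(
    candidates=["The Eiffel Tower is 330 meters tall.", "I love Paris!"],
    knowledge="The Eiffel Tower stands 330 meters tall.",
    history="User: How tall is the Eiffel Tower?",
    faithfulness=overlap,
    relevance=overlap,
)
print(best)  # -> "The Eiffel Tower is 330 meters tall."
```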
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Selecting Stickers in Open-Domain Dialogue through Multitask Learning [51.67855506570727]
We propose a multitask learning method with three auxiliary tasks to enhance the understanding of dialogue history, emotion, and the semantic meaning of stickers.
Our model better combines the multimodal information and achieves significantly higher accuracy than strong baselines.
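A minimal sketch of the multitask objective this implies follows; the auxiliary-task names track the abstract, while the loss weights and structure are assumptions:

```python
# Minimal sketch of a multitask objective: one main sticker-selection loss
# plus three auxiliary losses (dialogue history, emotion, sticker semantics).
# The loss weights are assumptions, not the paper's tuned values.
import torch

def multitask_loss(selection_loss, history_loss, emotion_loss, semantic_loss,
                   weights=(1.0, 0.3, 0.3, 0.3)):
    """Weighted sum of the main selection loss and the three auxiliary losses."""
    losses = torch.stack([selection_loss, history_loss, emotion_loss, semantic_loss])
    return (torch.tensor(weights) * losses).sum()

# Usage: total = multitask_loss(l_sel, l_hist, l_emo, l_sem); total.backward()
```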
arXiv Detail & Related papers (2022-09-16T03:45:22Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
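A minimal late-fusion sketch consistent with that description follows; the module interfaces and the fusion MLP are assumptions, not the paper's exact architecture:

```python
# Late fusion of independently trained speech and text emotion classifiers:
# each produces per-class logits, which are concatenated and fused by a
# small MLP. Interfaces and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, speech_model: nn.Module, text_model: nn.Module,
                 num_classes: int, hidden: int = 64):
        super().__init__()
        self.speech_model = speech_model  # e.g. a transfer-learned speaker encoder + head
        self.text_model = text_model      # e.g. a fine-tuned BERT classifier
        self.fuser = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, audio: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        speech_logits = self.speech_model(audio)   # (batch, num_classes)
        text_logits = self.text_model(text_ids)    # (batch, num_classes)
        return self.fuser(torch.cat([speech_logits, text_logits], dim=-1))
```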
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog [65.7021675527543]
Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works automatically select a sticker response by matching stickers' text labels with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history, without any external labels.
arXiv Detail & Related papers (2020-03-10T13:10:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.