Related papers: Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

URL: http://arxiv.org/abs/2406.15000v1
Date: Fri, 21 Jun 2024 09:26:55 GMT
Title: Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan
Abstract summary: This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension.
Score: 17.409790984399052
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Related papers

Exploring Personality-Aware Interactions in Salesperson Dialogue Agents [21.282523537612477]
This study explores the influence of user personas, defined using the Myers-Briggs Type Indicator (MBTI), on the interaction quality and performance of sales-oriented dialogue agents. Our findings reveal significant patterns in interaction dynamics, task completion rates, and dialogue naturalness, underscoring the future potential for dialogue agents to refine their strategies.
arXiv Detail & Related papers (2025-04-25T04:10:25Z)
Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines [9.834055425277874]
This study investigates learner-AI interactions through an educational experiment in which participants receive structured guidance on effective prompting. To assess user behavior and prompting efficacy, we analyze a dataset of 642 interactions from 107 users. Our findings provide a deeper understanding of how users engage with Large Language Models and the role of structured prompting guidance in enhancing AI-assisted communication.
arXiv Detail & Related papers (2025-04-10T15:20:43Z)
InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions [22.007942964950217]
We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses.
arXiv Detail & Related papers (2025-03-06T05:35:19Z)
A Survey on Multi-Turn Interaction Capabilities of Large Language Models [47.05742294162551]
In dialogue system research, multi-turn interaction refers to a system's ability to maintain context across multiple dialogue turns. Recent advancements in large language models (LLMs) have significantly expanded the scope of multi-turn interaction.
arXiv Detail & Related papers (2025-01-17T05:21:49Z)
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion. In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation. We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models [25.070424546200293]
We present a novel approach leveraging the robust reasoning capabilities of large language models (LLMs) to generate precise dialogue-associated visual descriptors. Experiments conducted on benchmark data validate the effectiveness of our proposed approach in deriving concise and accurate visual descriptors. Our findings demonstrate the method's generalizability across diverse visual cues, various LLMs, and different datasets.
arXiv Detail & Related papers (2024-07-04T03:50:30Z)
Dataset and Models for Item Recommendation Using Multi-Modal User Interactions [14.054250597878465]
We study the case of multi-modal user interactions in a setting where users engage with a service provider through multiple channels. In such cases, incomplete modalities naturally occur, since not all users interact through all the available channels. We propose a novel approach that specifically deals with missing modalities by mapping user interactions to a common feature space.
arXiv Detail & Related papers (2024-05-07T12:03:22Z)
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection. We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task. We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
Automatic Context-Driven Inference of Engagement in HMI: A Survey [6.479224589451863]
This paper presents a survey on engagement inference for human-machine interaction. It entails interdisciplinary definition, engagement components and factors, publicly available datasets, ground truth assessment, and most commonly used features and methods. It serves as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability.
arXiv Detail & Related papers (2022-09-30T10:46:13Z)
Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue System [120.70726465994781]
A multimodal spoken dialogue system enables telephone-based agents to interact with customers as a human would. We deploy Duplex Conversation in Alibaba's intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.
arXiv Detail & Related papers (2022-05-30T12:41:23Z)
A Role-Selected Sharing Network for Joint Machine-Human Chatting Handoff and Service Satisfaction Analysis [35.937850808046456]
We propose a novel model, Role-Selected Sharing Network (RSSN), which integrates dialogue satisfaction estimation and handoff prediction in one multi-task learning framework. Unlike prior efforts in dialog mining, by utilizing local user satisfaction as a bridge, the global satisfaction detector and the handoff predictor can effectively exchange critical information.
arXiv Detail & Related papers (2021-09-17T08:39:45Z)
You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation. Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.