Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue System
- URL: http://arxiv.org/abs/2205.15060v2
- Date: Tue, 31 May 2022 14:15:58 GMT
- Title: Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue System
- Authors: Ting-En Lin, Yuchuan Wu, Fei Huang, Luo Si, Jian Sun, Yongbin Li
- Abstract summary: A multimodal spoken dialogue system enables telephone-based agents to interact with customers like a human.
We deploy Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production.
Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.
- Score: 120.70726465994781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present Duplex Conversation, a multi-turn, multimodal
spoken dialogue system that enables telephone-based agents to interact with
customers like a human. We use the concept of full-duplex in telecommunication
to demonstrate what a human-like interactive experience should be and how to
achieve smooth turn-taking through three subtasks: user state detection,
backchannel selection, and barge-in detection. Besides, we propose
semi-supervised learning with multimodal data augmentation to leverage
unlabeled data to increase model generalization. Experimental results on three
sub-tasks show that the proposed method achieves consistent improvements
compared with baselines. We deploy the Duplex Conversation to Alibaba
intelligent customer service and share lessons learned in production. Online
A/B experiments show that the proposed system can significantly reduce response
latency by 50%.
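To make the turn-taking loop concrete, the sketch below shows one way the three subtasks could be wired into a streaming control loop. This is a minimal illustration under assumptions: the `Frame` features, the threshold-based detectors, and the action names are all hypothetical and do not come from the paper, whose deployed system uses trained multimodal models rather than hand-set rules.

```python
# Hypothetical sketch of a full-duplex turn-taking controller combining the
# three subtasks named in the abstract: user state detection, backchannel
# selection, and barge-in detection. Illustrative only, not the authors' code.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class UserState(Enum):
    SPEAKING = auto()    # user is still talking; keep listening
    HESITATING = auto()  # brief pause, user likely to continue
    FINISHED = auto()    # turn appears complete; agent may respond


@dataclass
class Frame:
    """One streaming slice of the conversation (assumed multimodal features)."""
    asr_text: str         # partial ASR hypothesis for the current turn
    silence_ms: int       # trailing silence measured by a VAD
    agent_speaking: bool  # whether TTS playback is currently active
    user_energy: float    # acoustic energy on the user channel (0..1)


def detect_user_state(frame: Frame) -> UserState:
    """Subtask 1 (user state detection): has the user finished their turn?"""
    if frame.user_energy > 0.3:
        return UserState.SPEAKING
    if frame.silence_ms < 600 and not frame.asr_text.endswith((".", "?", "。")):
        return UserState.HESITATING
    return UserState.FINISHED


def select_backchannel(frame: Frame, state: UserState) -> Optional[str]:
    """Subtask 2 (backchannel selection): optionally acknowledge while listening."""
    if state is UserState.HESITATING and len(frame.asr_text.split()) > 5:
        return "uh-huh"  # encourage the user to continue
    return None


def detect_barge_in(frame: Frame) -> bool:
    """Subtask 3 (barge-in detection): is the user interrupting agent speech?"""
    return frame.agent_speaking and frame.user_energy > 0.5 and bool(frame.asr_text)


def step(frame: Frame) -> dict:
    """One control step of the duplex loop: stop, backchannel, respond, or wait."""
    if detect_barge_in(frame):
        return {"action": "stop_tts_and_listen"}
    state = detect_user_state(frame)
    backchannel = select_backchannel(frame, state)
    if backchannel:
        return {"action": "backchannel", "utterance": backchannel}
    if state is UserState.FINISHED:
        return {"action": "respond"}
    return {"action": "keep_listening"}


if __name__ == "__main__":
    # User pauses mid-sentence: the controller backchannels instead of replying.
    print(step(Frame("I want to change my delivery address to", 400, False, 0.05)))
    # User starts talking over the agent: the controller cuts TTS immediately.
    print(step(Frame("wait", 0, True, 0.8)))
```

The design point the sketch tries to capture is ordering: barge-in is checked before anything else so the agent can stop speaking immediately, while backchannels are emitted only when the user appears to be hesitating mid-turn; the semi-supervised training with multimodal data augmentation described in the abstract is outside the scope of this sketch.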
Related papers
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [24.68804661538364]
Full-duplex spoken dialogue systems closely mirror human-human interactions.
However, achieving low latency and natural interactions remains a significant challenge.
End-to-end models are a promising direction for developing efficient and natural full-duplex spoken dialogue systems.
Audio samples of dialogues generated by OmniFlatten can be found at this web site.
arXiv Detail & Related papers (2024-10-23T11:58:58Z)
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
- Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations [17.409790984399052]
This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement.
Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues.
Results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension.
arXiv Detail & Related papers (2024-06-21T09:26:55Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- Mitigating Negative Style Transfer in Hybrid Dialogue System [42.65754135759929]
Hybrid dialogue systems that accomplish user-specific goals and participate in open-topic chitchat with users are attracting growing attention.
Existing research learns both tasks concurrently with a multi-task fusion technique but ignores the negative transfer induced by the distinct textual styles of the two tasks.
We devise supervised and self-supervised positive and negative sample constructions for diverse datasets.
arXiv Detail & Related papers (2022-12-14T12:13:34Z)
- Smoothing Dialogue States for Open Conversational Machine Reading [70.83783364292438]
We propose an effective gating strategy that smooths the two dialogue states in a single decoder and bridges decision making and question generation.
Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-08-28T08:04:28Z)
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue [17.663449579168297]
We simulate dialogues in which an agent and a user (modelled similarly to the agent, with a supervised learning objective) interact with each other.
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses.
Empirical studies on two benchmarks indicate that our model significantly improves response quality and leads to successful conversations.
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
- Transferable Dialogue Systems and User Simulators [17.106518400787156]
One of the difficulties in training dialogue systems is the lack of training data.
We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator.
We develop a modelling framework that can incorporate new dialogue scenarios through self-play between the two agents.
arXiv Detail & Related papers (2021-07-25T22:59:09Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as the encoder represent dialogues only coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [77.59549450705384]
In dialogue systems, dialog act recognition and sentiment classification are two correlated tasks.
Most of the existing systems either treat them as separate tasks or just jointly model the two tasks.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.