Real-time Caller Intent Detection In Human-Human Customer Support Spoken
Conversations
- URL: http://arxiv.org/abs/2208.06802v1
- Date: Sun, 14 Aug 2022 07:50:23 GMT
- Title: Real-time Caller Intent Detection In Human-Human Customer Support Spoken
Conversations
- Authors: Mrinal Rawat, Victor Barres
- Abstract summary: Agent assistance during human-human customer support spoken interactions requires triggering workflows based on the caller's intent (reason for call).
The goal is for a system to detect the caller's intent at the time the agent would have been able to detect it (Intent Boundary).
Recent work on voice assistants has used incremental real-time predictions at a word-by-word level to detect intent before the end of a command.
- Score: 10.312382727352823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agent assistance during human-human customer support spoken interactions
requires triggering workflows based on the caller's intent (reason for call).
Timeliness of prediction is essential for a good user experience. The goal is
for a system to detect the caller's intent at the time the agent would have
been able to detect it (Intent Boundary). Some approaches focus on predicting
the output offline, i.e. once the full spoken input (e.g. the whole
conversational turn) has been processed by the ASR system. This introduces an
undesirable latency in the prediction each time the intent could have been
detected earlier in the turn. Recent work on voice assistants has used
incremental real-time predictions at a word-by-word level to detect intent
before the end of a command. Human-directed and machine-directed speech, however,
have very different characteristics. In this work, we propose to apply a method
developed in the context of voice assistants to the problem of online, real-time
caller intent detection in human-human spoken interactions. We use a dual
architecture in which two LSTMs are jointly trained: one predicting the Intent
Boundary (IB) and the other predicting the intent class at the IB. We conduct
our experiments on our private dataset comprising transcripts of human-human
telephone conversations from the telecom customer support domain. We report
results analyzing both the accuracy of our system and the impact of
different architectures on the trade-off between overall accuracy and
prediction latency.
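
The word-by-word decision loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `detect_intent_online`, the `threshold` parameter, and the keyword-based `toy_boundary` and `toy_intent` models are all hypothetical stand-ins for the two jointly trained LSTMs, which the abstract does not specify in detail.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for the two jointly trained LSTMs in the abstract.
# Each takes the words seen so far; real recurrent models would carry
# hidden state forward instead of re-reading the prefix.
BoundaryModel = Callable[[List[str]], float]  # score that the Intent Boundary is reached
IntentModel = Callable[[List[str]], str]      # predicted intent class


def detect_intent_online(
    words: List[str],
    boundary_model: BoundaryModel,
    intent_model: IntentModel,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Tuple[Optional[str], int]:
    """Consume words one at a time and return (intent, words_consumed).

    The intent is emitted at the first word where the boundary score reaches
    `threshold`. If that never happens, the full turn is classified, which
    matches the offline behaviour the abstract contrasts against.
    """
    prefix: List[str] = []
    for word in words:
        prefix.append(word)
        if boundary_model(prefix) >= threshold:
            return intent_model(prefix), len(prefix)
    if not prefix:
        return None, 0
    return intent_model(prefix), len(prefix)


# Toy keyword-based models, for illustration only.
KEYWORDS = {"bill": "billing", "internet": "connectivity"}


def toy_boundary(prefix: List[str]) -> float:
    return 1.0 if any(w in KEYWORDS for w in prefix) else 0.0


def toy_intent(prefix: List[str]) -> str:
    for w in prefix:
        if w in KEYWORDS:
            return KEYWORDS[w]
    return "other"


turn = "hi i have a question about my bill from last month".split()
intent, consumed = detect_intent_online(turn, toy_boundary, toy_intent)
print(intent, consumed, len(turn))
```

In this toy run the intent is emitted after the eighth of eleven words, so the gap between `consumed` and `len(turn)` is the latency an offline, end-of-turn predictor would add.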
Related papers
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
- Look Hear: Gaze Prediction for Speech-directed Human Attention [49.81718760025951]
Our study focuses on the incremental prediction of attention as a person is seeing an image and hearing a referring expression.
We developed the Attention in Referral Transformer model or ART, which predicts the human fixations spurred by each word in a referring expression.
In our quantitative and qualitative analyses, ART not only outperforms existing methods in scanpath prediction, but also appears to capture several human attention patterns.
arXiv Detail & Related papers (2024-07-28T22:35:08Z)
- Personalized Predictive ASR for Latency Reduction in Voice Assistants [29.237198363254752]
We introduce predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance.
We evaluate our methods on an internal voice assistant dataset as well as the public SLURP dataset.
arXiv Detail & Related papers (2023-05-23T08:05:43Z)
- The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines [63.86406909879314]
This paper describes the Conversational Short-phrases Speaker Diarization (CSSD) task.
It consists of training and testing datasets, evaluation metric and baselines.
In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.
arXiv Detail & Related papers (2022-08-17T03:26:23Z)
- Improved Goal Oriented Dialogue via Utterance Generation and Look Ahead [5.062869359266078]
Intent prediction can be improved by training a deep text-to-text neural model to generate successive user utterances from unlabeled dialogue data.
We present a novel look-ahead approach that uses user utterance generation to improve intent prediction in time.
arXiv Detail & Related papers (2021-10-24T11:12:48Z)
- Detecting Speaker Personas from Conversational Texts [52.4557098875992]
We study a new task, named Speaker Persona Detection (SPD), which aims to detect speaker personas based on the plain conversational text.
We build a dataset for SPD, dubbed Persona Match on Persona-Chat (PMPC).
We evaluate several baseline models and propose utterance-to-profile (U2P) matching networks for this task.
arXiv Detail & Related papers (2021-09-03T06:14:38Z)
- Intelligent Conversational Android ERICA Applied to Attentive Listening and Job Interview [41.789773897391605]
We have developed an intelligent conversational android ERICA.
We set up several social interaction tasks for ERICA, including attentive listening, job interview, and speed dating.
It has been evaluated with 40 senior people, who engaged in conversations of 5-7 minutes without a conversation breakdown.
arXiv Detail & Related papers (2021-05-02T06:37:23Z)
- Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial Perturbations [47.32228513808444]
Mass surveillance systems for voice over IP (VoIP) conversations pose a great risk to privacy.
We present an adversarial-learning-based framework for privacy protection for VoIP conversations.
arXiv Detail & Related papers (2020-10-24T06:56:35Z)
- Predict-then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems [24.560203199376478]
We propose a predictive approach named Predict-then-Decide (PTD) to tackle this Wait-or-Answer problem.
We conduct experiments on two real-life scenarios and three public datasets.
arXiv Detail & Related papers (2020-05-27T01:48:54Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.