Learning to Rank Intents in Voice Assistants
- URL: http://arxiv.org/abs/2005.00119v2
- Date: Mon, 4 May 2020 03:19:07 GMT
- Authors: Raviteja Anantha, Srinivas Chappidi, and William Dawoodi
- Abstract summary: We propose a novel Energy-based model for the intent ranking task.
We show our approach outperforms existing state-of-the-art methods, reducing the error rate by 3.8%.
We also evaluate the robustness of our algorithm on the intent ranking task and show it improves robustness by 33.3%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice Assistants aim to fulfill user requests by choosing the best intent
from multiple options generated by their Automated Speech Recognition and Natural
Language Understanding sub-systems. However, voice assistants do not always
produce the expected results. This can happen because voice assistants must
choose among ambiguous intents; user-specific or domain-specific contextual
information reduces the ambiguity of the user request. Additionally, the user
information-state can be leveraged to understand how relevant or executable a
specific intent is for a user request. In this work, we propose a novel
Energy-based model for the intent ranking task, in which we learn an affinity
metric and model the trade-off between the meaning extracted from speech utterances
and the relevance/executability aspects of the intent. Furthermore, we present a
Multisource Denoising Autoencoder-based pretraining that is capable of learning
fused representations of data from multiple sources. We empirically show that our
approach outperforms existing state-of-the-art methods, reducing the
error rate by 3.8%, which in turn reduces ambiguity and eliminates undesired
dead ends, leading to a better user experience. Finally, we evaluate the
robustness of our algorithm on the intent ranking task and show that it
improves robustness by 33.3%.
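As a rough illustration of the trade-off the abstract describes, here is a minimal, hypothetical sketch of energy-based intent ranking in Python. The `energy` function, the `alpha` weight, and the feature scores are all assumptions made for illustration; they are not the paper's actual model, which learns the affinity metric from data.

```python
def energy(meaning_score: float, executability: float, alpha: float = 0.5) -> float:
    """Hypothetical energy of an intent: lower is better. Trades off the
    meaning extracted from the utterance against relevance/executability;
    alpha is an assumed fixed weight, whereas the paper learns this metric."""
    return -(alpha * meaning_score + (1.0 - alpha) * executability)

def rank_intents(candidates):
    """candidates: list of (intent_name, meaning_score, executability).
    Returns intent names sorted by increasing energy (best first)."""
    scored = ((name, energy(m, e)) for name, m, e in candidates)
    return [name for name, _ in sorted(scored, key=lambda pair: pair[1])]

hypotheses = [
    ("play_music",   0.9, 0.8),  # well understood and executable
    ("call_contact", 0.7, 0.1),  # plausible parse, but no matching contact
    ("web_search",   0.4, 0.9),  # generic fallback
]
print(rank_intents(hypotheses))  # -> ['play_music', 'web_search', 'call_contact']
```

Note how `call_contact` is demoted despite a decent meaning score: its low executability (a "dead end" in the abstract's terms) raises its energy.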
Related papers
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot
Task Generalization [61.60501633397704]
We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering.
We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts.
Experiments show that our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets.
arXiv Detail & Related papers (2023-05-18T16:32:58Z) - Improving the Intent Classification accuracy in Noisy Environment [9.447108578893639]
In this paper, we investigate how environmental noise and related noise-reduction techniques affect the intent classification task with end-to-end neural models.
For this task, the use of speech enhancement greatly improves classification accuracy in noisy conditions.
arXiv Detail & Related papers (2023-03-12T06:11:44Z) - Zero-Shot Prompting for Implicit Intent Prediction and Recommendation
with Commonsense Reasoning [28.441725610692714]
This paper proposes a framework of multi-domain dialogue systems, which can automatically infer implicit intents based on user utterances.
The proposed framework is demonstrated to be effective at inferring implicit intents and recommending associated bots in a zero-shot manner.
arXiv Detail & Related papers (2022-10-12T03:33:49Z) - Template-based Approach to Zero-shot Intent Recognition [7.330908962006392]
In this paper, we explore the generalized zero-shot setup for intent recognition.
Following best practices for zero-shot text classification, we treat the task with a sentence pair modeling approach.
We outperform the previous state-of-the-art F1 score by up to 16% for unseen intents.
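The sentence-pair treatment described above can be sketched roughly as follows. This is a toy illustration only: the `template` wording is invented, and `overlap_score` is a trivial lexical stand-in for the trained sentence-pair (e.g. NLI-style) model such approaches actually use.

```python
def template(intent: str) -> str:
    # Hypothetical template: turn an intent label into a hypothesis sentence.
    return "the user wants to " + intent.replace("_", " ")

def overlap_score(premise: str, hypothesis: str) -> float:
    # Toy stand-in for a sentence-pair model: token-overlap ratio.
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def classify(utterance: str, intents):
    # Score the (utterance, template) pair for every intent, seen or unseen,
    # and pick the best; no intent-specific training data is needed, which
    # is what makes the setup zero-shot.
    return max(intents, key=lambda i: overlap_score(utterance, template(i)))

print(classify("I want to book a flight to Oslo", ["book_flight", "play_music"]))
# -> book_flight
```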
arXiv Detail & Related papers (2022-06-22T08:44:59Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z) - Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
We take advantage of the fact that oral proficiency tests rate multiple responses per candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z) - Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and
language Models for Intent Classification [81.80311855996584]
We propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model.
We achieve 90.86% and 99.07% accuracy on ATIS and Fluent speech corpus, respectively.
arXiv Detail & Related papers (2021-02-15T07:20:06Z) - Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail an explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus.
We show significant improvement in recognition accuracy when the relevant context is available.
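A minimal sketch of deriving per-word bias scores from a training corpus, in the spirit of the approach described above. The smoothed log-frequency formula here is an assumption for illustration; the paper's actual derivation may differ.

```python
import math
from collections import Counter

def bias_scores(corpus_tokens, vocab, smoothing=1.0):
    """Hypothetical per-word bias score: smoothed log-frequency of each
    vocabulary word in the training corpus. Words frequent in the corpus
    get higher (less negative) scores and are biased toward in decoding."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens) + smoothing * len(vocab)
    return {w: math.log((counts[w] + smoothing) / total) for w in vocab}

corpus = "call mom call dad play jazz".split()
scores = bias_scores(corpus, vocab=["call", "play", "weather"])
# "call" appears twice, so it gets the highest bias score.
print(max(scores, key=scores.get))  # -> call
```

Because the scores come only from corpus statistics, no labeled context data or explicit contextual language model is required, which matches the "unsupervised" framing of the title.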
arXiv Detail & Related papers (2020-05-04T17:29:59Z) - IART: Intent-aware Response Ranking with Transformers in
Information-seeking Conversation Systems [80.0781718687327]
We analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model, "IART".
IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture.
arXiv Detail & Related papers (2020-02-03T05:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.