End-to-End Speech to Intent Prediction to improve E-commerce Customer
Support Voicebot in Hindi and English
- URL: http://arxiv.org/abs/2211.07710v1
- Date: Wed, 26 Oct 2022 18:29:44 GMT
- Authors: Abhinav Goyal, Anupam Singh, Nikesh Garera
- Abstract summary: We discuss an end-to-end (E2E) S2I model for a customer support voicebot task in a bilingual setting.
We show how we can solve E2E intent classification by leveraging a pre-trained automatic speech recognition (ASR) model with slight modification and fine-tuning on small annotated datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automation of on-call customer support relies heavily on accurate and
efficient speech-to-intent (S2I) systems. Building such systems using
multi-component pipelines can pose various challenges because they require
large annotated datasets, have higher latency, and have complex deployment.
These pipelines are also prone to compounding errors. To overcome these
challenges, we discuss an end-to-end (E2E) S2I model for a customer support
voicebot task in a bilingual setting. We show how we can solve E2E intent
classification by leveraging a pre-trained automatic speech recognition (ASR)
model with slight modification and fine-tuning on small annotated datasets.
Experimental results show that our best E2E model outperforms a conventional
pipeline by a relative ~27% on the F1 score.
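The core idea of the E2E approach — reusing a pre-trained ASR encoder and fine-tuning a small intent classification head on top of it instead of running a full ASR-then-NLU pipeline — can be sketched as follows. This is not the authors' exact architecture; the mean-pooling strategy, embedding dimension, and linear head are illustrative assumptions.

```python
import math
import random

def mean_pool(frames):
    """Collapse a (T, D) sequence of encoder frame embeddings into one utterance vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class IntentHead:
    """Small linear classifier fine-tuned on top of (frozen) ASR encoder outputs."""
    def __init__(self, dim, n_intents, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_intents)]
        self.b = [0.0] * n_intents

    def predict(self, frames):
        x = mean_pool(frames)
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)

# Hypothetical usage: 4 encoder frames of dimension 8, 3 intent classes
frames = [[0.1 * (t + d) for d in range(8)] for t in range(4)]
head = IntentHead(dim=8, n_intents=3)
probs = head.predict(frames)
```

Because the encoder already carries acoustic and (bilingual) linguistic knowledge from ASR pre-training, only the small head and a light fine-tuning pass need the annotated S2I data.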
Related papers
- Leveraging Large Text Corpora for End-to-End Speech Summarization [58.673480990374635]
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech.
We present two novel methods that leverage a large amount of external text summarization data for E2E SSum training.
arXiv Detail & Related papers (2023-03-02T05:19:49Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems [2.4909170697740963]
We propose a contextual density ratio approach for both training a contextual aware E2E model and adapting the language model to named entities.
Our proposed technique achieves a relative improvement of up to 46.5% on the names over an E2E baseline without degrading the overall recognition accuracy of the whole test set.
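The density-ratio idea behind this kind of contextual biasing can be illustrated with a tiny rescoring sketch: a hypothesis score is boosted by how much more a contextual LM likes it than the (implicit) LM the E2E model learned from its training data. The hypotheses, probabilities, and weight `lam` below are made up for illustration, not taken from the paper.

```python
import math

def density_ratio_score(log_p_e2e, log_p_context_lm, log_p_source_lm, lam=0.3):
    """Rescore a hypothesis: add the weighted log density ratio between the
    contextual LM and the source-domain LM to the E2E model's score."""
    return log_p_e2e + lam * (log_p_context_lm - log_p_source_lm)

# Two competing transcriptions of a spoken name; the contextual LM
# (trained on the relevant named entities) strongly prefers the first.
hyps = {
    "anupam": density_ratio_score(math.log(0.2), math.log(0.6), math.log(0.1)),
    "a new pam": density_ratio_score(math.log(0.3), math.log(0.05), math.log(0.2)),
}
best = max(hyps, key=hyps.get)
```

Even though the E2E model alone prefers the wrong hypothesis, the density-ratio term flips the ranking toward the named entity.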
arXiv Detail & Related papers (2022-06-29T13:12:46Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER).
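BPE-dropout, the technique this augmentation builds on, randomly skips individual BPE merge operations during segmentation, so the same word yields varied subword sequences across training epochs. The toy merge table and greedy merge loop below are an illustrative sketch, not the paper's implementation.

```python
import random

def bpe_segment(word, merges, dropout=0.0, rng=None):
    """Segment a word with BPE merges, randomly skipping each applicable
    merge with probability `dropout` (BPE-dropout)."""
    rng = rng or random.Random(0)
    toks = list(word)
    while True:
        # Find the highest-priority (lowest-rank) applicable merge.
        best = None
        for i in range(len(toks) - 1):
            pair = (toks[i], toks[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best[1]]):
                if dropout and rng.random() < dropout:
                    continue  # dropped: skip this merge, producing a different segmentation
                best = (i, pair)
        if best is None:
            return toks
        i, pair = best
        toks = toks[:i] + [pair[0] + pair[1]] + toks[i + 2:]

# Toy merge table: ranks define merge priority.
merges = {("l", "o"): 0, ("lo", "w"): 1}
full = bpe_segment("low", merges)  # deterministic segmentation with dropout=0.0
```

With `dropout=0.0` the word merges fully; with `dropout=1.0` every merge is skipped and the word falls back to characters, which is exactly the variability that helps low-resource, high-OOV setups.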
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Exploring Transfer Learning For End-to-End Spoken Language Understanding [8.317084844841323]
An end-to-end (E2E) system that goes directly from speech to a hypothesis is a more attractive option.
We propose an E2E system that is designed to jointly train on multiple speech-to-text tasks.
We show that it beats the performance of E2E models trained on individual tasks.
arXiv Detail & Related papers (2020-12-15T19:02:15Z)
- Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding [14.752834813510702]
We treat an E2E system as a multi-modal model, with audio and text functioning as its two modalities.
We propose using different multi-modal losses to guide the acoustic embeddings to be closer to the text embeddings.
We train the CMLS model on two publicly available E2E datasets, across different cross-modal losses and show that our proposed triplet loss function achieves the best performance.
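The triplet loss used to tie the two modalities together has a simple form: pull an acoustic embedding toward its paired text embedding (positive) and push it away from a mismatched one (negative), up to a margin. The 2-D embeddings and margin below are illustrative values, not the paper's.

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(audio, text_pos, text_neg, margin=0.2):
    """Loss is zero once the matching text embedding is closer to the
    acoustic embedding than the mismatched one by at least `margin`."""
    return max(0.0, l2(audio, text_pos) - l2(audio, text_neg) + margin)

audio = [1.0, 0.0]
text_pos = [0.9, 0.1]   # text embedding of the same utterance
text_neg = [0.0, 1.0]   # text embedding of a different utterance
loss = triplet_loss(audio, text_pos, text_neg)
```

A well-aligned triplet (as above) contributes zero loss; swapping the positive and negative produces a positive loss that drags the acoustic embedding toward the correct text.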
arXiv Detail & Related papers (2020-11-18T02:32:42Z)
- Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition [66.47000813920617]
We propose a decoupled transformer model to use monolingual paired data and unpaired text data.
The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network.
By using monolingual data and unpaired text data, the decoupled transformer model reduces the E2E model's heavy dependence on code-switching paired training data.
arXiv Detail & Related papers (2020-10-28T07:46:15Z)
- Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus [35.45918249451485]
End-to-end (E2E) automatic speech recognition systems lack the distinct language model (LM) component that characterizes traditional speech systems.
Shallow fusion has been proposed as a method for incorporating a pre-trained LM into an E2E model at inference time.
We apply shallow fusion to incorporate a very large text corpus into a state-of-the-art E2E ASR model.
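Shallow fusion itself is a log-linear interpolation applied at each decoding step: the ASR model's token log-probabilities are combined with the external LM's, and the fused scores drive the search. The tokens, probabilities, and weight `lam` below are made-up illustrative values.

```python
import math

def shallow_fusion_step(asr_logprobs, lm_logprobs, lam=0.3):
    """Fuse per-token log-probabilities at one decoding step:
    score(tok) = log p_ASR(tok) + lam * log p_LM(tok)."""
    return {tok: asr_logprobs[tok] + lam * lm_logprobs.get(tok, float("-inf"))
            for tok in asr_logprobs}

# An acoustically ambiguous step where the external LM disambiguates.
asr = {"flip": math.log(0.40), "flick": math.log(0.38)}
lm = {"flip": math.log(0.05), "flick": math.log(0.30)}
fused = shallow_fusion_step(asr, lm)
best = max(fused, key=fused.get)
```

Because fusion happens only at inference time, the LM can be trained on a text corpus far larger than the paired speech data, which is what helps the long tail.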
arXiv Detail & Related papers (2020-08-24T14:53:10Z)
- End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out of vocabulary (OOV) words in an ASR system.
arXiv Detail & Related papers (2020-05-22T13:39:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.