Mere account mein kitna balance hai? -- On building voice enabled
Banking Services for Multilingual Communities
- URL: http://arxiv.org/abs/2010.16411v1
- Date: Fri, 9 Oct 2020 01:20:09 GMT
- Title: Mere account mein kitna balance hai? -- On building voice enabled
Banking Services for Multilingual Communities
- Authors: Akshat Gupta, Sai Krishna Rallabandi and Alan W Black
- Abstract summary: We present our initial exploratory work towards building voice enabled banking services for multilingual societies.
Code Mixing is a phenomenon where lexical items from one language are embedded in the utterance of another.
We investigate various training strategies for building speech based intent recognition systems.
- Score: 47.955173277834795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tremendous progress in speech and language processing has brought language
technologies closer to daily human life. Voice technology has the potential to
act as a horizontal enabling layer across all aspects of digitization, and it is
especially beneficial to rural communities in scenarios like a pandemic. In this
work we present our initial exploratory work in one such direction --
building voice-enabled banking services for multilingual societies. Speech
interaction for typical banking transactions in multilingual communities
involves filled pauses and is characterized by Code Mixing, a phenomenon in
which lexical items from one language are embedded in the utterance of another.
Speech systems deployed for banking applications should therefore be able to
process such content. We investigate various training strategies for building
speech-based intent recognition systems, and we present our results using a
Naive Bayes classifier on approximate acoustic phone units obtained with the
Allosaurus library.
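The abstract pins down a concrete pipeline: transcribe each utterance into approximate, language-agnostic phone units with the Allosaurus universal phone recognizer, then classify intents with a Naive Bayes model over those units. Below is a minimal sketch of that kind of pipeline; the wav file name, the toy phone strings, and the intent labels (balance_enquiry, money_transfer) are invented for illustration, and phone n-gram count features are an assumption, not necessarily the paper's exact feature set.

```python
from allosaurus.app import read_recognizer  # pip install allosaurus
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 1. Audio -> language-agnostic phone units. read_recognizer() loads the
#    default universal phone model; recognize() returns a space-separated
#    phone string for a wav file.
model = read_recognizer()
# phones = model.recognize("utterance.wav")  # hypothetical file name

# Toy stand-ins for recognized phone strings and their intent labels;
# real training data would be labelled code-mixed banking recordings.
train_phones = [
    "m e r e ə k a u n t m e b æ l ə n s",  # "mere account mein balance"
    "p ɛ s e t r æ n s f ə r k ə r o",      # "paise transfer karo"
]
train_intents = ["balance_enquiry", "money_transfer"]

# 2. Multinomial Naive Bayes over phone n-gram counts. str.split keeps the
#    single-character phones that CountVectorizer's default tokenizer drops.
clf = make_pipeline(
    CountVectorizer(tokenizer=str.split, token_pattern=None, ngram_range=(1, 2)),
    MultinomialNB(),
)
clf.fit(train_phones, train_intents)
print(clf.predict(["ə k a u n t k a b æ l ə n s"]))  # likely ['balance_enquiry']
```

Count-based features pair naturally with a multinomial Naive Bayes model and remain workable with the small labelled sets typical of low-resource, code-mixed data.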
Related papers
- Literary and Colloquial Dialect Identification for Tamil using Acoustic Features [0.0]
Speech technology plays a role in keeping the various dialects of a language from going extinct.
The current work proposes a way to identify two popular, broadly classified Tamil dialects.
arXiv Detail & Related papers (2024-08-27T09:00:27Z)
- Seamless: Multilingual Expressive and Streaming Speech Translation [71.12826355107889]
We introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.
First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2.
We bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time.
arXiv Detail & Related papers (2023-12-08T17:18:42Z)
- TRAVID: An End-to-End Video Translation Framework [1.6131714685439382]
We present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker.
Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource settings.
arXiv Detail & Related papers (2023-09-20T14:13:05Z)
- ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks.
We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes.
Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z)
- Talking Face Generation with Multilingual TTS [0.8229645116651871]
We propose a system combining a talking face generation system with a text-to-speech system.
Our system can synthesize natural multilingual speech while maintaining the vocal identity of the speaker.
For our demo, we add a translation API to the preprocessing stage and present it in the form of a neural dubber.
arXiv Detail & Related papers (2022-05-13T02:08:35Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge the digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and in speech-related tasks for some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic and Romance, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning [11.552745999302905]
More than half of the 7,000 languages in the world are in imminent danger of going extinct.
It is relatively easy to obtain textual translations corresponding to speech.
We construct a convolutional neural network audio encoder capable of extracting linguistic representations from speech.
arXiv Detail & Related papers (2020-06-04T12:21:48Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.