Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages
- URL: http://arxiv.org/abs/2011.03646v2
- Date: Fri, 19 Feb 2021 20:59:57 GMT
- Title: Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages
- Authors: Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black
- Abstract summary: We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic languages and Romance languages, on two different intent recognition tasks.
- Score: 51.0542215642794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With recent advancements in language technologies, humans are now speaking to
devices. Increasing the reach of spoken language technologies requires building
systems in local languages. A major bottleneck here is the underlying
data-intensive parts that make up such systems, including automatic speech
recognition (ASR) systems that require large amounts of labelled data. With the
aim of aiding the development of spoken dialog systems in low-resourced languages,
we propose a novel acoustics based intent recognition system that uses
discovered phonetic units for intent classification. The system is made up of
two blocks - the first block is a universal phone recognition system that
generates a transcript of discovered phonetic units for the input audio, and
the second block performs intent classification from the generated phonetic
transcripts. We propose a CNN+LSTM-based architecture and present results for
two language families, Indic languages and Romance languages, on two
different intent recognition tasks. We also perform multilingual training of
our intent classifier and show improved cross-lingual transfer and zero-shot
performance on an unknown language within the same language family.
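The two-block design described above can be illustrated with a short script: a universal phone recognizer turns the input audio into a sequence of discovered phonetic units, and a CNN+LSTM classifier predicts the intent from that phone sequence. The sketch below is a minimal illustration rather than the authors' released code; it assumes the open-source Allosaurus recognizer as a stand-in for the universal phone recognition block, uses PyTorch for the classifier, and all hyperparameters (embedding size, filter width, number of intents) are placeholder values.

```python
# Minimal sketch of the two-block pipeline (assumptions: Allosaurus as the
# universal phone recognizer, PyTorch for the CNN+LSTM intent classifier).
import torch
import torch.nn as nn
from allosaurus.app import read_recognizer  # pip install allosaurus

# Block 1: universal phone recognition -> transcript of discovered phonetic units.
recognizer = read_recognizer()                            # language-independent model
phones = recognizer.recognize("utterance.wav").split()    # e.g. ['n', 'a', 'm', ...]

# Toy phone vocabulary; in practice it is built from the whole training set.
phone2id = {p: i + 1 for i, p in enumerate(sorted(set(phones)))}  # 0 = padding
phone_ids = torch.tensor([[phone2id[p] for p in phones]])         # shape (1, T)

# Block 2: CNN+LSTM intent classifier over the phone-token sequence.
class CNNLSTMIntentClassifier(nn.Module):
    def __init__(self, vocab_size, num_intents, emb=64, channels=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.conv = nn.Conv1d(emb, channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_intents)

    def forward(self, ids):
        x = self.embed(ids)                            # (B, T, emb)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (B, channels, T)
        x, _ = self.lstm(x.transpose(1, 2))            # (B, T, 2*hidden)
        return self.out(x.mean(dim=1))                 # mean-pool over time -> logits

model = CNNLSTMIntentClassifier(vocab_size=len(phone2id) + 1, num_intents=6)
logits = model(phone_ids)        # train with nn.CrossEntropyLoss against intent labels
print(logits.argmax(dim=-1))     # predicted intent index for the utterance
```

Under this sketch, the multilingual training reported in the abstract would amount to pooling phone transcripts from several languages of the same family into one training set before fitting the classifier.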
Related papers
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- On Building Spoken Language Understanding Systems for Low Resourced Languages [1.2183405753834562]
We present a series of experiments to explore extremely low-resourced settings.
We perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset.
We find that using phonetic transcriptions to build intent classification systems in such low-resourced settings performs significantly better than using speech features.
arXiv Detail & Related papers (2022-05-25T14:44:51Z)
- Automatic Spoken Language Identification using a Time-Delay Neural Network [0.0]
A language identification system was built to distinguish between Arabic, Spanish, French, and Turkish.
A pre-existing multilingual dataset was used to train a series of acoustic models.
The system was provided with a custom multilingual language model and a specialized pronunciation lexicon.
arXiv Detail & Related papers (2022-05-19T13:47:48Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Word-Free Spoken Language Understanding for Mandarin-Chinese [9.681114975579211]
We propose a Transformer-based SLU system that works directly on phones.
This acoustics-based SLU system consists of only two blocks and does not require an ASR module (a generic phone-level classifier in this spirit is sketched after this list).
We verify the effectiveness of the system on an intent classification dataset in Mandarin Chinese.
arXiv Detail & Related papers (2021-07-01T02:31:22Z)
- Intent Recognition and Unsupervised Slot Identification for Low Resourced Spoken Dialog Systems [46.705058576039065]
We present an acoustic based SLU system that converts speech to its phonetic transcription using a universal phone recognition system.
We build a word-free natural language understanding module that performs intent recognition and slot identification from these phonetic transcriptions.
We observe more than 10% improvement for intent classification in Tamil and more than 5% improvement for intent classification in Sinhala.
arXiv Detail & Related papers (2021-04-03T01:58:27Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
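As a companion to the Word-Free Spoken Language Understanding entry above, the following is a generic sketch of a Transformer encoder that classifies intent directly from phone tokens; it is not that paper's exact architecture, and the vocabulary size, maximum sequence length, model dimensions, and number of intents are illustrative values.

```python
# Generic phone-level Transformer intent classifier (illustrative only; not the
# exact model or hyperparameters of the Word-Free SLU paper).
import torch
import torch.nn as nn

class PhoneTransformerClassifier(nn.Module):
    def __init__(self, vocab_size, num_intents, d_model=128, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.pos = nn.Embedding(512, d_model)        # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.out = nn.Linear(d_model, num_intents)

    def forward(self, ids):
        pad = ids.eq(0)                                      # True at padding positions
        pos = torch.arange(ids.size(1), device=ids.device)
        x = self.embed(ids) + self.pos(pos)                  # broadcast over the batch
        x = self.encoder(x, src_key_padding_mask=pad)
        # Masked mean-pool over non-padding positions, then project to intent logits.
        x = x.masked_fill(pad.unsqueeze(-1), 0.0).sum(1) / (~pad).sum(1, keepdim=True)
        return self.out(x)

# Example: a batch of two padded phone-ID sequences, five intent classes.
model = PhoneTransformerClassifier(vocab_size=100, num_intents=5)
batch = torch.tensor([[3, 17, 42, 8, 0, 0], [5, 5, 19, 23, 61, 2]])
print(model(batch).shape)   # torch.Size([2, 5])
```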