On Building Spoken Language Understanding Systems for Low Resourced
Languages
- URL: http://arxiv.org/abs/2205.12818v1
- Date: Wed, 25 May 2022 14:44:51 GMT
- Title: On Building Spoken Language Understanding Systems for Low Resourced
Languages
- Authors: Akshat Gupta
- Abstract summary: We present a series of experiments to explore extremely low-resourced settings.
We perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset.
We find that using phonetic transcriptions to build intent classification systems in such a low-resourced setting performs significantly better than using speech features.
- Score: 1.2183405753834562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken dialog systems are slowly becoming an integral part of the human
experience due to their various advantages over textual interfaces. Spoken
language understanding (SLU) systems are fundamental building blocks of spoken
dialog systems, but creating SLU systems for low-resourced languages is still a
challenge. For a large number of low-resourced languages, we do not have access to
enough data to build automatic speech recognition (ASR) technologies, which are
fundamental to any SLU system. Moreover, ASR-based SLU systems do not generalize to
unwritten languages. In this paper, we present a series of experiments that
explore extremely low-resourced settings, in which we perform intent classification
with systems trained on as few as one data point per intent and with only one
speaker in the dataset. We also work in a setting where we do not
use language-specific ASR systems to transcribe input speech, which compounds
the challenge of building SLU systems and simulates a truly low-resourced setting.
We test our system on Belgian Dutch (Flemish) and English and find that using
phonetic transcriptions to build intent classification systems in such a
low-resourced setting performs significantly better than using speech features.
Specifically, when using a phonetic-transcription-based system instead of a
feature-based system, we see average improvements of 12.37% and 13.08% for
binary and four-class classification problems respectively, when averaged over 49
different experimental settings.
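For intuition only, here is a minimal sketch of what a phonetic-transcription-based intent classifier could look like once phone sequences have been obtained (for example, from a language-independent phone recognizer). The phone strings, feature extractor, and classifier below are illustrative assumptions, not the pipeline used in the paper.

```python
# Minimal sketch: intent classification over phonetic transcriptions.
# Assumes phone sequences are already available as space-separated strings;
# the TF-IDF + logistic regression choice is illustrative, not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: one (hypothetical) transcription per intent.
train_phones = [
    "t ɜ n ɔ n ð ə l aɪ t",   # intent: lights_on
    "t ɜ n ɔ f ð ə l aɪ t",   # intent: lights_off
]
train_intents = ["lights_on", "lights_off"]

# Character n-grams over the phone string tolerate phone-recognizer noise
# better than exact phone-token matching.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_phones, train_intents)

# Classify a new, slightly noisy phonetic transcription.
print(clf.predict(["t ə n ɔ n l aɪ t"]))  # should lean toward 'lights_on'
```

In practice the phonetic transcriptions would come from a universal phone recognizer rather than hand-written IPA, and the downstream classifier can be swapped for any model that handles very small training sets.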
Related papers
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Creating Spoken Dialog Systems in Ultra-Low Resourced Settings [0.0]
We build on existing light models for intent classification in Flemish.
We apply different augmentation techniques on two levels -- the voice level, and the phonetic transcripts level.
We find that our data augmentation techniques, on both levels, have improved the model performance on a number of tasks.
arXiv Detail & Related papers (2023-12-11T10:04:05Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly available.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
arXiv Detail & Related papers (2022-06-29T00:36:34Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil, each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- Word-Free Spoken Language Understanding for Mandarin-Chinese [9.681114975579211]
We propose a Transformer-based SLU system that works directly on phones.
This acoustic-based SLU system consists of only two blocks and does not require the presence of an ASR module.
We verify the effectiveness of the system on an intent classification dataset in Mandarin Chinese.
arXiv Detail & Related papers (2021-07-01T02:31:22Z)
- Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions [0.0]
We show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results in low-resource setting for the language identification task.
We also substantiate the hypothesis that, whenever the dataset is diverse enough that other classification factors such as gender and age are well averaged out, the confusion matrix of the LID system reflects a measure of language similarity.
arXiv Detail & Related papers (2021-05-31T18:35:27Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic languages and Romance languages, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.