Word-Free Spoken Language Understanding for Mandarin-Chinese
- URL: http://arxiv.org/abs/2107.00186v1
- Date: Thu, 1 Jul 2021 02:31:22 GMT
- Title: Word-Free Spoken Language Understanding for Mandarin-Chinese
- Authors: Zhiyuan Guo, Yuexin Li, Guo Chen, Xingyu Chen, Akshat Gupta
- Abstract summary: We propose a Transformer-based SLU system that works directly on phones.
This acoustic-based SLU system consists of only two blocks and does not require the presence of an ASR module.
We verify the effectiveness of the system on an intent classification dataset in Mandarin Chinese.
- Score: 9.681114975579211
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken dialogue systems such as Siri and Alexa provide great convenience to
people's everyday life. However, current spoken language understanding (SLU)
pipelines largely depend on automatic speech recognition (ASR) modules, which
require a large amount of language-specific training data. In this paper, we
propose a Transformer-based SLU system that works directly on phones. This
acoustic-based SLU system consists of only two blocks and does not require the
presence of an ASR module. The first block is a universal phone recognition
system, and the second block is a Transformer-based language model for phones.
We verify the effectiveness of the system on an intent classification dataset
in Mandarin Chinese.
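The two-block pipeline described in the abstract can be sketched in a few lines. Everything below is a hypothetical illustration, not the authors' implementation: `recognize_phones` stands in for a real universal phone recognizer (the paper's first block), and the paper's Transformer-based phone language model (the second block) is reduced to mean-pooled random phone embeddings plus a softmax intent layer, just to show the data flow phones → vector → intent.

```python
import math
import random

# Block 1 (stubbed): a universal phone recognizer. A fixed lookup replaces
# real audio and model weights so the sketch runs standalone.
def recognize_phones(utterance_id):
    fake_outputs = {
        "utt1": ["n", "i", "h", "au"],            # e.g. "ni hao"
        "utt2": ["ts", "ai", "ts", "j", "en"],    # e.g. "zai jian"
    }
    return fake_outputs[utterance_id]

# Block 2 (simplified): the paper uses a Transformer LM over phones; here we
# mean-pool random phone embeddings and score intents with a linear layer.
PHONES = ["n", "i", "h", "au", "ts", "ai", "j", "en"]
INTENTS = ["greeting", "goodbye"]
DIM = 8

random.seed(0)
emb = {p: [random.gauss(0, 1) for _ in range(DIM)] for p in PHONES}
weights = {it: [random.gauss(0, 1) for _ in range(DIM)] for it in INTENTS}

def classify_intent(phones):
    # Mean-pool phone embeddings into a single utterance vector.
    pooled = [sum(emb[p][d] for p in phones) / len(phones) for d in range(DIM)]
    # Linear scores per intent, normalized with a softmax.
    scores = {it: sum(w * x for w, x in zip(weights[it], pooled))
              for it in INTENTS}
    z = sum(math.exp(s) for s in scores.values())
    return {it: math.exp(s) / z for it, s in scores.items()}

probs = classify_intent(recognize_phones("utt1"))
print(probs)
```

The key design point this mirrors is that no word-level transcript ever appears: intent is predicted directly from the language-independent phone sequence, which is what removes the dependence on a language-specific ASR module.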
Related papers
- Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- On Building Spoken Language Understanding Systems for Low Resourced Languages [1.2183405753834562]
We present a series of experiments to explore extremely low-resourced settings.
We perform intent classification with systems trained on as low as one data-point per intent and with only one speaker in the dataset.
We find that using phonetic transcriptions to make intent classification systems in such low-resourced setting performs significantly better than using speech features.
arXiv Detail & Related papers (2022-05-25T14:44:51Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings [20.93287944284448]
We propose to join phonology driven phone embedding (top-down) and deep neural network (DNN) based acoustic feature extraction (bottom-up) to calculate phone probabilities.
No inversion from acoustics to phonological features is required for speech recognition.
Experiments are conducted on the CommonVoice dataset (German, French, Spanish and Italian) and the AISHELL-1 dataset (Mandarin).
arXiv Detail & Related papers (2021-07-11T12:56:47Z)
- Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments [7.286387368812729]
This paper presents a publicly available, phonetically transcribed corpus of 2255 utterances in the endangered Tangkhulic language East Tusom.
Because the dataset is transcribed in terms of phones, rather than phonemes, it is a better match for universal phone recognition systems than many larger datasets.
arXiv Detail & Related papers (2021-04-02T00:26:10Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic and Romance, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)
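The joint phone/phoneme modeling that the last two entries describe can be pictured with a toy allophone mapping: language-independent phone posteriors are collapsed into language-dependent phoneme posteriors by taking the maximum over each phoneme's allophone set. The inventory and probabilities below are invented for illustration and do not come from either paper.

```python
# Hypothetical allophone inventory for one language: each language-dependent
# phoneme maps to the language-independent phones that can realize it.
ALLOPHONES = {
    "/t/": ["t", "th"],   # plain and aspirated [t] both realize /t/ here
    "/a/": ["a"],
}

def phoneme_posteriors(phone_probs, allophones):
    """Collapse phone posteriors into phoneme posteriors by taking the
    max over each phoneme's allophone set (an allophone-layer style rule)."""
    return {ph: max(phone_probs.get(p, 0.0) for p in phones)
            for ph, phones in allophones.items()}

# Invented phone posteriors for a single frame.
frame = {"t": 0.1, "th": 0.6, "a": 0.2}
print(phoneme_posteriors(frame, ALLOPHONES))  # /t/ -> 0.6, /a/ -> 0.2
```

The same phone layer can be shared across every language while each language keeps only its own small phone-to-phoneme mapping, which is what makes this style of model attractive for universal and low-resource recognition.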
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.