Improving Rare Words Recognition through Homophone Extension and Unified
Writing for Low-resource Cantonese Speech Recognition
- URL: http://arxiv.org/abs/2302.00836v1
- Date: Thu, 2 Feb 2023 02:46:32 GMT
- Authors: HoLam Chung, Junan Li, Pengfei Liu, Wai-Kim Leung, Xixin Wu, Helen
Meng
- Abstract summary: Homophone characters are common in tonal syllable-based languages, such as Mandarin and Cantonese.
This paper presents a novel homophone extension method to integrate human knowledge of the homophone lexicon into the beam search decoding process.
We also propose an automatic unified writing method to merge the variants of Cantonese characters.
- Score: 36.10245119706219
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Homophone characters are common in tonal syllable-based languages, such as
Mandarin and Cantonese. Data-intensive end-to-end Automatic Speech
Recognition (ASR) systems are more likely to mis-recognize homophone characters
and rare words under low-resource settings. For the problem of low-resource
Cantonese speech recognition, this paper presents a novel homophone extension
method to integrate human knowledge of the homophone lexicon into the beam
search decoding process with language model re-scoring. Besides, we propose an
automatic unified writing method to merge the variants of Cantonese characters
and standardize speech annotation guidelines, which enables more efficient
utilization of labeled utterances by providing more samples for the merged
characters. We empirically show that both homophone extension and unified
writing improve the recognition performance significantly on both in-domain and
out-of-domain test sets, with an absolute Character Error Rate (CER) decrease
of around 5% and 18%, respectively.
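The two ideas in the abstract can be sketched roughly as follows. The variant map, homophone lexicon, and language-model scorer below are hypothetical toy illustrations, not the paper's actual resources or decoder:

```python
# Toy sketch of the abstract's two ideas; the variant map, homophone
# lexicon, and language-model scorer are hypothetical illustrations.

# Unified writing: map variant Cantonese characters to one canonical
# form so labeled utterances for all variants are pooled together.
VARIANT_MAP = {"裡": "裏", "着": "著"}  # hypothetical variant -> canonical

def unify_writing(text: str) -> str:
    """Normalize character variants to their canonical forms."""
    return "".join(VARIANT_MAP.get(ch, ch) for ch in text)

# Homophone extension: expand each character of a beam hypothesis to
# its homophones, so a rare word whose character shares a pronunciation
# with a mis-recognized frequent character can be recovered during
# language-model re-scoring.
HOMOPHONES = {"事": ["事", "士", "是"]}  # toy lexicon (all read si6)

def extend_hypothesis(hyp: str) -> list[str]:
    """Enumerate spellings obtained by swapping homophone characters."""
    candidates = [""]
    for ch in hyp:
        options = HOMOPHONES.get(ch, [ch])
        candidates = [prefix + opt for prefix in candidates for opt in options]
    return candidates

def rescore(beam: list[str], lm_score) -> str:
    """Return the best homophone-extended hypothesis under an external LM."""
    extended = {ext for hyp in beam
                for ext in extend_hypothesis(unify_writing(hyp))}
    return max(extended, key=lm_score)
```

Note that the candidate set grows multiplicatively with each homophone character, so a practical decoder would restrict the expansion to a few positions per hypothesis rather than enumerating exhaustively as this sketch does.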
Related papers
- Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition
and Phoneme to Grapheme Translation [9.118302330129284]
This research optimizes two-pass cross-lingual transfer learning in low-resource languages.
We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics.
We introduce a global phoneme noise generator for realistic ASR noise during phoneme-to-grapheme training to reduce error propagation.
arXiv Detail & Related papers (2023-12-06T06:37:24Z)
- MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and
Phonetic Domains for Speech Representation Learning [0.76146285961466]
We present a methodology for linguistic feature extraction, focusing on automatically syllabifying words in multiple languages.
In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification.
The system was built with open-source components and resources.
arXiv Detail & Related papers (2023-10-17T19:27:23Z)
- SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation [10.016862617549991]
This paper proposes SoundChoice, a novel Grapheme-to-Phoneme (G2P) architecture that processes entire sentences rather than operating at the word level.
SoundChoice achieves a Phoneme Error Rate (PER) of 2.65% on whole-sentence transcription using data from LibriSpeech and Wikipedia.
arXiv Detail & Related papers (2022-07-27T01:14:59Z)
- Differentiable Allophone Graphs for Language-Universal Speech
Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Spoken Term Detection Methods for Sparse Transcription in Very
Low-resource Settings [20.410074074340447]
Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target language speech, can be used for spoken term detection.
We show that representing phoneme recognition ambiguity in a graph structure can further boost the recall while maintaining high precision in the low resource spoken term detection task.
arXiv Detail & Related papers (2021-06-11T04:09:54Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a
Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and
Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic and Romance, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- Homophone-based Label Smoothing in End-to-End Automatic Speech
Recognition [8.066444614339614]
The proposed label smoothing method incorporates pronunciation knowledge of homophones into training.
Experiments with a hybrid CTC sequence-to-sequence model show that the new method can reduce character error rate (CER) by 0.4% absolute.
arXiv Detail & Related papers (2020-04-07T14:37:30Z)
- Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.