Automatic Speech Recognition using limited vocabulary: A survey
- URL: http://arxiv.org/abs/2108.10254v1
- Date: Mon, 23 Aug 2021 15:51:41 GMT
- Title: Automatic Speech Recognition using limited vocabulary: A survey
- Authors: Jean Louis K. E. Fendji, Diane M. Tala, Blaise O. Yenke, and Marcellin
Atemkeng
- Abstract summary: An approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary.
This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possibly future directions in ASR using a limited vocabulary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic Speech Recognition (ASR) is an active field of research due to its
huge number of applications and the proliferation of interfaces and computing
devices that can support speech processing. However, the bulk of applications is
based on well-resourced languages that overshadow under-resourced ones. Yet ASR
represents an undeniable means to promote such languages, especially when
designing human-to-human or human-to-machine systems involving illiterate
people. One approach to designing an ASR system targeting under-resourced
languages is to start with a limited vocabulary. ASR using a limited vocabulary
is a subset of the speech recognition problem that focuses on the recognition
of a small number of words or sentences. This paper aims to provide a
comprehensive view of the mechanisms behind ASR systems as well as the
techniques, tools, projects, recent contributions, and possible future
directions in ASR using a limited vocabulary. This work consequently provides a
roadmap for designing ASR systems using a limited vocabulary. Although the
emphasis is put on limited vocabulary, most of the tools and techniques
reported in this survey apply to ASR systems in general.
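As a concrete illustration of the limited-vocabulary setting discussed in the abstract, the minimal sketch below constrains an off-the-shelf recognizer to a handful of command words. It assumes the open-source Vosk toolkit and a pretrained acoustic model on disk; the model directory, audio file, and word list are illustrative and not taken from the survey.

```python
# Minimal sketch of limited-vocabulary ASR, assuming the Vosk toolkit
# (pip install vosk) and a downloaded acoustic model in MODEL_DIR.
import json
import wave

from vosk import Model, KaldiRecognizer

MODEL_DIR = "model"                                  # illustrative model path
VOCABULARY = ["yes", "no", "stop", "go", "[unk]"]    # the limited vocabulary

model = Model(MODEL_DIR)
wf = wave.open("command.wav", "rb")                  # 16 kHz, 16-bit mono PCM

# Passing a JSON word list as the third argument restricts decoding to the
# limited vocabulary ("[unk]" absorbs out-of-vocabulary speech).
rec = KaldiRecognizer(model, wf.getframerate(), json.dumps(VOCABULARY))

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())["text"])
```

Restricting the decoder this way is the practical payoff of the limited-vocabulary setting: the acoustic model can stay generic while the search space shrinks to the few words the application actually needs.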
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon in order to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z)
- A Deep Learning System for Domain-specific Speech Recognition [0.0]
The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems.
The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model combined with an external KenLM language model.
The viability of using error-prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated.
arXiv Detail & Related papers (2023-03-18T22:19:09Z)
- Hey ASR System! Why Aren't You More Inclusive? Automatic Speech Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A Literature Review [0.0]
We present research that addresses ASR biases against gender, race, and the sick and disabled.
We also discuss techniques for designing a more accessible and inclusive ASR technology.
arXiv Detail & Related papers (2022-11-17T13:15:58Z)
- Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? [3.7311680121118345]
We propose a new decoder biasing technique to incorporate the visual context while ensuring the ASR output does not degrade for incorrect context.
We achieve a 59% relative reduction in WER from an unmodified ASR system.
arXiv Detail & Related papers (2022-10-21T11:16:05Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv Detail & Related papers (2022-03-29T11:55:30Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Unsupervised Automatic Speech Recognition: A Review [2.6212127510234797]
We review the research literature to identify models and ideas that could lead to fully unsupervised ASR.
The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition.
arXiv Detail & Related papers (2021-06-09T08:33:20Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER); a short sketch of how these metrics are computed follows this list.
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z)
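Several of the entries above report character error rate (CER) and word error rate (WER), for example the BPE-dropout paper. Both metrics are edit-distance ratios; the minimal sketch below shows how they are typically computed. The example strings are illustrative and not taken from any of the listed papers.

```python
# Minimal sketch of WER/CER computation as Levenshtein edit distance
# divided by the reference length; example strings are illustrative.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    d = list(range(len(hyp) + 1))             # row for an empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i                  # prev holds dist[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,               # deletion
                      d[j - 1] + 1,           # insertion
                      prev + (r != h))        # substitution or match
            prev, d[j] = d[j], cur
    return d[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(f"WER = {wer('turn the light on', 'turn light on'):.2f}")        # 0.25
print(f"CER = {cer('limited vocabulary', 'limited vocabolary'):.2f}")  # 0.06
```

WER is computed over whitespace tokens and CER over characters, which is one reason CER is often reported alongside WER for morphologically rich languages such as Turkish.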