Silent Speech Interfaces for Speech Restoration: A Review
- URL: http://arxiv.org/abs/2009.02110v3
- Date: Sun, 27 Sep 2020 08:50:17 GMT
- Title: Silent Speech Interfaces for Speech Restoration: A Review
- Authors: Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M.
Martín-Doñas, José L. Pérez-Córdoba, Angel M. Gomez
- Abstract summary: Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
- Score: 59.68902463890532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This review summarises the status of silent speech interface (SSI) research.
SSIs rely on non-acoustic biosignals generated by the human body during speech
production to enable communication whenever normal verbal communication is not
possible or not desirable. In this review, we focus on the first case and
present latest SSI research aimed at providing new alternative and augmentative
communication methods for persons with severe speech disorders. SSIs can employ
a variety of biosignals to enable silent communication, such as
electrophysiological recordings of neural activity, electromyographic (EMG)
recordings of vocal tract movements or the direct tracking of articulator
movements using imaging techniques. Depending on the disorder, some sensing
techniques may be better suited than others to capture speech-related
information. For instance, EMG and imaging techniques are well suited for
laryngectomised patients, whose vocal tract remains largely intact but who
are unable to speak after removal of the vocal folds; these techniques fail,
however, for severely paralysed individuals. From the biosignals, SSIs
decode the intended message,
using automatic speech recognition or speech synthesis algorithms. Despite
considerable advances in recent years, most present-day SSIs have only been
validated in laboratory settings for healthy users. Thus, as discussed in this
paper, a number of challenges remain to be addressed in future research before
SSIs can be promoted to real-world applications. If these issues can be
addressed successfully, future SSIs will improve the lives of persons with
severe speech impairments by restoring their communication capabilities.
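The decoding step described above (biosignal in, recognized message out) can be sketched in miniature. The feature extractor, nearest-centroid decoder, and synthetic EMG below are all hypothetical stand-ins for illustration, not any system surveyed in the review:

```python
import numpy as np

def emg_features(signal, frame_len=160, hop=80):
    """Frame a raw EMG channel and compute per-frame RMS energy,
    a simple low-level feature for silent-speech decoding."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

class NearestCentroidDecoder:
    """Toy decoder: maps a fixed-length feature vector to the word
    whose training centroid is closest in Euclidean distance."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = {l: np.mean([x for x, t in zip(X, y) if t == l], axis=0)
                          for l in self.labels}
        return self

    def predict(self, x):
        return min(self.labels,
                   key=lambda l: np.linalg.norm(x - self.centroids[l]))

# Synthetic demo: two "words" with clearly different EMG energy envelopes.
rng = np.random.default_rng(0)
def fake_emg(scale):
    return scale * rng.standard_normal(800)

X = [emg_features(fake_emg(s)) for s in (0.5, 0.5, 2.0, 2.0)]
y = ["yes", "yes", "no", "no"]
dec = NearestCentroidDecoder().fit(X, y)
print(dec.predict(emg_features(fake_emg(0.5))))  # prints "yes"
```

Real SSIs replace the centroid step with ASR-style sequence models or direct speech synthesis, but the shape of the pipeline (windowed biosignal features feeding a trained decoder) is the same.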
Related papers
- Geometry of orofacial neuromuscular signals: speech articulation decoding using surface electromyography [0.0]
Millions of individuals lose the ability to speak intelligibly due to neuromuscular disease, stroke, trauma, and head/neck cancer surgery.
Noninvasive surface electromyography (sEMG) has shown promise to restore speech output in these individuals.
The goal is to collect sEMG signals from multiple articulatory sites as people silently produce speech and then decode the signals to enable fluent and natural communication.
arXiv Detail & Related papers (2024-11-04T20:31:22Z) - Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon in order to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z) - Artificial Intelligence for Cochlear Implants: Review of Strategies, Challenges, and Perspectives [2.608119698700597]
This review aims to comprehensively cover advancements in CI-based ASR and speech enhancement, among other related aspects.
The review will delve into potential applications and suggest future directions to bridge existing research gaps in this domain.
arXiv Detail & Related papers (2024-03-17T11:28:23Z) - SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using
Deep Neural Networks [18.968402215723]
A system is proposed that detects a user's unvoiced utterances from ultrasound imaging.
The system recognizes utterance content without the user producing audible speech.
We also observed that users can adjust their oral movements to improve recognition accuracy.
arXiv Detail & Related papers (2023-03-03T07:46:35Z) - Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z) - Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric
Speech Recognition [4.637732011720613]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
To have robust dysarthria-specific ASR, sufficient training speech is required.
Recent advances in Text-To-Speech synthesis suggest the possibility of using synthesis for data augmentation.
arXiv Detail & Related papers (2022-01-27T15:22:09Z) - Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
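Speed perturbation, the augmentation approach highlighted above, simply resamples the waveform so the speech plays faster or slower. A minimal numpy sketch (linear interpolation standing in for a production polyphase resampler):

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample a waveform to change its speed (and pitch) by `factor`:
    factor > 1.0 speeds speech up, factor < 1.0 slows it down."""
    n_out = int(round(len(wave) / factor))
    # Positions in the source signal to sample for each output point.
    src = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src, np.arange(len(wave)), wave)

sr = 16000
t = np.arange(sr) / sr                # 1 second of audio
tone = np.sin(2 * np.pi * 220 * t)
slow = speed_perturb(tone, 0.9)       # ~1.11 s: mimics slower speech
fast = speed_perturb(tone, 1.1)       # ~0.91 s: mimics faster speech
print(len(slow), len(fast))           # prints 17778 14545
```

For disordered-speech augmentation, factors are typically drawn per utterance so that normal speech is warped toward the slower tempo characteristic of dysarthric speakers.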
arXiv Detail & Related papers (2022-01-14T17:09:22Z) - Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments.
We employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances.
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
arXiv Detail & Related papers (2020-01-25T00:24:45Z)
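An online distortion module of the kind just described contaminates each clean training signal on the fly. The noise-plus-reverb recipe below is an illustrative sketch, not the actual PASE+ module:

```python
import numpy as np

rng = np.random.default_rng(1)

def distort(wave, snr_db=10.0, reverb_len=200):
    """Contaminate a clean waveform: add white noise at a target SNR,
    then convolve with a short random decaying impulse response."""
    # Additive noise scaled to the requested signal-to-noise ratio.
    sig_pow = np.mean(wave ** 2)
    noise_pow = sig_pow / (10 ** (snr_db / 10))
    noisy = wave + rng.standard_normal(len(wave)) * np.sqrt(noise_pow)
    # Simple synthetic reverberation: exponentially decaying random IR.
    ir = rng.standard_normal(reverb_len) * np.exp(-np.arange(reverb_len) / 50.0)
    ir[0] = 1.0  # keep the direct path dominant
    return np.convolve(noisy, ir)[: len(wave)]

clean = np.sin(2 * np.pi * 300 * np.arange(8000) / 8000)
dirty = distort(clean)
print(dirty.shape)  # same length as the input
```

Because the distortions are sampled fresh every epoch, the encoder never sees the same corrupted signal twice, which is what makes the learned features robust to noise and reverberation.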
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.