SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using
Deep Neural Networks
- URL: http://arxiv.org/abs/2303.01758v1
- Date: Fri, 3 Mar 2023 07:46:35 GMT
- Authors: Naoki Kimura, Michinari Kono, and Jun Rekimoto
- Abstract summary: A system to detect a user's unvoiced utterance is proposed.
Our proposed system recognizes the utterance contents without the user's uttering voice.
We also observed that users can learn to adjust their oral movements to improve the system's recognition accuracy.
- Score: 18.968402215723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of digital devices operated by voice is expanding rapidly.
However, the applications of voice interfaces are still restricted. For
example, speaking in public places becomes an annoyance to the surrounding
people, and secret information should not be uttered. Environmental noise may
reduce the accuracy of speech recognition. To address these limitations, a
system to detect a user's unvoiced utterance is proposed. From internal
information observed by an ultrasonic imaging sensor attached to the underside
of the jaw, our proposed system recognizes the utterance contents without the
user actually vocalizing. Our proposed deep neural network model is used to
obtain acoustic features from a sequence of ultrasound images. We confirmed
that audio signals generated by our system can control existing smart speakers.
We also observed that users can learn to adjust their oral movements to improve
the system's recognition accuracy.
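The abstract describes a pipeline that maps a sequence of ultrasound images to acoustic features, which are then rendered as audio for a smart speaker. As a rough illustration of that data flow only (not the paper's actual model: the frame sizes, the feature count, and the toy linear projection below are all assumptions, standing in for the paper's deep network), a minimal sketch:

```python
import random

# Toy stand-ins for the real dimensions (assumed, not from the paper).
IMG_H, IMG_W = 8, 8   # ultrasound frame size; real frames are much larger
N_MELS = 4            # acoustic features produced per frame

random.seed(0)
# Toy "network": one random linear projection per acoustic feature,
# where the paper uses a trained deep neural network.
WEIGHTS = [[random.uniform(-0.1, 0.1) for _ in range(IMG_H * IMG_W)]
           for _ in range(N_MELS)]

def image_to_features(image):
    """Map one ultrasound frame (2D list) to N_MELS acoustic features."""
    flat = [px for row in image for px in row]
    return [sum(w * x for w, x in zip(ws, flat)) for ws in WEIGHTS]

def sequence_to_features(images):
    """Map an ultrasound frame sequence to a feature-frame sequence.

    A 2-frame moving average stands in for the temporal modeling that a
    real model would learn across consecutive frames.
    """
    frames = [image_to_features(img) for img in images]
    smoothed = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        smoothed.append([(p + c) / 2 for p, c in zip(prev, cur)])
    return smoothed

# Example: 10 toy frames of constant intensity -> 10 feature frames.
seq = [[[0.5] * IMG_W for _ in range(IMG_H)] for _ in range(10)]
feats = sequence_to_features(seq)
```

In the actual system, the feature frames would be converted back to a waveform (e.g. by a vocoder) before being played to the smart speaker; this sketch stops at the feature stage.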
Related papers
- Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege [26.3587130339825]
We propose a novel phoneme-based noise built on the idea of informational masking, which can distract both machines and humans.
Our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems.
arXiv Detail & Related papers (2024-01-28T16:56:56Z)
- Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning [62.83590925557013]
We learn a set of challenging partially-observed manipulation tasks from visual and audio inputs.
Our proposed system learns these tasks by combining offline imitation learning from tele-operated demonstrations and online finetuning.
In a set of simulated tasks, we find that our system benefits from using audio, and that by using online interventions we are able to improve the success rate of offline imitation learning by 20%.
arXiv Detail & Related papers (2022-05-30T04:52:58Z)
- Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking [2.9308762189250746]
Voice interfaces are becoming increasingly widely used as input for many applications and smart devices.
DNNs are easily disturbed by slight perturbations into false recognitions, which is extremely dangerous for voice-controlled intelligent applications.
arXiv Detail & Related papers (2022-04-19T16:26:34Z)
- Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z)
- Privacy-Preserving Speech Representation Learning using Vector Quantization [0.0]
Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns.
This paper aims to produce an anonymous representation while preserving speech recognition performance.
arXiv Detail & Related papers (2022-03-15T14:01:11Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims at increasing the recognition rate and reducing false alarms in the presence of noise.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- A GAN-based Approach for Mitigating Inference Attacks in Smart Home Environment [3.785123406103385]
In this study, we explore the problem of adversaries spying on smart home users to infer sensitive information with the aid of machine learning techniques.
We propose a Generative Adversarial Network (GAN) based approach for privacy preservation in smart homes.
arXiv Detail & Related papers (2020-11-13T02:14:32Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
- TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices [71.68436132514542]
We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
arXiv Detail & Related papers (2020-08-10T16:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.