IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels,
Consonants, Words, and Phrases
- URL: http://arxiv.org/abs/2312.09572v1
- Date: Fri, 15 Dec 2023 07:04:40 GMT
- Authors: Sunghwa Lee, Younghoon Shin, Myungjong Kim, Jiwon Seo
- Abstract summary: Impulse radio ultra-wideband (IR-UWB) radar can operate without physical contact with users' articulators and related body parts, offering several advantages for silent speech recognition: high range resolution, high penetrability, low power consumption, robustness to external light or sound interference, and suitability for embedding in space-constrained handheld devices.
- Score: 2.5003170112399045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several sensing techniques have been proposed for silent speech recognition
(SSR); however, many of these methods require invasive processes or sensor
attachment to the skin using adhesive tape or glue, rendering them unsuitable
for frequent use in daily life. By contrast, impulse radio ultra-wideband
(IR-UWB) radar can operate without physical contact with users' articulators
and related body parts, offering several advantages for SSR. These advantages
include high range resolution, high penetrability, low power consumption,
robustness to external light or sound interference, and the ability to be
embedded in space-constrained handheld devices. This study demonstrated IR-UWB
radar-based contactless SSR using four types of speech stimuli (vowels,
consonants, words, and phrases). To achieve this, a novel speech feature
extraction algorithm specifically designed for IR-UWB radar-based SSR is
proposed. Each speech stimulus is recognized by applying a classification
algorithm to the extracted speech features. Two different algorithms,
multidimensional dynamic time warping (MD-DTW) and deep neural network-hidden
Markov model (DNN-HMM), were compared for the classification task.
Additionally, a favorable radar antenna position, either in front of the user's
lips or below the user's chin, was determined to achieve higher recognition
accuracy. Experimental results demonstrated the efficacy of the proposed speech
feature extraction algorithm combined with DNN-HMM for classifying vowels,
consonants, words, and phrases. Notably, this study represents the first
demonstration of phoneme-level SSR using contactless radar.
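The abstract names multidimensional dynamic time warping (MD-DTW) as one of the two classifiers but does not spell out the algorithm. As a rough illustration only: MD-DTW can be sketched as standard DTW with a Euclidean cost between per-frame feature vectors, combined with nearest-neighbor template matching. The function names and toy features below are hypothetical, not the paper's implementation.

```python
import numpy as np

def md_dtw_distance(a, b):
    """Multidimensional DTW distance between sequences a (T1, D) and b (T2, D).

    The frame-to-frame cost is the Euclidean distance between feature
    vectors; the usual diagonal/vertical/horizontal DTW recursion applies.
    """
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[t1, t2]

def classify(query, labelled_templates):
    """1-nearest-neighbour classification against labelled template sequences.

    labelled_templates is a list of (label, sequence) pairs; the label of
    the template with the smallest MD-DTW distance to the query wins.
    """
    best_label, _ = min(labelled_templates,
                        key=lambda lt: md_dtw_distance(query, lt[1]))
    return best_label

# Toy example with made-up 3-dimensional radar features:
template_a = np.zeros((5, 3))          # stand-in template for class "a"
template_b = np.ones((6, 3))           # stand-in template for class "b"
query = 0.1 * np.ones((4, 3))          # a query closer to template "a"
print(classify(query, [("a", template_a), ("b", template_b)]))  # → a
```

In practice MD-DTW scales poorly with vocabulary size, which is one reason statistical models such as the DNN-HMM compared in the paper are attractive for larger stimulus sets.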
Related papers
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios [36.50731790624643]
We introduce RIR-SF, a novel spatial feature based on room impulse response (RIR).
RIR-SF significantly outperforms traditional 3D spatial features, showing superior theoretical and empirical performance.
We also propose an optimized all-neural multi-channel ASR framework for RIR-SF, achieving a relative 21.3% reduction in CER for target speaker ASR in multi-channel settings.
arXiv Detail & Related papers (2023-10-31T20:42:08Z)
- Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation [83.36685075570232]
This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end.
We explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model.
A proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate on the reverberant WHAMR! test set.
arXiv Detail & Related papers (2023-07-23T05:39:39Z)
- A Deep Learning System for Domain-specific Speech Recognition [0.0]
The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems.
The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM.
The viability of using error prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated.
arXiv Detail & Related papers (2023-03-18T22:19:09Z)
- HDNet: Hierarchical Dynamic Network for Gait Recognition using Millimeter-Wave Radar [13.19744551082316]
We propose a Hierarchical Dynamic Network (HDNet) for gait recognition using mmWave radar.
To prove the superiority of our methods, we perform extensive experiments on two public mmWave radar-based gait recognition datasets.
arXiv Detail & Related papers (2022-11-01T07:34:22Z)
- DeepHybrid: Deep Learning on Automotive Radar Spectra and Reflections for Object Classification [0.5669790037378094]
We propose a method that combines classical radar signal processing and Deep Learning algorithms.
The proposed method can be used for example to improve automatic emergency braking or collision avoidance systems.
arXiv Detail & Related papers (2022-02-17T08:45:11Z)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization [73.62550438861942]
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR).
In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance.
arXiv Detail & Related papers (2020-10-30T20:26:28Z)
- Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments [8.103366584285645]
A radar learns to vary the bandwidth and center frequency of its linear frequency modulated (LFM) waveforms to mitigate mutual interference with other systems.
We extend the deep Q-learning (DQL)-based approach to incorporate Double Q-learning and a recurrent neural network, forming a Double Deep Recurrent Q-Network.
Our experimental results indicate that the proposed Deep RL approach significantly improves radar detection performance in congested spectral environments.
arXiv Detail & Related papers (2020-06-23T17:21:28Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: (a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and (b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.