IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels,
Consonants, Words, and Phrases
- URL: http://arxiv.org/abs/2312.09572v1
- Date: Fri, 15 Dec 2023 07:04:40 GMT
- Authors: Sunghwa Lee, Younghoon Shin, Myungjong Kim, Jiwon Seo
- Abstract summary: Impulse radio ultra-wideband (IR-UWB) radar can operate without physical contact with users' articulators and related body parts, offering several advantages for silent speech recognition: high range resolution, high penetrability, low power consumption, robustness to external light or sound interference, and suitability for embedding in space-constrained handheld devices.
- Score: 2.5003170112399045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several sensing techniques have been proposed for silent speech recognition
(SSR); however, many of these methods require invasive processes or sensor
attachment to the skin using adhesive tape or glue, rendering them unsuitable
for frequent use in daily life. By contrast, impulse radio ultra-wideband
(IR-UWB) radar can operate without physical contact with users' articulators
and related body parts, offering several advantages for SSR. These advantages
include high range resolution, high penetrability, low power consumption,
robustness to external light or sound interference, and the ability to be
embedded in space-constrained handheld devices. This study demonstrated IR-UWB
radar-based contactless SSR using four types of speech stimuli (vowels,
consonants, words, and phrases). To achieve this, a novel speech feature
extraction algorithm specifically designed for IR-UWB radar-based SSR is
proposed. Each speech stimulus is recognized by applying a classification
algorithm to the extracted speech features. Two different algorithms,
multidimensional dynamic time warping (MD-DTW) and deep neural network-hidden
Markov model (DNN-HMM), were compared for the classification task.
Additionally, a favorable radar antenna position, either in front of the user's
lips or below the user's chin, was determined to achieve higher recognition
accuracy. Experimental results demonstrated the efficacy of the proposed speech
feature extraction algorithm combined with DNN-HMM for classifying vowels,
consonants, words, and phrases. Notably, this study represents the first
demonstration of phoneme-level SSR using contactless radar.
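The abstract names multidimensional dynamic time warping (MD-DTW) as one of the two classifiers but does not spell out the algorithm. As a rough illustration only: MD-DTW can be sketched as standard DTW with a Euclidean cost between per-frame feature vectors, combined with nearest-neighbor template matching. The function names and toy features below are hypothetical, not the paper's implementation.

```python
import numpy as np

def md_dtw_distance(a, b):
    """Multidimensional DTW distance between sequences a (T1, D) and b (T2, D).

    The frame-to-frame cost is the Euclidean distance between feature
    vectors; the usual diagonal/vertical/horizontal DTW recursion applies.
    """
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[t1, t2]

def classify(query, labelled_templates):
    """1-nearest-neighbour classification against labelled template sequences.

    labelled_templates is a list of (label, sequence) pairs; the label of
    the template with the smallest MD-DTW distance to the query wins.
    """
    best_label, _ = min(labelled_templates,
                        key=lambda lt: md_dtw_distance(query, lt[1]))
    return best_label

# Toy example with made-up 3-dimensional radar features:
template_a = np.zeros((5, 3))          # stand-in template for class "a"
template_b = np.ones((6, 3))           # stand-in template for class "b"
query = 0.1 * np.ones((4, 3))          # a query closer to template "a"
print(classify(query, [("a", template_a), ("b", template_b)]))  # → a
```

In practice MD-DTW scales poorly with vocabulary size, which is one reason statistical models such as the DNN-HMM compared in the paper are attractive for larger stimulus sets.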
Related papers
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios [36.50731790624643]
We introduce RIR-SF, a novel spatial feature based on room impulse response (RIR).
RIR-SF significantly outperforms traditional 3D spatial features, showing superior theoretical and empirical performance.
We also propose an optimized all-neural multi-channel ASR framework for RIR-SF, achieving a relative 21.3% reduction in CER for target speaker ASR in multi-channel settings.
arXiv Detail & Related papers (2023-10-31T20:42:08Z)
- Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation [83.36685075570232]
This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end.
We explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model.
A proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate on the reverberant WHAMR! test set.
arXiv Detail & Related papers (2023-07-23T05:39:39Z)
- A Deep Learning System for Domain-specific Speech Recognition [0.0]
The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems.
The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM.
The viability of using error prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated.
arXiv Detail & Related papers (2023-03-18T22:19:09Z)
- HDNet: Hierarchical Dynamic Network for Gait Recognition using Millimeter-Wave Radar [13.19744551082316]
We propose a Hierarchical Dynamic Network (HDNet) for gait recognition using mmWave radar.
To prove the superiority of our methods, we perform extensive experiments on two public mmWave radar-based gait recognition datasets.
arXiv Detail & Related papers (2022-11-01T07:34:22Z)
- DeepHybrid: Deep Learning on Automotive Radar Spectra and Reflections for Object Classification [0.5669790037378094]
We propose a method that combines classical radar signal processing and Deep Learning algorithms.
The proposed method can be used for example to improve automatic emergency braking or collision avoidance systems.
arXiv Detail & Related papers (2022-02-17T08:45:11Z)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization [73.62550438861942]
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR).
In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance.
arXiv Detail & Related papers (2020-10-30T20:26:28Z)
- Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments [8.103366584285645]
A radar learns to vary the bandwidth and center frequency of its linear frequency modulated (LFM) waveforms to mitigate mutual interference with other systems.
We extend the deep Q-learning (DQL)-based approach to incorporate Double Q-learning and a recurrent neural network, forming a Double Deep Recurrent Q-Network.
Our experimental results indicate that the proposed Deep RL approach significantly improves radar detection performance in congested spectral environments.
arXiv Detail & Related papers (2020-06-23T17:21:28Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: (a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and (b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.