Beyond the Voice: Inertial Sensing of Mouth Motion for High Security Speech Verification
- URL: http://arxiv.org/abs/2510.15173v1
- Date: Thu, 16 Oct 2025 22:26:18 GMT
- Title: Beyond the Voice: Inertial Sensing of Mouth Motion for High Security Speech Verification
- Authors: Ynes Ineza, Muhammad A. Ullah, Abdul Serwadda, Aurore Munyaneza,
- Abstract summary: We present a second authentication factor that combines acoustic evidence with the unique motion patterns of a speaker's lower face. Our system records a distinct motion signature with strong discriminative power across individuals. We discuss specific use cases where this second line of defense could provide tangible security benefits to voice authentication systems.
- Score: 0.34998703934432673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice interfaces are increasingly used in high-stakes domains such as mobile banking, smart home security, and hands-free healthcare. Meanwhile, modern generative models have made high-quality voice forgeries inexpensive and easy to create, eroding confidence in voice authentication alone. To strengthen protection against such attacks, we present a second authentication factor that combines acoustic evidence with the unique motion patterns of a speaker's lower face. By placing lightweight inertial sensors around the mouth to capture mouth opening and the evolving geometry of the lower face, our system records a distinct motion signature with strong discriminative power across individuals. We built a prototype and recruited 43 participants to evaluate the system under four conditions: seated, walking on level ground, walking on stairs, and speaking with different language backgrounds (native vs. non-native English). Across all scenarios, our approach consistently achieved a median equal error rate (EER) of 0.01 or lower, indicating that mouth movement data remain robust to variations in gait, posture, and spoken language. We discuss specific use cases where this second line of defense could provide tangible security benefits to voice authentication systems.
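The abstract reports its results as a median equal error rate (EER). As a reminder of what that number measures, here is a minimal sketch, assuming a verifier that outputs one similarity score per attempt, where a higher score means "more likely the claimed speaker". The function name `equal_error_rate` and the simple threshold sweep are illustrative choices, not the paper's evaluation code. The EER is the operating point at which the false accept rate (impostors accepted) equals the false reject rate (genuine users rejected), so an EER of 0.01 means roughly 1% of each at that threshold.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the equal error rate (EER) from verifier similarity scores.

    Hypothetical helper. Assumes higher scores mean "more likely the claimed
    speaker" and that an attempt is accepted when its score is >= the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # false accept rate: impostors let in
        frr = np.mean(genuine < t)    # false reject rate: genuine users refused
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)


# Usage with synthetic, well-separated score distributions (not data from the paper).
rng = np.random.default_rng(0)
genuine = rng.normal(loc=0.8, scale=0.05, size=500)
impostor = rng.normal(loc=0.3, scale=0.1, size=5000)
print(f"EER = {equal_error_rate(genuine, impostor):.4f}")
```

With finite samples the FAR and FRR curves rarely cross exactly at an observed score, so the sketch averages the two rates at the threshold where they are closest; interpolating along the ROC curve is a common alternative.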
Related papers
- Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models [51.7170633585748]
Stream-Voice-Anon adapts modern causal LM-based neural audio codec (NAC) architectures specifically for streaming speaker anonymization. Our anonymization approach incorporates pseudo-speaker representation sampling, speaker embedding mixing, and diverse prompt selection strategies. Under the VoicePrivacy 2024 Challenge protocol, Stream-Voice-Anon achieves substantial improvements in intelligibility.
(arXiv, 2026-01-20T13:23:44Z)
- Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race [5.051497895059242]
Existing benchmarks aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and obscures the varying performance of detectors against different generation paradigms. We introduce the first ecosystem-level benchmark that systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol.
(arXiv, 2025-10-08T00:52:06Z)
- Backdoor Attacks Against Speech Language Models [63.07317091368079]
We present the first systematic study of audio backdoor attacks against speech language models. We demonstrate the attack's effectiveness across four speech encoders and three datasets, covering four tasks. We propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders.
(arXiv, 2025-10-01T17:45:04Z)
- VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks [51.68795949691009]
We introduce VoxGuard, a framework grounded in differential privacy and membership inference. For attributes, we show that simple transparent attacks recover gender and accent with near-perfect accuracy even after anonymization. Our results demonstrate that EER substantially underestimates leakage, highlighting the need for low-FPR evaluation.
(arXiv, 2025-09-22T20:57:48Z)
- FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks [40.22853425929116]
We propose a lightweight, robust, plug-and-play privacy preservation method against speech synthesis attacks. Our method generates a frequency-domain perturbation and adds it to the original speech, achieving privacy protection while preserving high speech quality.
(arXiv, 2025-08-30T17:10:22Z)
- Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework [6.002582335323663]
We present an adversarial attack targeting the speech input of aligned Multimodal Large Language Models (MLLMs) in a white-box scenario. We introduce a novel token-level attack that leverages access to the model's speech tokenization to generate adversarial token sequences. Our approach achieves an attack success rate of up to 89 percent across multiple restricted tasks.
(arXiv, 2025-05-24T20:46:36Z)
- Pre-Finetuning for Few-Shot Emotional Speech Recognition [20.894029832911617]
We view speaker adaptation as a few-shot learning problem.
We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives.
(arXiv, 2023-02-24T22:38:54Z)
- Defend Data Poisoning Attacks on Voice Authentication [6.160281428772401]
Machine learning attacks are putting voice authentication systems at risk.
We propose a more robust defense method, called Guardian, which is a convolutional neural network-based discriminator.
Our approach distinguishes about 95% of attacked accounts from normal accounts, far more effective than existing approaches, which reach only 60% accuracy.
(arXiv, 2022-09-09T22:48:35Z)
- Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices [15.136348385992047]
We train several voice conversion models using self-supervised speech representations.
Converted voices retain a low word error rate within 1% of the original voice.
Experiments on dysarthric speech data show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices.
(arXiv, 2022-04-04T17:48:01Z)
- Real-Time Neural Voice Camouflage [23.171336558901118]
We propose a method to camouflage a person's voice over-the-air from automatic speech recognition systems.
Standard adversarial attacks are not effective in real-time streaming situations.
We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be the most effective in the future.
(arXiv, 2021-12-14T00:27:44Z)
- Protecting gender and identity with disentangled speech representations [49.00162808063399]
We show that protecting gender information in speech is more effective than modelling speaker-identity information.
We present a novel way to encode gender information and disentangle two sensitive biometric identifiers.
(arXiv, 2021-04-22T13:31:41Z)
- High Fidelity Speech Regeneration with Application to Speech Enhancement [96.34618212590301]
We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time.
Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source.
(arXiv, 2021-01-31T10:54:27Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
(arXiv, 2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.