Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant
- URL: http://arxiv.org/abs/2309.15203v1
- Date: Tue, 26 Sep 2023 19:03:45 GMT
- Title: Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant
- Authors: Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan
- Abstract summary: Air and bone conduction (AC/BC) from the same vocalization are coupled (or concurrent) and user-level unique.
The legitimate user can defeat acoustic-domain and even cross-domain spoofing samples with the proposed two-stage AirBone authentication.
- Score: 10.694874051404648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in machine learning and natural language processing have fostered the enormous prosperity of smart voice assistants and their services, e.g., Alexa, Google Home, and Siri. However, voice spoofing attacks remain one of the major challenges for voice control security, and they never stop evolving, e.g., deep-learning-based voice conversion and speech synthesis techniques. To address this problem outside the acoustic domain, we focus on head-wearable devices, such as earbuds and virtual reality (VR) headsets, which can feasibly and continuously monitor the bone-conducted voice in the vibration domain. Specifically, we identify that air and bone conduction (AC/BC) from the same vocalization are coupled (or concurrent) and user-level unique, which makes them suitable behavioral and biometric factors for multi-factor authentication (MFA). With the proposed two-stage AirBone authentication, the legitimate user can defeat acoustic-domain and even cross-domain spoofing samples. The first stage answers *whether air and bone conduction utterances are time-domain consistent (TC)* and the second stage runs *bone conduction speaker recognition (BC-SR)*. The security level is hence increased for two reasons: (1) current acoustic attacks on smart voice assistants cannot affect bone conduction, which lies in the vibration domain; (2) even for advanced cross-domain attacks, the unique bone conduction features can expose an adversary's impersonation and machine-induced vibration. Finally, AirBone authentication offers good usability (on par with voice authentication) compared with traditional MFA and schemes specially designed to enhance smart voice security. Our experimental results show that the proposed AirBone authentication is usable and secure, and can be readily deployed on commercial off-the-shelf head wearables with a good user experience.
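To make the two-stage flow concrete, here is a minimal Python/NumPy sketch, assuming 16 kHz AC/BC recordings, a precomputed bone-conduction utterance embedding, and an enrolled template. The function names, thresholds, and the cross-correlation TC test are illustrative stand-ins, not the paper's exact algorithm.

```python
import numpy as np

def tc_check(ac: np.ndarray, bc: np.ndarray, fs: int = 16000,
             max_lag_s: float = 0.05, threshold: float = 0.5) -> bool:
    """Stage 1 (TC): accept only if the air- and bone-conducted signals are
    strongly correlated at a physically plausible lag (assumed test)."""
    n = min(len(ac), len(bc))
    ac = ac[:n] - ac[:n].mean()
    bc = bc[:n] - bc[:n].mean()
    denom = np.linalg.norm(ac) * np.linalg.norm(bc) + 1e-12
    xcorr = np.correlate(ac, bc, mode="full") / denom  # lags -(n-1)..(n-1)
    max_lag = int(max_lag_s * fs)
    center = n - 1                                     # zero-lag index
    window = xcorr[center - max_lag:center + max_lag + 1]
    return bool(np.abs(window).max() >= threshold)

def bc_speaker_ok(bc_emb: np.ndarray, enrolled_emb: np.ndarray,
                  threshold: float = 0.7) -> bool:
    """Stage 2 (BC-SR): cosine score between the bone-conduction utterance
    embedding and the user's enrolled template."""
    cos = bc_emb @ enrolled_emb / (
        np.linalg.norm(bc_emb) * np.linalg.norm(enrolled_emb) + 1e-12)
    return bool(cos >= threshold)

def airbone_authenticate(ac, bc, bc_emb, enrolled_emb) -> bool:
    # accept only when both the behavioral and the biometric factor pass
    return tc_check(ac, bc) and bc_speaker_ok(bc_emb, enrolled_emb)
```

An acoustic-only replay passes neither check (no coupled BC signal), while a cross-domain vibration injection would still have to match the enrolled bone-conduction signature in stage 2.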
Related papers
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR aims to provide a comprehensive evaluation framework for distinguishing cutting-edge AI-synthesized auditory content from bona fide audio.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
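As a rough illustration of the scoring such a benchmark runs, here is a hedged NumPy sketch of the equal error rate (EER) metric commonly reported for audio deepfake detectors; it is not SONAR's actual API.

```python
import numpy as np

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """EER of a detector whose scores are higher for bona fide audio.
    labels: 1 = bona fide, 0 = AI-synthesized."""
    pos = scores[labels == 1]          # bona fide scores
    neg = scores[labels == 0]          # synthetic scores
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(neg >= t)        # synthetic accepted as bona fide
        frr = np.mean(pos < t)         # bona fide rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

# e.g. equal_error_rate(np.array([0.9, 0.8, 0.3, 0.1]),
#                       np.array([1, 1, 0, 0]))  -> 0.0
```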
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion [5.483488375189695]
Face-based Voice Conversion (FVC) is a novel task that leverages facial images to generate the target speaker's voice style.
Previous work has two shortcomings: (1) difficulty obtaining facial embeddings that are well aligned with the speaker's voice identity information, and (2) inadequate decoupling of content and speaker identity information from the audio input.
We present a novel FVC method, Identity-Disentanglement Face-based Voice Conversion (ID-FaceVC), which overcomes the above two limitations.
arXiv Detail & Related papers (2024-09-01T11:51:18Z)
- EarPass: Secure and Implicit Call Receiver Authentication Using Ear Acoustic Sensing [14.78387043362623]
EarPass is a secure and implicit call receiver authentication scheme for smartphones.
It sends inaudible acoustic signals through the earpiece speaker to actively sense the outer ear.
It can achieve a balanced accuracy of 96.95% and an equal error rate of 1.53%.
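A hedged sketch of the active-sensing idea (probe plus matched filter), with the playback/recording hardware I/O omitted; the 18-22 kHz band, probe length, and feature shape are assumptions, not EarPass's published parameters.

```python
import numpy as np
from scipy.signal import chirp

FS = 48000                # assumed device sample rate
T = 0.02                  # assumed 20 ms probe length

def make_probe() -> np.ndarray:
    """A near-inaudible 18-22 kHz linear chirp used as the sensing probe."""
    t = np.linspace(0.0, T, int(FS * T), endpoint=False)
    return 0.5 * chirp(t, f0=18000.0, t1=T, f1=22000.0)

def echo_features(recorded: np.ndarray, probe: np.ndarray) -> np.ndarray:
    """Matched-filter the microphone recording against the probe; the region
    around the correlation peak encodes the outer-ear reflection pattern."""
    mf = np.correlate(recorded, probe, mode="valid")
    peak = int(np.argmax(np.abs(mf)))
    segment = mf[peak:peak + 256]                   # echo response window
    spectrum = np.abs(np.fft.rfft(segment, n=512))  # spectral signature
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)
```

The normalized spectrum would then feed a per-user classifier or template matcher.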
arXiv Detail & Related papers (2024-04-23T13:03:09Z)
- Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning [7.387631194438338]
DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides a) real-time protection of previously unheard utterances, b) robustness against previously unseen black-box SER classifiers, c) preservation of speech transcription, and d) operation in realistic acoustic environments.
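A toy Python sketch in the spirit of DARE-GP's evolutionary search, assuming a black-box `ser_confidence(audio) -> float` callable (a hypothetical stand-in for the target emotion classifier); the real system uses genetic programming and an explicit transcription-preservation objective rather than this crude noise-energy bound.

```python
import numpy as np

def evolve_noise(ser_confidence, seed_audio, pop=16, gens=30, eps=0.01,
                 seed=0):
    """Evolve quiet additive noise that lowers a black-box SER model's
    confidence; clipping |noise| <= eps crudely stands in for keeping the
    speech transcribable."""
    rng = np.random.default_rng(seed)
    n = len(seed_audio)
    population = rng.normal(0.0, eps, size=(pop, n))
    for _ in range(gens):
        fitness = np.array([ser_confidence(seed_audio + m) for m in population])
        elite = population[np.argsort(fitness)[:pop // 2]]   # lower is better
        children = elite + rng.normal(0.0, eps / 4, size=elite.shape)
        population = np.clip(np.concatenate([elite, children]), -eps, eps)
    scores = [ser_confidence(seed_audio + m) for m in population]
    return population[int(np.argmin(scores))]
```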
arXiv Detail & Related papers (2022-11-17T00:25:05Z)
- Speaker Anonymization with Phonetic Intermediate Representations [22.84840887071428]
We propose a speaker anonymization pipeline that generates speech conditioned on phonetic transcriptions and anonymized speaker embeddings.
Using phones as the intermediate representation ensures near complete elimination of speaker identity information from the input.
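For flavor, a hedged NumPy sketch of one common anonymized-embedding recipe (averaging distant pool speakers, as in VoicePrivacy-style systems); the paper's own selection rule may differ, and the ASR-to-phones and TTS stages are omitted here.

```python
import numpy as np

def anonymize_embedding(original: np.ndarray, pool: np.ndarray,
                        k: int = 50, seed: int = 0) -> np.ndarray:
    """Build a pseudo-speaker embedding by averaging a random half of the k
    pool speakers least similar to the source, then length-normalizing."""
    rng = np.random.default_rng(seed)
    cos = pool @ original / (
        np.linalg.norm(pool, axis=1) * np.linalg.norm(original) + 1e-12)
    farthest = np.argsort(cos)[:k]                  # least-similar speakers
    chosen = rng.choice(farthest, size=k // 2, replace=False)
    pseudo = pool[chosen].mean(axis=0)
    return pseudo / (np.linalg.norm(pseudo) + 1e-12)

# downstream: synthesize speech conditioned on (phone sequence, pseudo embedding)
```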
arXiv Detail & Related papers (2022-07-11T13:02:08Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The A2A models are then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
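A minimal stand-in for an A2A inversion model: a ridge regression fit on parallel acoustic-articulatory frames and then applied to a sensor-free target corpus. The actual paper uses neural models and explicit cross-domain adaptation; this only illustrates the inversion idea.

```python
import numpy as np

def fit_a2a(acoustic: np.ndarray, articulatory: np.ndarray,
            lam: float = 1e-3) -> np.ndarray:
    """Fit a linear acoustic-to-articulatory mapping on parallel data
    (frames from a corpus with articulatory sensors, such as TORGO).
    acoustic: (T, d_a), articulatory: (T, d_art)."""
    X = np.hstack([acoustic, np.ones((len(acoustic), 1))])   # bias column
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                           X.T @ articulatory)

def invert(acoustic: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Predict articulatory features for a corpus with no sensor recordings."""
    X = np.hstack([acoustic, np.ones((len(acoustic), 1))])
    return X @ W
```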
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- Practical Attacks on Voice Spoofing Countermeasures [3.388509725285237]
We show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form.
Our results call into question the security of modern voice authentication systems in light of the real threat of attackers bypassing these measures.
arXiv Detail & Related papers (2021-07-30T14:07:49Z)
- Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments [76.98764900754111]
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.
We propose Voicy, a new VC framework particularly tailored for noisy speech.
Our method, inspired by the denoising autoencoder framework, consists of four encoders (speaker, content, phonetic, and acoustic-ASR) and one decoder.
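A PyTorch skeleton that matches only the stated structure (four encoders feeding one decoder); the layer types, sizes, and fusion are guesses, not Voicy's configuration.

```python
import torch
import torch.nn as nn

class VoicyLikeVC(nn.Module):
    """Sketch: four encoders (speaker, content, phonetic, acoustic-ASR)
    whose outputs are fused and decoded, denoising-autoencoder style."""
    def __init__(self, n_mels: int = 80, d: int = 256):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.GRU(n_mels, d, batch_first=True) for _ in range(4)])
        self.decoder = nn.GRU(4 * d, n_mels, batch_first=True)

    def forward(self, noisy_mels: torch.Tensor) -> torch.Tensor:
        # corrupted input in, clean converted features out
        hs = [enc(noisy_mels)[0] for enc in self.encoders]
        fused = torch.cat(hs, dim=-1)        # (B, T, 4*d)
        out, _ = self.decoder(fused)         # (B, T, n_mels)
        return out
```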
arXiv Detail & Related papers (2021-06-16T15:47:06Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
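A hedged PyTorch sketch of one generic adversarial de-identification step, assuming hypothetical `encoder`, `decoder`, and `spk_clf` modules with compatible shapes; the losses and schedule are illustrative, not necessarily the paper's.

```python
import torch.nn.functional as F

def deid_train_step(encoder, decoder, spk_clf, feats, spk_ids,
                    opt_ae, opt_clf, alpha=1.0):
    """Reconstruct speech features while penalizing the autoencoder whenever
    a speaker classifier can still identify the speaker from its bottleneck."""
    z = encoder(feats)

    # 1) train the adversary to recognize speakers from the latent code
    clf_loss = F.cross_entropy(spk_clf(z.detach()), spk_ids)
    opt_clf.zero_grad()
    clf_loss.backward()
    opt_clf.step()

    # 2) train the autoencoder: reconstruct well, fool the adversary
    recon = F.mse_loss(decoder(z), feats)
    adv = F.cross_entropy(spk_clf(z), spk_ids)
    ae_loss = recon - alpha * adv
    opt_ae.zero_grad()
    ae_loss.backward()
    opt_ae.step()
    return recon.item(), clf_loss.item()
```

A rising adversary loss over training is exactly the de-identification signal the paper measures via verification EER.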
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Many-to-Many Voice Transformer Network [55.17770019619078]
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework.
It enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech.
arXiv Detail & Related papers (2020-05-18T04:02:08Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)