Related papers: Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

URL: http://arxiv.org/abs/2401.15704v1
Date: Sun, 28 Jan 2024 16:56:56 GMT
Title: Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
Authors: Peng Huang, Yao Wei, Peng Cheng, Zhongjie Ba, Li Lu, Feng Lin, Yang Wang, Kui Ren
Abstract summary: We propose a novel phoneme-based noise with the idea of informational masking, which can distract both machines and humans. Our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems.
Score: 26.3587130339825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The widespread smart devices raise people's concerns of being eavesdropped on. To enhance voice privacy, recent studies exploit the nonlinearity in microphone to jam audio recorders with inaudible ultrasound. However, existing solutions solely rely on energetic masking. Their simple-form noise leads to several problems, such as high energy requirements and being easily removed by speech enhancement techniques. Besides, most of these solutions do not support authorized recording, which restricts their usage scenarios. In this paper, we design an efficient yet robust system that can jam microphones while preserving authorized recording. Specifically, we propose a novel phoneme-based noise with the idea of informational masking, which can distract both machines and humans and is resistant to denoising techniques. Besides, we optimize the noise transmission strategy for broader coverage and implement a hardware prototype of our system. Experimental results show that our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems, which is much better than existing solutions.

Related papers

TAPS: Throat and Acoustic Paired Speech Dataset for Deep Learning-Based Speech Enhancement [0.0]
Throat microphones provide a solution with their noise-suppressing properties, reducing the noise while recording speech. High-frequency information is attenuated as sound waves pass through skin and tissue, reducing speech clarity. Recent deep learning approaches have shown promise in enhancing throat microphone recordings, but further progress is constrained by the absence of a standardized dataset. We introduce a throat and acoustic paired speech dataset (TAPS), a collection of paired utterances recorded from 60 native Korean speakers using throat and acoustic microphones.
arXiv Detail & Related papers (2025-02-17T06:29:11Z)
VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect [2.417762825674103]
Rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance.
arXiv Detail & Related papers (2025-02-14T17:43:01Z)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits [82.8859060022651]
We introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. Subjective evaluations confirm that speech edited using this novel technique is more challenging to detect than conventional cut-and-paste methods. Despite human difficulty, experimental results demonstrate that self-supervised-based detectors can achieve remarkable performance in detection, localization, and generalization.
arXiv Detail & Related papers (2025-01-07T14:17:47Z)
Safeguarding Voice Privacy: Harnessing Near-Ultrasonic Interference To Protect Against Unauthorized Audio Recording [0.0]
This paper investigates the susceptibility of automatic speech recognition (ASR) algorithms to interference from near-ultrasonic noise. We expose a critical vulnerability in the most common microphones used in modern voice-activated devices, which inadvertently demodulate near-ultrasonic frequencies into the audible spectrum. Our findings highlight the need to develop robust countermeasures to protect voice-activated systems from malicious exploitation of this vulnerability.
arXiv Detail & Related papers (2024-04-07T00:49:19Z)
Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level. AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
arXiv Detail & Related papers (2024-01-30T18:56:22Z)
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms [8.946335367620698]
This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications.
arXiv Detail & Related papers (2023-09-05T17:04:09Z)
SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks [18.968402215723]
A system to detect a user's unvoiced utterance is proposed. The proposed system recognizes the utterance contents without the user voicing them. We also observed that a user can adjust their oral movement to learn and improve the accuracy of their voice recognition.
arXiv Detail & Related papers (2023-03-03T07:46:35Z)
Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech [10.354590276508283]
Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from an audible frequency range of voice commands. We propose a speaker verification system, SUPERVOICE, that uses a two-stream architecture with a feature fusion mechanism to generate distinctive speaker models.
arXiv Detail & Related papers (2022-05-28T18:00:50Z)
Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking [2.9308762189250746]
Voice interfaces are becoming more and more widely used as input for many applications and smart devices. DNNs are easily misled by slight disturbances and make false recognitions, which is extremely dangerous for voice-controlled intelligent applications.
arXiv Detail & Related papers (2022-04-19T16:26:34Z)
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments. We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants. This paper proposes a Speech Enhancement model adapted to the task of WUW detection. It aims at increasing the recognition rate and reducing the false alarms in the presence of these types of noises.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition [60.462770498366524]
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user. We show that such a model can be quantized as an 8-bit integer model and run in real time.
arXiv Detail & Related papers (2020-09-09T14:26:56Z)
TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices [71.68436132514542]
We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge. To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
arXiv Detail & Related papers (2020-08-10T16:34:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.