Real-Time Neural Voice Camouflage
- URL: http://arxiv.org/abs/2112.07076v1
- Date: Tue, 14 Dec 2021 00:27:44 GMT
- Title: Real-Time Neural Voice Camouflage
- Authors: Mia Chiquier, Chengzhi Mao, Carl Vondrick
- Abstract summary: We propose a method to camouflage a person's voice over-the-air from automatic speech recognition systems.
Standard adversarial attacks are not effective in real-time streaming situations.
We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be the most effective in the future.
- Score: 23.171336558901118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition systems have created exciting
possibilities for applications; however, they also enable opportunities for
systematic eavesdropping. We propose a method to camouflage a person's voice over-the-air
from these systems without inconveniencing the conversation between people in
the room. Standard adversarial attacks are not effective in real-time streaming
situations because the characteristics of the signal will have changed by the
time the attack is executed. We introduce predictive attacks, which achieve
real-time performance by forecasting the attack that will be the most effective
in the future. Under real-time constraints, our method jams the established
speech recognition system DeepSpeech 4.17x more than baselines as measured
through word error rate, and 7.27x more as measured through character error
rate. We furthermore demonstrate our approach is practically effective in
realistic environments over physical distances.
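The real-time constraint described in the abstract can be sketched as a streaming loop: the perturbation for a given frame must be crafted from frames observed earlier and is only heard after a processing-and-playback delay, so the attack has to forecast ahead. Below is a minimal NumPy sketch of that loop, with a linear extrapolator standing in for the paper's learned neural predictor; `FRAME`, `DELAY`, and `EPS` are illustrative assumptions, not values from the paper.

```python
import numpy as np

FRAME = 512          # samples per streaming frame (assumed)
DELAY = 2            # compute + playback latency, in frames (assumed)
EPS = 0.01           # perturbation budget (assumed)

def forecast(past: np.ndarray, horizon: int) -> np.ndarray:
    """Toy forecaster: linearly extrapolate `horizon` frames ahead from
    the last two observed frames. The paper trains a neural predictor;
    this stand-in only illustrates the interface."""
    a, b = past[-2], past[-1]
    return b + horizon * (b - a)

def craft_perturbation(predicted: np.ndarray) -> np.ndarray:
    """Stand-in adversarial step: a bounded perturbation shaped by the
    *predicted* frame rather than a frame that has already elapsed."""
    return EPS * np.sign(predicted)

def stream_attack(signal: np.ndarray) -> np.ndarray:
    """Real-time loop: at frame t we may only read frames <= t, and the
    perturbation we emit is heard DELAY frames later."""
    frames = signal.reshape(-1, FRAME)
    out = frames.copy()
    for t in range(2, len(frames) - DELAY):
        predicted = forecast(frames[: t + 1], DELAY)
        out[t + DELAY] += craft_perturbation(predicted)
    return out
```

The key point the sketch captures is causality: a standard adversarial attack optimized against frame t would arrive too late to overlap it, whereas the predictive attack commits to a perturbation for frame t + DELAY using only frames up to t.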
Related papers
- Beyond the Voice: Inertial Sensing of Mouth Motion for High Security Speech Verification [0.34998703934432673]
We present a second authentication factor that combines acoustic evidence with the unique motion patterns of a speaker's lower face.
Our system records a distinct motion signature with strong discriminative power across individuals.
We discuss specific use cases where this second line of defense could provide tangible security benefits to voice authentication systems.
arXiv Detail & Related papers (2025-10-16T22:26:18Z) - MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance [66.74042564585942]
MOSS-Speech is a true speech-to-speech large language model that directly understands and generates speech without relying on text guidance.
Our work establishes a new paradigm for expressive and efficient end-to-end speech interaction.
arXiv Detail & Related papers (2025-10-01T04:32:37Z) - Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z) - Time-Aware Face Anti-Spoofing with Rotation Invariant Local Binary Patterns and Deep Learning [50.79277723970418]
Imitation attacks can lead to erroneous identification and subsequent authentication of attackers.
Similar to face recognition, imitation attacks can also be detected with Machine Learning.
We propose a novel approach that promises high classification accuracy by combining previously unused features with time-aware deep learning strategies.
arXiv Detail & Related papers (2024-08-27T07:26:10Z) - Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer [8.948537516293328]
We propose an attack on Automatic Speech Recognition (ASR) systems based on user-customized style transfer.
Our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks.
arXiv Detail & Related papers (2024-05-15T16:05:24Z) - Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z) - Adversarial Representation Learning for Robust Privacy Preservation in Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z) - Attack on practical speaker verification system using universal adversarial perturbations [20.38185341318529]
This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker.
A two-step algorithm is proposed to optimize the universal adversarial perturbation to be text-independent and to have little effect on recognition of the authentication text.
arXiv Detail & Related papers (2021-05-19T09:43:34Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - VenoMave: Targeted Poisoning Against Speech Recognition [30.448709704880518]
VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
arXiv Detail & Related papers (2020-10-21T00:30:08Z) - Towards Resistant Audio Adversarial Examples [0.0]
We find that due to flaws in the generation process, state-of-the-art adversarial example generation methods cause overfitting.
We devise an approach to mitigate this flaw and find that our method improves generation of adversarial examples with varying offsets.
arXiv Detail & Related papers (2020-10-14T16:04:02Z) - VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition [60.462770498366524]
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user.
We show that such a model can be quantized as an 8-bit integer model and run in real time.
arXiv Detail & Related papers (2020-09-09T14:26:56Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.