Real-Time Neural Voice Camouflage
- URL: http://arxiv.org/abs/2112.07076v1
- Date: Tue, 14 Dec 2021 00:27:44 GMT
- Title: Real-Time Neural Voice Camouflage
- Authors: Mia Chiquier, Chengzhi Mao, Carl Vondrick
- Abstract summary: We propose a method to camouflage a person's voice over-the-air from automatic speech recognition systems.
Standard adversarial attacks are not effective in real-time streaming situations.
We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be the most effective in the future.
- Score: 23.171336558901118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition systems have created exciting
possibilities for applications; however, they also enable opportunities for
systematic eavesdropping. We propose a method to camouflage a person's voice over-the-air
from these systems without inconveniencing the conversation between people in
the room. Standard adversarial attacks are not effective in real-time streaming
situations because the characteristics of the signal will have changed by the
time the attack is executed. We introduce predictive attacks, which achieve
real-time performance by forecasting the attack that will be the most effective
in the future. Under real-time constraints, our method jams the established
speech recognition system DeepSpeech 4.17x more than baselines as measured
through word error rate, and 7.27x more as measured through character error
rate. We furthermore demonstrate our approach is practically effective in
realistic environments over physical distances.
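The real-time constraint described in the abstract can be sketched as a streaming loop: the perturbation for a given frame must be crafted from frames observed earlier and is only heard after a processing-and-playback delay, so the attack has to forecast ahead. Below is a minimal NumPy sketch of that loop, with a linear extrapolator standing in for the paper's learned neural predictor; `FRAME`, `DELAY`, and `EPS` are illustrative assumptions, not values from the paper.

```python
import numpy as np

FRAME = 512          # samples per streaming frame (assumed)
DELAY = 2            # compute + playback latency, in frames (assumed)
EPS = 0.01           # perturbation budget (assumed)

def forecast(past: np.ndarray, horizon: int) -> np.ndarray:
    """Toy forecaster: linearly extrapolate `horizon` frames ahead from
    the last two observed frames. The paper trains a neural predictor;
    this stand-in only illustrates the interface."""
    a, b = past[-2], past[-1]
    return b + horizon * (b - a)

def craft_perturbation(predicted: np.ndarray) -> np.ndarray:
    """Stand-in adversarial step: a bounded perturbation shaped by the
    *predicted* frame rather than a frame that has already elapsed."""
    return EPS * np.sign(predicted)

def stream_attack(signal: np.ndarray) -> np.ndarray:
    """Real-time loop: at frame t we may only read frames <= t, and the
    perturbation we emit is heard DELAY frames later."""
    frames = signal.reshape(-1, FRAME)
    out = frames.copy()
    for t in range(2, len(frames) - DELAY):
        predicted = forecast(frames[: t + 1], DELAY)
        out[t + DELAY] += craft_perturbation(predicted)
    return out
```

The key point the sketch captures is causality: a standard adversarial attack optimized against frame t would arrive too late to overlap it, whereas the predictive attack commits to a perturbation for frame t + DELAY using only frames up to t.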
Related papers
- Beyond the Voice: Inertial Sensing of Mouth Motion for High Security Speech Verification [0.34998703934432673]
We present a second authentication factor that combines acoustic evidence with the unique motion patterns of a speaker's lower face.
Our system records a distinct motion signature with strong discriminative power across individuals.
We discuss specific use cases where this second line of defense could provide tangible security benefits to voice authentication systems.
arXiv Detail & Related papers (2025-10-16T22:26:18Z) - MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance [66.74042564585942]
MOSS-Speech is a true speech-to-speech large language model that directly understands and generates speech without relying on text guidance.
Our work establishes a new paradigm for expressive and efficient end-to-end speech interaction.
arXiv Detail & Related papers (2025-10-01T04:32:37Z) - Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z) - Time-Aware Face Anti-Spoofing with Rotation Invariant Local Binary Patterns and Deep Learning [50.79277723970418]
Imitation attacks can lead to erroneous identification and subsequent authentication of attackers.
Similar to face recognition, imitation attacks can also be detected with Machine Learning.
We propose a novel approach that promises high classification accuracy by combining previously unused features with time-aware deep learning strategies.
arXiv Detail & Related papers (2024-08-27T07:26:10Z) - Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer [8.948537516293328]
We propose an attack on Automatic Speech Recognition (ASR) systems based on user-customized style transfer.
Our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks.
arXiv Detail & Related papers (2024-05-15T16:05:24Z) - Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z) - Adversarial Representation Learning for Robust Privacy Preservation in Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z) - Attack on practical speaker verification system using universal adversarial perturbations [20.38185341318529]
This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker.
A two-step algorithm is proposed to optimize the universal adversarial perturbation to be text-independent and to have little effect on recognition of the authentication text.
arXiv Detail & Related papers (2021-05-19T09:43:34Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - VenoMave: Targeted Poisoning Against Speech Recognition [30.448709704880518]
VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
arXiv Detail & Related papers (2020-10-21T00:30:08Z) - Towards Resistant Audio Adversarial Examples [0.0]
We find that due to flaws in the generation process, state-of-the-art adversarial example generation methods cause overfitting.
We devise an approach to mitigate this flaw and find that our method improves generation of adversarial examples with varying offsets.
arXiv Detail & Related papers (2020-10-14T16:04:02Z) - VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition [60.462770498366524]
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user.
We show that such a model can be quantized as an 8-bit integer model and run in real time.
arXiv Detail & Related papers (2020-09-09T14:26:56Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.