Related papers: VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect

URL: http://arxiv.org/abs/2502.10329v1
Date: Fri, 14 Feb 2025 17:43:01 GMT
Title: VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
Authors: Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu,
Abstract summary: Rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance.
Score: 2.417762825674103
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, causing economic losses and negative public perceptions. To address this challenge, this study focuses on creating active defense mechanisms against AI VC systems. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear, thereby forming systematic fragments to prevent voice cloning. This approach protects the voice without compromising its quality. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance, achieving a 500% increase in generation speed while maintaining interference effectiveness. Unlike audio watermarking techniques, which focus on post-detection, our method offers preemptive defense, reducing implementation costs and enhancing feasibility. Extensive experiments using the Zhvoice and VCTK Corpus datasets show that our AI-cloned speech defense system performs excellently in automatic speaker verification (ASV) tests while preserving the integrity of the protected audio.

Related papers

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks [68.41885995006643]
We conduct the first systematic evaluation of protective perturbations against voice cloning (VC) under realistic threat models. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models. We propose a novel two-stage purification method: (1) purify the perturbed speech; (2) refine it using phoneme guidance to align it with the clean speech distribution.
arXiv Detail & Related papers (2025-07-03T13:30:58Z)
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning [14.907575859145423]
Diffusion Models (DMs) have achieved remarkable success in realistic voice cloning (VC). DMs have been proven incompatible with proactive defenses due to intricate generative mechanisms of diffusion. We introduce VoiceCloak, a multi-dimensional proactive defense framework with the goal of obfuscating speaker identity and degrading quality in potential unauthorized VC.
arXiv Detail & Related papers (2025-05-18T09:58:48Z)
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis [8.590034271906289]
Speech synthesis technology has brought great convenience, while the widespread usage of realistic deepfake audio has triggered hazards. Malicious adversaries may collect victims' speech without authorization and clone a similar voice for illegal exploitation. We propose a framework, SafeSpeech, which protects users' audio before uploading by embedding imperceptible perturbations in the original speech.
arXiv Detail & Related papers (2025-04-14T03:21:23Z)
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks [59.87470192277124]
This paper explores methods of compromising speech translation systems through imperceptible audio manipulations. We present two innovative approaches: (1) the injection of perturbation into source audio, and (2) the generation of adversarial music designed to guide targeted translation. Our experiments reveal that carefully crafted audio perturbations can mislead translation models to produce targeted, harmful outputs, while adversarial music achieves this goal more covertly. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems.
arXiv Detail & Related papers (2025-03-02T16:38:16Z)
Mitigating Unauthorized Speech Synthesis for Voice Protection [7.1578783467799]
Malicious voice exploitation has brought huge hazards to our daily lives. It is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints. We devise Pivotal Objective Perturbation (POP), which applies imperceptible error-minimizing noises to original speech samples.
arXiv Detail & Related papers (2024-10-28T05:16:37Z)
Can DeepFake Speech be Reliably Detected? [17.10792531439146]
This work presents the first systematic study of active malicious attacks against state-of-the-art open-source speech detectors. The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.
arXiv Detail & Related papers (2024-10-09T06:13:48Z)
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system. This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z)
Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege [26.3587130339825]
We propose a novel phoneme-based noise with the idea of informational masking, which can distract both machines and humans. Our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems.
arXiv Detail & Related papers (2024-01-28T16:56:56Z)
A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems? [13.115517847161428]
AI-driven audio attacks have revealed new security vulnerabilities in voice control systems. Our study endeavors to assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks. Our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats.
arXiv Detail & Related papers (2023-12-10T21:51:13Z)
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units [69.06657692891447]
We propose a novel AVO method leveraging the learning objective of self-supervised discrete speech unit prediction. Experimental results show that our proposed method achieves remarkable lip-speech synchronization and high speech quality.
arXiv Detail & Related papers (2023-06-29T15:02:22Z)
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks. We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN) based automatic speech recognition technologies. This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
Practical Attacks on Voice Spoofing Countermeasures [3.388509725285237]
We show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form. Our results call into question the security of modern voice authentication systems in light of the real threat of attackers bypassing these measures.
arXiv Detail & Related papers (2021-07-30T14:07:49Z)
Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments [76.98764900754111]
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. We propose Voicy, a new VC framework particularly tailored for noisy speech. Our method, which is inspired by the de-noising auto-encoders framework, is comprised of four encoders (speaker, content, phonetic and acoustic-ASR) and one decoder.
arXiv Detail & Related papers (2021-06-16T15:47:06Z)
Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. We employ an online speech distortion module that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
arXiv Detail & Related papers (2020-01-25T00:24:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.