VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
- URL: http://arxiv.org/abs/2502.10329v1
- Date: Fri, 14 Feb 2025 17:43:01 GMT
- Title: VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
- Authors: Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu,
- Abstract summary: rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields.
We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear.
In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance.
- Score: 2.417762825674103
- License:
- Abstract: The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, causing economic losses and negative public perceptions. To address this challenge, this study focuses on creating active defense mechanisms against AI VC systems. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear, thereby forming systematic fragments to prevent voice cloning. This approach protects the voice without compromising its quality. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance, achieving a 500\% increase in generation speed while maintaining interference effectiveness. Unlike audio watermarking techniques, which focus on post-detection, our method offers preemptive defense, reducing implementation costs and enhancing feasibility. Extensive experiments using the Zhvoice and VCTK Corpus datasets show that our AI-cloned speech defense system performs excellently in automatic speaker verification (ASV) tests while preserving the integrity of the protected audio.
Related papers
- Mitigating Unauthorized Speech Synthesis for Voice Protection [7.1578783467799]
malicious voice exploitation has brought huge hazards in our daily lives.
It is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints.
We devise Pivotal Objective Perturbation (POP) that applies imperceptible error-minimizing noises on original speech samples.
arXiv Detail & Related papers (2024-10-28T05:16:37Z) - Can DeepFake Speech be Reliably Detected? [17.10792531439146]
This work presents the first systematic study of active malicious attacks against state-of-the-art open-source speech detectors.
The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.
arXiv Detail & Related papers (2024-10-09T06:13:48Z) - VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system.
This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z) - Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege [26.3587130339825]
We propose a novel phoneme-based noise with the idea of informational masking, which can distract both machines and humans.
Our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems.
arXiv Detail & Related papers (2024-01-28T16:56:56Z) - A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems? [13.115517847161428]
AI-driven audio attacks have revealed new security vulnerabilities in voice control systems.
Our study endeavors to assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks.
Our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats.
arXiv Detail & Related papers (2023-12-10T21:51:13Z) - High-Quality Automatic Voice Over with Accurate Alignment: Supervision
through Self-Supervised Discrete Speech Units [69.06657692891447]
We propose a novel AVO method leveraging the learning objective of self-supervised discrete speech unit prediction.
Experimental results show that our proposed method achieves remarkable lip-speech synchronization and high speech quality.
arXiv Detail & Related papers (2023-06-29T15:02:22Z) - Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual
Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) for making attackers difficult to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z) - Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z) - Practical Attacks on Voice Spoofing Countermeasures [3.388509725285237]
We show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form.
Our results call into question the security of modern voice authentication systems in light of the real threat of attackers bypassing these measures.
arXiv Detail & Related papers (2021-07-30T14:07:49Z) - Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant
Environments [76.98764900754111]
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.
We propose Voicy, a new VC framework particularly tailored for noisy speech.
Our method, which is inspired by the de-noising auto-encoders framework, is comprised of four encoders (speaker, content, phonetic and acoustic-ASR) and one decoder.
arXiv Detail & Related papers (2021-06-16T15:47:06Z) - Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments.
We employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances.
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
arXiv Detail & Related papers (2020-01-25T00:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.