V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time
Voice Anonymization
- URL: http://arxiv.org/abs/2210.15140v1
- Date: Thu, 27 Oct 2022 02:58:57 GMT
- Title: V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time
Voice Anonymization
- Authors: Jiangyi Deng (1), Fei Teng (1), Yanjiao Chen (1), Xiaofu Chen (2),
Zhaohui Wang (2), Wenyuan Xu (1) ((1) Zhejiang University, (2) Wuhan
University)
- Abstract summary: We develop a voice anonymization system, named V-Cloak, which attains real-time voice anonymization.
Our designed anonymizer features a one-shot generative model that modulates the features of the original audio at different frequency levels.
Experiment results confirm that V-Cloak outperforms five baselines in terms of anonymity performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice data generated on instant messaging or social media applications
contains unique user voiceprints that may be abused by malicious adversaries
for identity inference or identity theft. Existing voice anonymization
techniques, e.g., signal processing and voice conversion/synthesis, suffer from
degradation of perceptual quality. In this paper, we develop a voice
anonymization system, named V-Cloak, which attains real-time voice
anonymization while preserving the intelligibility, naturalness and timbre of
the audio. Our designed anonymizer features a one-shot generative model that
modulates the features of the original audio at different frequency levels. We
train the anonymizer with a carefully designed loss function. Apart from the
anonymity loss, we further incorporate the intelligibility loss and the
psychoacoustics-based naturalness loss. The anonymizer can realize untargeted
and targeted anonymization to achieve the anonymity goals of unidentifiability
and unlinkability.
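The abstract names the loss terms and the two anonymity goals but not their exact form. As a hedged illustration only, the untargeted goal (push the anonymized speaker embedding away from the original) and the targeted goal (additionally pull it toward a chosen target speaker) might be combined with the intelligibility and naturalness terms roughly as follows; the cosine-similarity formulation and the weights are assumptions, not taken from the paper:

```python
import numpy as np

def cos_sim(a, b):
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def anonymity_loss(emb_anon, emb_orig, emb_target=None):
    """Untargeted: minimize similarity to the original speaker
    (unidentifiability). Targeted: also maximize similarity to a
    target speaker (unlinkability). Sketch only."""
    away = cos_sim(emb_anon, emb_orig).mean()
    if emb_target is None:
        return away
    toward = 1.0 - cos_sim(emb_anon, emb_target).mean()
    return away + toward

def total_loss(l_anon, l_intel, l_nat, w_anon=1.0, w_intel=1.0, w_nat=1.0):
    # Weighted combination of the three terms named in the abstract;
    # the weights here are illustrative placeholders.
    return w_anon * l_anon + w_intel * l_intel + w_nat * l_nat
```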
We have conducted extensive experiments on four datasets, i.e., LibriSpeech
(English), AISHELL (Chinese), CommonVoice (French) and CommonVoice (Italian),
five Automatic Speaker Verification (ASV) systems (including two DNN-based, two
statistical and one commercial ASV), and eleven Automatic Speech Recognition
(ASR) systems (for different languages). Experiment results confirm that
V-Cloak outperforms five baselines in terms of anonymity performance. We also
demonstrate that V-Cloak trained only on the VoxCeleb1 dataset against
ECAPA-TDNN ASV and DeepSpeech2 ASR has transferable anonymity against other
ASVs and cross-language intelligibility for other ASRs. Furthermore, we verify
the robustness of V-Cloak against various de-noising techniques and adaptive
attacks. Hopefully, V-Cloak may provide a cloak for us in a prism world.
Related papers
- Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses [0.08155575318208629]
Speech anonymization needs to obscure a speaker's identity while retaining critical information for subsequent tasks.
Our research underscores the importance of loss functions inspired by the human auditory system.
Our proposed loss functions are model-agnostic, incorporating handcrafted and deep learning-based features to effectively capture quality representations.
arXiv Detail & Related papers (2024-10-20T20:33:44Z)
- Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion [5.483488375189695]
Face-based Voice Conversion (FVC) is a novel task that leverages facial images to generate the target speaker's voice style.
Previous work has two shortcomings: (1) difficulty obtaining facial embeddings that are well aligned with the speaker's voice identity information, and (2) inadequate decoupling of content and speaker identity information from the audio input.
We present a novel FVC method, Identity-Disentanglement Face-based Voice Conversion (ID-FaceVC), which overcomes the above two limitations.
arXiv Detail & Related papers (2024-09-01T11:51:18Z)
- Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.
This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z)
- On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection [13.227360396362707]
There is growing interest in voice anonymization to preserve speaker privacy and identity.
For affective computing and disease monitoring applications, however, the para-linguistic content may be more critical.
We test three anonymization methods and their impact on five different state-of-the-art COVID-19 diagnostic systems.
arXiv Detail & Related papers (2023-04-05T01:09:58Z)
- Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
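The switching idea can be sketched numerically: each branch predicts its own quantized codes, and in the switched task the targets are swapped across branches, so noise-invariant context representations are rewarded. The InfoNCE stand-in below is a simplification (the real wav2vec 2.0 objective uses sampled distractors and a diversity term, omitted here):

```python
import numpy as np

def info_nce(context, targets, temperature=0.1):
    """Toy InfoNCE over time steps: the context vector at step t should
    match the quantized target at step t against all other steps."""
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    q = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = c @ q.T / temperature                        # (T, T)
    m = logits.max(axis=1, keepdims=True)                 # stable log-softmax
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_probs))

def wav2vec_switch_loss(c_orig, c_noisy, q_orig, q_noisy):
    # Standard task: each branch predicts its own quantized codes.
    standard = info_nce(c_orig, q_orig) + info_nce(c_noisy, q_noisy)
    # Switched task: swap the quantized targets across the two branches.
    switched = info_nce(c_orig, q_noisy) + info_nce(c_noisy, q_orig)
    return standard + switched
```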
arXiv Detail & Related papers (2021-10-11T00:08:48Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues by attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Speaker anonymisation using the McAdams coefficient [19.168733328810962]
This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data.
The proposed solution uses the McAdams coefficient to transform the spectral envelope of speech signals.
Results show that random, optimised transformations can outperform competing solutions in terms of anonymisation.
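The summary does not spell out the transform; in the McAdams scheme, the phase angle phi of each complex LPC pole is typically raised to the power alpha (the McAdams coefficient), which shifts formant positions while leaving pole magnitudes, and hence filter stability, unchanged. A minimal sketch, with alpha = 0.8 as an illustrative value:

```python
import numpy as np

def mcadams_poles(poles, alpha=0.8):
    """Warp each complex LPC pole's phase angle phi to phi**alpha.
    Magnitudes are untouched, so a stable filter stays stable; real
    poles (phi near 0 or pi) are left as-is."""
    out = []
    for p in poles:
        phi = np.angle(p)
        if 1e-6 < abs(phi) < np.pi - 1e-6:   # only genuinely complex poles
            phi = np.sign(phi) * (abs(phi) ** alpha)
        out.append(abs(p) * np.exp(1j * phi))
    return np.array(out)
```

Because conjugate pole pairs are warped symmetrically, the transformed coefficients still describe a real-valued filter.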
arXiv Detail & Related papers (2020-11-02T17:07:17Z)
- Design Choices for X-vector Based Speaker Anonymization [48.46018902334472]
We present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge.
Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
arXiv Detail & Related papers (2020-05-18T11:32:14Z)
- Many-to-Many Voice Transformer Network [55.17770019619078]
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework.
It enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech.
arXiv Detail & Related papers (2020-05-18T04:02:08Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.