Proactive Detection of Voice Cloning with Localized Watermarking
- URL: http://arxiv.org/abs/2401.17264v2
- Date: Thu, 6 Jun 2024 17:48:28 GMT
- Title: Proactive Detection of Voice Cloning with Localized Watermarking
- Authors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar
- Abstract summary: We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility, based on automatic and human evaluation metrics.
- Score: 50.13539630769929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss that enables localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking that allows AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in robustness to real-life audio manipulations and in imperceptibility, based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster and making it ideal for large-scale and real-time applications.
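To make the joint generator/detector training and the sample-level localization loss concrete, here is a minimal toy sketch. It is not the authors' implementation: the module names, layer sizes, watermark strength, and the clean/watermarked splice are illustrative assumptions, and the perceptual (auditory-masking) and robustness losses are omitted.

```python
# Toy sketch of localized watermark detection: a generator adds a small additive
# watermark; a detector emits one logit per sample; a per-sample BCE acts as the
# localization loss. Not the official AudioSeal code.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, 7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, 1, 7, padding=3), nn.Tanh(),
        )

    def forward(self, wav):                      # wav: (batch, 1, samples)
        return wav + 0.01 * self.net(wav)        # small additive watermark

class ToyDetector(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, 7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, 1, 7, padding=3),
        )

    def forward(self, wav):                      # per-sample logits: (batch, 1, samples)
        return self.net(wav)

gen, det = ToyGenerator(), ToyDetector()
wav = torch.randn(2, 1, 16000)                   # 1 s of dummy audio at 16 kHz
wm = gen(wav)
# Splice clean and watermarked halves so labels exist at the sample level.
mixed = torch.cat([wav[..., :8000], wm[..., 8000:]], dim=-1)
labels = torch.cat([torch.zeros(2, 1, 8000), torch.ones(2, 1, 8000)], dim=-1)
loc_loss = nn.functional.binary_cross_entropy_with_logits(det(mixed), labels)
loc_loss.backward()  # a real setup would add perceptual and robustness terms
```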
Related papers
- IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding [29.89341878606415]
In this paper, we design a dual-embedding watermarking model for efficient watermark localization.
Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient localization than existing methods.
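The localization step can be pictured, in a much simplified form that is not IDEAW's invertible dual-embedding network, as embedding a known pseudorandom locating code and finding it again by cross-correlation before decoding any payload. Signal sizes and the (exaggerated) embedding strength below are illustrative assumptions.

```python
# Generic watermark-locating illustration: embed a known pseudorandom code at an
# unknown offset, then recover the offset by sliding cross-correlation.
import numpy as np

rng = np.random.default_rng(0)
sync = rng.choice([-1.0, 1.0], size=256)           # known locating code
audio = rng.normal(scale=0.1, size=16000)          # stand-in host signal
offset = 5000
audio[offset:offset + sync.size] += 0.1 * sync     # additive embedding (strong, for the demo)

scores = np.correlate(audio, sync, mode="valid")   # slide the code over the signal
found = int(np.argmax(scores))
print(found)  # close to 5000; payload decoding would start here
```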
arXiv Detail & Related papers (2024-09-29T09:32:54Z)
- Attention-Based Efficient Breath Sound Removal in Studio Audio Recordings [0.0]
We present an innovative, parameter-efficient model for the automatic detection and eradication of non-speech vocal sounds.
Our proposed model addresses the limitations of prior approaches by offering a streamlined process and superior accuracy, achieved through the application of advanced deep learning techniques.
Our model not only conserves precious time for sound engineers but also enhances the quality and consistency of audio production.
arXiv Detail & Related papers (2024-09-08T02:11:33Z)
- Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles [48.208214762257136]
It employs two models: a lightweight on-device model for real-time processing of the audio stream and a verification model on the server side.
To protect privacy, audio features rather than raw audio are sent to the cloud.
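A minimal sketch of that two-stage pattern follows; the feature and scoring functions and the thresholds are illustrative placeholders, not the paper's models.

```python
# Toy two-stage wake-word pipeline: a cheap on-device gate followed by a heavier
# server-side verifier that only ever receives extracted features, never raw audio.
import numpy as np

def frame_features(wav, frame=160):
    frames = wav[: len(wav) // frame * frame].reshape(-1, frame)
    return np.log1p((frames ** 2).mean(axis=1))   # per-frame log energy (placeholder)

def on_device_score(feats):
    return float(feats.mean())                    # stand-in for a tiny keyword model

def server_verify(feats):
    return bool(feats.max() > 1.0)                # stand-in for the verification model

wav = np.random.randn(16000)                      # 1 s of dummy audio
feats = frame_features(wav)
if on_device_score(feats) > 0.2:                  # stage 1: real-time gate on the device
    accepted = server_verify(feats)               # stage 2: only features leave the device
```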
arXiv Detail & Related papers (2023-10-17T16:22:18Z)
- Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical context for passive sonar target recognition.
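One common formulation of a histogram layer, sketched here as an assumption rather than the paper's exact design, uses learnable bin centers and widths with radial-basis membership, averaged over time to produce soft bin counts that summarize the statistics of TDNN features.

```python
# Soft histogram layer: RBF membership to learnable bins, averaged over time.
import torch
import torch.nn as nn

class SoftHistogram(nn.Module):
    def __init__(self, n_bins=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-1.0, 1.0, n_bins))
        self.widths = nn.Parameter(torch.ones(n_bins))

    def forward(self, x):                      # x: (batch, channels, time)
        d = x.unsqueeze(-1) - self.centers     # (batch, channels, time, bins)
        member = torch.exp(-(self.widths ** 2) * d ** 2)
        return member.mean(dim=2)              # soft bin counts per channel

feats = torch.randn(4, 16, 200)                # e.g. TDNN activations over time
hist = SoftHistogram()(feats)                  # (4, 16, 8) statistical summary
```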
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms state-of-the-art methods based on contrastive learning or self-supervised classification.
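A simplified stand-in for that pretraining stage (the function below is an assumption, not the paper's loss) is a supervised contrastive objective in which clips sharing a machine ID are treated as positives for one another.

```python
# Machine-ID supervised contrastive loss: pull embeddings of clips from the same
# machine together and push different machines apart.
import torch
import torch.nn.functional as F

def machine_id_contrastive_loss(emb, machine_ids, temperature=0.1):
    emb = F.normalize(emb, dim=1)                       # (n, d) embeddings
    sim = emb @ emb.t() / temperature                   # pairwise similarities
    eye = torch.eye(emb.size(0), dtype=torch.bool)
    pos = machine_ids.unsqueeze(0) == machine_ids.unsqueeze(1)
    pos = pos & ~eye                                    # same machine ID, not self
    logits = sim.masked_fill(eye, float("-inf"))        # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    has_pos = pos.any(dim=1)                            # anchors with >= 1 positive
    per_anchor = log_prob.masked_fill(~pos, 0.0)[has_pos].sum(1) / pos[has_pos].sum(1)
    return -per_anchor.mean()

emb = torch.randn(8, 128)                               # clip embeddings
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])            # machine IDs
loss = machine_id_contrastive_loss(emb, ids)
```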
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Evaluating Automatic Speech Recognition in an Incremental Setting [0.7734726150561086]
We systematically evaluate six speech recognizers using metrics including word error rate, latency, and the number of updates to already recognized words on English test data.
We find that, generally, local recognizers are faster and require fewer updates than cloud-based recognizers.
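One way to make the "number of updates" metric concrete (an illustrative definition, not necessarily the exact one used in the paper) is to count how many already-displayed words change or disappear between successive partial hypotheses.

```python
# Count revisions to already-shown words across incremental ASR hypotheses.
def count_word_updates(partials):
    updates, prev = 0, []
    for hyp in partials:
        words = hyp.split()
        overlap = min(len(prev), len(words))
        updates += sum(prev[i] != words[i] for i in range(overlap))  # changed words
        updates += max(0, len(prev) - len(words))                    # dropped words
        prev = words
    return updates

partials = ["the", "the cat", "the cap sat", "the cat sat on"]
print(count_word_updates(partials))  # 2: "cat" -> "cap" -> "cat"
```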
arXiv Detail & Related papers (2023-02-23T14:22:40Z)
- Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection [15.884911752869437]
We present a novel approach for synthetic speech detection, exploiting the combination of two high-level semantic properties of the human voice.
On one side, we focus on speaker identity cues and represent them as speaker embeddings extracted using a state-of-the-art method for the automatic speaker verification task.
On the other side, voice prosody, understood as variations in rhythm, pitch, or accent in speech, is extracted through a specialized encoder.
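The fusion step can be sketched as follows, with placeholder encoders and dimensions; the actual system uses embeddings from a pretrained speaker-verification model and a dedicated prosody encoder.

```python
# Fuse a speaker embedding with a prosody embedding and classify real vs. synthetic.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, spk_dim=192, pros_dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(spk_dim + pros_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, spk_emb, pros_emb):
        return self.head(torch.cat([spk_emb, pros_emb], dim=-1))  # logit: synthetic?

clf = FusionClassifier()
spk_emb = torch.randn(4, 192)   # e.g. from an automatic speaker verification model
pros_emb = torch.randn(4, 64)   # e.g. from an encoder over pitch/energy contours
logits = clf(spk_emb, pros_emb)
```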
arXiv Detail & Related papers (2022-10-31T11:03:03Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
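A sketch of that front end follows, using the torchaudio wav2vec 2.0 bundle as a stand-in for the exact pretrained checkpoint and replacing the light-DARTS searched classifier with a plain linear head purely for illustration.

```python
# Extract wav2vec 2.0 features and attach a simple real/fake classification head.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE       # pretrained wav2vec 2.0 (stand-in)
w2v = bundle.get_model().eval()

wav = torch.randn(1, bundle.sample_rate)          # 1 s of dummy 16 kHz audio
with torch.no_grad():
    layers, _ = w2v.extract_features(wav)         # per-layer frame-level features
feats = layers[-1].mean(dim=1)                    # (1, 768) utterance embedding

head = torch.nn.Linear(feats.size(-1), 2)         # the paper searches this part with light-DARTS
logits = head(feats)
```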
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from the multi-speaker mixture.
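To illustrate how direction information can steer extraction, here is a classical frequency-domain delay-and-sum beamformer for a two-microphone array aimed at a known target direction; the paper instead learns a temporal-spatial neural filter end to end, and the geometry and signals below are assumptions.

```python
# Delay-and-sum beamforming toward a known direction (classical baseline, not the
# learned temporal-spatial neural filter from the paper).
import numpy as np

fs, c, spacing = 16000, 343.0, 0.05                 # sample rate, speed of sound (m/s), mic spacing (m)
angle = np.deg2rad(30.0)                            # assumed target direction
mics = np.random.randn(2, fs)                       # stand-in 2-channel mixture, 1 s

delays = np.array([0.0, spacing * np.sin(angle) / c])          # per-mic arrival delays (s)
spec = np.fft.rfft(mics, axis=1)
freqs = np.fft.rfftfreq(mics.shape[1], d=1.0 / fs)
aligned = spec * np.exp(2j * np.pi * freqs * delays[:, None])  # compensate the delays
beamformed = np.fft.irfft(aligned.mean(axis=0), n=mics.shape[1])
```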
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.