XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
- URL: http://arxiv.org/abs/2502.04230v2
- Date: Fri, 07 Feb 2025 20:11:12 GMT
- Title: XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
- Authors: Yixin Liu, Lie Lu, Jihui Jin, Lichao Sun, Andrea Fanelli,
- Abstract summary: Cross-Attention Robust Audio Watermark (XAttnMark)
This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges the gap by leveraging partial parameter sharing between the generator and the detector.
We propose a psychoacoustic-aligned temporal-frequency masking loss that captures fine-grained auditory masking effects, enhancing watermark imperceptibility.
- Score: 15.216472445154064
- License:
- Abstract: The rapid proliferation of generative audio synthesis and editing technologies has raised significant concerns about copyright infringement, data provenance, and the spread of misinformation through deepfake audio. Watermarking offers a proactive solution by embedding imperceptible, identifiable, and traceable marks into audio content. While recent neural network-based watermarking methods like WavMark and AudioSeal have improved robustness and quality, they struggle to achieve both robust detection and accurate attribution simultaneously. This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges this gap by leveraging partial parameter sharing between the generator and the detector, a cross-attention mechanism for efficient message retrieval, and a temporal conditioning module for improved message distribution. Additionally, we propose a psychoacoustic-aligned temporal-frequency masking loss that captures fine-grained auditory masking effects, enhancing watermark imperceptibility. Our approach achieves state-of-the-art performance in both detection and attribution, demonstrating superior robustness against a wide range of audio transformations, including challenging generative editing with strong editing strength. The project webpage is available at https://liuyixin-louis.github.io/xattnmark/.
Related papers
- IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding [29.89341878606415]
In this paper, we design a dual-embedding watermarking model for efficient locating.
Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient locating ability compared to existing methods.
arXiv Detail & Related papers (2024-09-29T09:32:54Z) - Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis [9.48476556434306]
This paper extends the channel augmentation to work with non-differentiable traditional audio codecs and neural audio codecs.
Listening tests demonstrate collaborative watermarking incurs negligible perceptual degradation with high codecs or DAC at 8kbps.
arXiv Detail & Related papers (2024-09-20T10:33:17Z) - AudioMarkBench: Benchmarking Robustness of Audio Watermarking [38.25450275151647]
We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery.
Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions.
arXiv Detail & Related papers (2024-06-11T06:18:29Z) - Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
arXiv Detail & Related papers (2024-01-30T18:56:22Z) - Invisible Watermarking for Audio Generation Diffusion Models [11.901028740065662]
This paper presents the first watermarking technique applied to audio diffusion models trained on mel-spectrograms.
Our model excels not only in benign audio generation, but also incorporates an invisible watermarking trigger mechanism for model verification.
arXiv Detail & Related papers (2023-09-22T20:10:46Z) - WavMark: Watermarking for Audio Generation [70.65175179548208]
This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet.
The watermark is imperceptible to human senses and exhibits strong resilience against various attacks.
It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection.
arXiv Detail & Related papers (2023-08-24T13:17:35Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio
Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z) - Tree-Ring Watermarks: Fingerprints for Diffusion Images that are
Invisible and Robust [55.91987293510401]
Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content.
We introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs.
Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
arXiv Detail & Related papers (2023-05-31T17:00:31Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.