Multi-bit Audio Watermarking
- URL: http://arxiv.org/abs/2510.01968v1
- Date: Thu, 02 Oct 2025 12:41:01 GMT
- Title: Multi-bit Audio Watermarking
- Authors: Luca A. Lanzendörfer, Kyle Fearne, Florian Grötschla, Roger Wattenhofer
- Abstract summary: We present Timbru, a post-hoc audio watermarking model that achieves state-of-the-art robustness and imperceptibility trade-offs without training an embedder-detector model. Our approach attains the best average bit error rates while preserving perceptual quality, demonstrating an efficient, dataset-free path to imperceptible audio watermarking.
- Score: 38.40457780873775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Timbru, a post-hoc audio watermarking model that achieves state-of-the-art robustness and imperceptibility trade-offs without training an embedder-detector model. Given any 44.1 kHz stereo music snippet, our method performs per-audio gradient optimization to add imperceptible perturbations in the latent space of a pretrained audio VAE, guided by a combined message and perceptual loss. The watermark can then be extracted using a pretrained CLAP model. We evaluate 16-bit watermarking on MUSDB18-HQ against AudioSeal, WavMark, and SilentCipher across common filtering, noise, compression, resampling, cropping, and regeneration attacks. Our approach attains the best average bit error rates, while preserving perceptual quality, demonstrating an efficient, dataset-free path to imperceptible audio watermarking.
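The per-audio gradient optimization described in the abstract can be sketched with a deliberately simplified stand-in: here the pretrained VAE is replaced by an identity "latent" and the CLAP-based extractor by a fixed random linear projection followed by a sign decision (both assumptions for illustration only, not the paper's actual models). The loop optimizes an additive perturbation under a combined message loss and an L2 stand-in for the perceptual loss.

```python
import numpy as np

# Toy stand-in for Timbru-style per-audio optimization: x is the "audio",
# W is a fixed random linear "extractor", and we optimize delta so that
# sign(W @ (x + delta)) equals the 16-bit message while delta stays small.
rng = np.random.default_rng(0)
n_samples, n_bits = 1024, 16
x = rng.standard_normal(n_samples)                     # the "audio"
W = rng.standard_normal((n_bits, n_samples)) / np.sqrt(n_samples)
bits = rng.integers(0, 2, n_bits)
target = 2.0 * bits - 1.0                              # map {0,1} -> {-1,+1}

delta = np.zeros(n_samples)
lr, lam = 0.5, 1e-3
for _ in range(300):
    logits = W @ (x + delta)
    # hinge message loss: push each logit past margin 1 with the correct
    # sign; its subgradient is -t_i * W[i] for every violated bit i
    violated = (target * logits) < 1.0
    grad_msg = -(W * (target * violated)[:, None]).sum(axis=0)
    grad_perc = 2.0 * lam * delta                      # L2 "perceptual" penalty
    delta -= lr * (grad_msg + grad_perc)

decoded = (W @ (x + delta)) > 0
ber = float(np.mean(decoded != bits.astype(bool)))
print("bit error rate:", ber)
```

With a linear extractor the message loss is convex, so a few hundred gradient steps drive the bit error rate to zero while the perturbation stays small relative to the signal; the paper's setting differs in that the gradients flow through a pretrained VAE decoder and CLAP extractor instead.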
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal [12.931496380963802]
A key defense against the misuse of AI-generated audio is to watermark it so that it can be easily distinguished from genuine audio. Previous watermark removal schemes either assume impractical knowledge of the watermarks they are designed to remove or are computationally expensive. We introduce HarmonicAttack, an efficient audio watermark removal method that only requires the basic ability to generate the watermarks.
arXiv Detail & Related papers (2025-11-26T16:51:20Z) - AWARE: Audio Watermarking with Adversarial Resistance to Edits [0.0]
AWARE (Audio Watermarking with Adversarial Resistance to Edits) is an approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained via adversarial optimization in the time-frequency domain under a level-proportional budget. AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and consistently low BER across various audio edits.
arXiv Detail & Related papers (2025-10-20T13:10:52Z) - A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs? [21.111812193733982]
RAW-Bench is a benchmark for evaluating deep learning-based audio watermarking methods. We introduce a comprehensive audio attack pipeline with various distortions such as compression, background noise, and reverberation. We find that specific distortions, such as polarity inversion, time stretching, or reverb, seriously affect certain methods.
arXiv Detail & Related papers (2025-05-26T08:21:58Z) - XAttnMark: Learning Robust Audio Watermarking with Cross-Attention [15.216472445154064]
This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges the gap by leveraging partial parameter sharing between the generator and the detector. We propose a psychoacoustic-aligned temporal-frequency masking loss that captures fine-grained auditory masking effects, enhancing watermark imperceptibility.
arXiv Detail & Related papers (2025-02-06T17:15:08Z) - AudioMarkBench: Benchmarking Robustness of Audio Watermarking [38.25450275151647]
We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery.
Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions.
arXiv Detail & Related papers (2024-06-11T06:18:29Z) - Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
arXiv Detail & Related papers (2024-01-30T18:56:22Z) - WavMark: Watermarking for Audio Generation [70.65175179548208]
This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet.
The watermark is imperceptible to human senses and exhibits strong resilience against various attacks.
It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection.
arXiv Detail & Related papers (2023-08-24T13:17:35Z) - AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed-up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
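Several of the entries above (AudioMarkBench, RAW-Bench, WavMark, AudioSeal) report robustness as the bit error rate (BER) after an attack. The following minimal sketch illustrates the metric using classic spread-spectrum embedding, not any listed paper's actual method: each bit modulates a pseudo-random ±1 carrier added to the host signal, detection correlates the received signal with each carrier, and an additive-noise "attack" is applied before measuring BER.

```python
import numpy as np

# Classic spread-spectrum watermarking toy: embed 16 bits into a 4096-sample
# "signal" via pseudo-random carriers, then measure BER before and after an
# additive-noise attack. alpha controls the embedding strength (and hence
# the robustness/imperceptibility trade-off the papers above study).
rng = np.random.default_rng(1)
n, n_bits, alpha = 4096, 16, 0.1
host = rng.standard_normal(n)                        # stand-in "audio"
carriers = rng.choice([-1.0, 1.0], size=(n_bits, n))
bits = rng.integers(0, 2, n_bits)
symbols = 2.0 * bits - 1.0                           # map {0,1} -> {-1,+1}

watermarked = host + alpha * (symbols @ carriers)

def detect(signal):
    # correlate with each carrier; the sign of the correlation recovers the bit
    return (carriers @ signal / n) > 0

def bit_error_rate(sent, recovered):
    return float(np.mean(np.asarray(sent, bool) != np.asarray(recovered, bool)))

clean_ber = bit_error_rate(bits, detect(watermarked))
attacked = watermarked + 0.5 * rng.standard_normal(n)   # additive-noise attack
noisy_ber = bit_error_rate(bits, detect(attacked))
print("clean BER:", clean_ber, "noisy BER:", noisy_ber)
```

Because the carriers are nearly orthogonal to each other and to the host, the per-bit correlation margin is roughly alpha, so moderate noise leaves the BER at zero; the benchmarks above apply the same measurement under far harder attacks such as compression, resampling, and neural-codec regeneration.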
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.