SyncGuard: Robust Audio Watermarking Capable of Countering Desynchronization Attacks
- URL: http://arxiv.org/abs/2508.17121v2
- Date: Mon, 01 Sep 2025 05:12:55 GMT
- Title: SyncGuard: Robust Audio Watermarking Capable of Countering Desynchronization Attacks
- Authors: Zhenliang Gan, Xiaoxiao Hu, Sheng Li, Zhenxing Qian, Xinpeng Zhang
- Abstract summary: We propose a learning-based scheme named SyncGuard to address these challenges. Specifically, we design a frame-wise broadcast embedding strategy to embed the watermark in arbitrary-length audio. To further enhance robustness, we introduce a meticulously designed distortion layer.
- Score: 41.25345809241139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio watermarking has been widely applied in copyright protection and source tracing. However, due to the inherent characteristics of audio signals, watermark localization and resistance to desynchronization attacks remain significant challenges. In this paper, we propose a learning-based scheme named SyncGuard to address these challenges. Specifically, we design a frame-wise broadcast embedding strategy to embed the watermark in arbitrary-length audio, enhancing time-independence and eliminating the need for localization during watermark extraction. To further enhance robustness, we introduce a meticulously designed distortion layer. Additionally, we employ dilated residual blocks in conjunction with dilated gated blocks to effectively capture multi-resolution time-frequency features. Extensive experimental results show that SyncGuard efficiently handles variable-length audio segments, outperforms state-of-the-art methods in robustness against various attacks, and delivers superior auditory quality.
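The frame-wise broadcast idea mentioned in the abstract can be illustrated with a toy sketch. This is not the SyncGuard network: the abstract does not give the frame size or say how the message is fused with each frame, so the names `FRAME_LEN`, `frame_audio`, `broadcast_message`, and `pair_frames_with_message` are hypothetical and chosen only for illustration. The sketch shows why pairing every frame with the full message makes the scheme independent of audio length and of where a clip is cut.

```python
# Illustrative sketch only, NOT the SyncGuard implementation.
# All names and the frame size below are assumptions made for this example.
from typing import List, Tuple

FRAME_LEN = 4  # assumed frame size in samples, kept tiny for readability


def frame_audio(samples: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Split audio of any length into fixed-size frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        frame = frame + [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames


def broadcast_message(bits: List[int], num_frames: int) -> List[List[int]]:
    """Repeat the same full message once per frame (the 'broadcast' step)."""
    return [list(bits) for _ in range(num_frames)]


def pair_frames_with_message(
    samples: List[float], bits: List[int], frame_len: int = FRAME_LEN
) -> List[Tuple[List[float], List[int]]]:
    """Return (frame, message) pairs that a learned embedder could consume."""
    frames = frame_audio(samples, frame_len)
    messages = broadcast_message(bits, len(frames))
    return list(zip(frames, messages))


audio = [0.1 * i for i in range(10)]  # 10 samples yield 3 frames of 4 samples
message = [1, 0, 1]
pairs = pair_frames_with_message(audio, message)
```

Because each frame is paired with the whole message, any surviving segment that contains at least one frame still carries every bit, which is the intuition behind extraction without a separate localization step.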
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression.
Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space.
Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - AWARE: Audio Watermarking with Adversarial Resistance to Edits [0.0]
AWARE (Audio Watermarking with Adversarial Resistance to Edits) is an approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions.
Embedding is obtained via adversarial optimization in the time-frequency domain under a level-proportional budget.
AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and consistently low BER across various audio edits.
arXiv Detail & Related papers (2025-10-20T13:10:52Z) - An Ensemble Framework for Unbiased Language Model Watermarking [60.99969104552168]
We propose ENS, a novel ensemble framework that enhances the detectability and robustness of unbiased watermarks.
ENS sequentially composes multiple independent watermark instances, each governed by a distinct key, to amplify the watermark signal.
Empirical evaluations show that ENS substantially reduces the number of tokens needed for reliable detection and increases resistance to smoothing and paraphrasing attacks.
arXiv Detail & Related papers (2025-09-28T19:37:44Z) - OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization [66.69924980864053]
We propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process.
OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations.
Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.
arXiv Detail & Related papers (2025-08-29T15:50:59Z) - Video Signature: In-generation Watermarking for Latent Video Diffusion Models [18.347424463264606]
Video Signature (VID SIG) is an in-generation watermarking method for latent video diffusion models.
We achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers.
Experimental results show that VID SIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency.
arXiv Detail & Related papers (2025-05-31T17:43:54Z) - Protecting Your Voice: Temporal-aware Robust Watermarking [10.883912837253794]
Temporal-aware robust watermarking (True).
An integrated content-driven encoder is designed for watermarked waveform reconstruction.
A temporal-aware gated convolutional network is meticulously designed to bit-wise recover the watermark.
arXiv Detail & Related papers (2025-04-21T03:23:10Z) - Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling [81.37449968164692]
We propose Synchronized Coupled Sampling (SynCoS), a novel inference framework that synchronizes denoising paths across the entire video.
Our approach combines two complementary sampling strategies, which ensure seamless local transitions and enforce global coherence.
Extensive experiments show that SynCoS significantly improves multi-event long video generation, achieving smoother transitions and superior long-range coherence.
arXiv Detail & Related papers (2025-03-11T16:43:45Z) - XAttnMark: Learning Robust Audio Watermarking with Cross-Attention [15.216472445154064]
This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges the gap by leveraging partial parameter sharing between the generator and the detector.
We propose a psychoacoustic-aligned temporal-frequency masking loss that captures fine-grained auditory masking effects, enhancing watermark imperceptibility.
arXiv Detail & Related papers (2025-02-06T17:15:08Z) - Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
arXiv Detail & Related papers (2024-01-30T18:56:22Z) - A DTCWT-SVD Based Video Watermarking resistant to frame rate conversion [27.591506014201546]
We present a new video watermarking based on joint Dual-Tree Complex Wavelet Transformation (DTCWT) and Singular Value Decomposition (SVD).
We perform group-level watermarking that includes moderate temporal redundancy to resist temporal desynchronization attacks.
arXiv Detail & Related papers (2022-06-02T15:20:52Z) - Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation [73.1652905564163]
We address the problem of separating individual speech signals from videos using audio-visual neural processing.
Most conventional approaches utilize frame-wise matching criteria to extract shared information between co-occurring audio and video.
We propose a cross-modal affinity network (CaffNet) that learns global correspondence as well as locally-varying affinities between audio and visual streams.
arXiv Detail & Related papers (2021-03-25T15:39:12Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - Adversarially Robust Streaming Algorithms via Differential Privacy [68.3608356069755]
A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously.
We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy.
arXiv Detail & Related papers (2020-04-13T14:49:38Z)