VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection
- URL: http://arxiv.org/abs/2601.22556v1
- Date: Fri, 30 Jan 2026 04:51:50 GMT
- Title: VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection
- Authors: Weizhi Liu, Yue Li, Zhaoxia Yin
- Abstract summary: VocBulwark is a framework that freezes generative model parameters to preserve perceptual quality. VocBulwark achieves high-capacity and high-fidelity watermarking, offering robust defense against complex practical scenarios.
- Score: 10.244226665349483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generated speech achieves human-level naturalness but escalates security risks of misuse. However, existing watermarking methods fail to reconcile fidelity with robustness, as they rely either on simple superposition in the noise space or on intrusive alterations to model weights. To bridge this gap, we propose VocBulwark, an additional-parameter injection framework that freezes generative model parameters to preserve perceptual quality. Specifically, we design a Temporal Adapter to deeply entangle watermarks with acoustic attributes, synergizing with a Coarse-to-Fine Gated Extractor to resist advanced attacks. Furthermore, we develop an Accuracy-Guided Optimization Curriculum that dynamically orchestrates gradient flow to resolve the optimization conflict between fidelity and robustness. Comprehensive experiments demonstrate that VocBulwark achieves high-capacity and high-fidelity watermarking, offering robust defense against complex practical scenarios, with resilience to Codec regenerations and variable-length manipulations.
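The abstract describes the core idea of additional-parameter injection: the pretrained generator's weights stay frozen, and only a small set of newly injected adapter parameters entangles the watermark with the acoustic features. A minimal toy sketch of that idea is below; the layer shapes, the residual formulation, and the bit-broadcasting scheme are all illustrative assumptions, not the paper's actual Temporal Adapter architecture.

```python
import numpy as np

# Toy sketch of additional-parameter injection (hypothetical shapes and ops;
# the paper's actual Temporal Adapter is not specified in the abstract).
# The frozen "generator" weights are never modified; a small low-rank
# adapter adds a watermark-conditioned residual on top of the frozen path.

rng = np.random.default_rng(0)

frozen_weight = rng.standard_normal((16, 16))        # pretrained weights (frozen)
adapter_down = rng.standard_normal((16, 4)) * 0.01   # injected params (trainable)
adapter_up = rng.standard_normal((4, 16)) * 0.01     # injected params (trainable)

def generate(features, watermark_bits):
    """Frozen forward pass plus a watermark-conditioned adapter residual."""
    hidden = features @ frozen_weight             # frozen path, untouched
    wm = np.repeat(watermark_bits, 4)[:16]        # broadcast bits to feature dim
    residual = (hidden * wm) @ adapter_down @ adapter_up
    return hidden + residual                      # watermark entangled as residual

features = rng.standard_normal((8, 16))
bits = np.array([1.0, 0.0, 1.0, 1.0])
out = generate(features, bits)
print(out.shape)  # (8, 16)
```

Because the adapter is a separate residual branch, setting its parameters to zero recovers the frozen generator exactly, which is why this style of injection can preserve perceptual quality of the base model.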
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z) - TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution [3.1682080884953736]
We propose a generative speech watermarking method (TriniMark) for authenticating the generated content. We first design a structure-lightweight watermark encoder that embeds watermarks into the time-domain features of speech. A temporal-aware gated convolutional network is meticulously designed in the watermark decoder for bit-wise watermark recovery.
arXiv Detail & Related papers (2025-04-29T08:23:28Z) - Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models. We introduce Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness. Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z) - SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation [3.1682080884953736]
We propose a generative watermarking method that integrates parameter-efficient fine-tuning with speech watermarking. The proposed method ensures high-fidelity watermarked speech even at a large capacity of 2000 bps. It surpasses other state-of-the-art methods by nearly 23% in resisting time-stretching attacks.
arXiv Detail & Related papers (2025-04-21T11:43:36Z) - Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models [66.54457339638004]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. We propose a diffusion model watermarking method tailored for real-world deployment. Gaussian Shading++ not only maintains performance losslessness but also outperforms existing methods in terms of robustness.
arXiv Detail & Related papers (2025-04-21T11:18:16Z) - GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis [37.065509936285466]
This paper proposes the generative robust audio watermarking method (Groot).
In this paradigm, the processes of watermark generation and audio synthesis occur simultaneously.
Groot exhibits exceptional robustness when facing compound attacks, maintaining an average watermark extraction accuracy of around 95%.
arXiv Detail & Related papers (2024-07-15T06:57:19Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.