END$^2$: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions
- URL: http://arxiv.org/abs/2412.09960v1
- Date: Fri, 13 Dec 2024 08:37:30 GMT
- Title: END$^2$: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions
- Authors: Nan Sun, Han Fang, Yuxing Lu, Chengxin Zhao, Hefei Ling
- Abstract summary: Real-world distortions are often non-differentiable, leading to challenges in end-to-end training.
We propose a novel dual-decoder architecture (END$^2$) to better incorporate non-differentiable distortions into training.
Our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions.
- Score: 15.774214187916423
- Abstract: DNN-based watermarking methods have rapidly advanced, with the "Encoder-Noise Layer-Decoder" (END) framework being the most widely used. To ensure end-to-end training, the noise layer in the framework must be differentiable. However, real-world distortions are often non-differentiable, leading to challenges in end-to-end training. Existing solutions only treat the distortion perturbation as additive noise, which does not fully integrate the effect of distortion in training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END$^2$). Unlike the conventional END architecture, our method employs two structurally identical decoders: the Teacher Decoder, processing pure watermarked images, and the Student Decoder, handling distortion-perturbed images. The gradient is backpropagated only through the Teacher Decoder branch to optimize the encoder, thus bypassing the problem of non-differentiability. To ensure resistance to arbitrary distortions, we enforce alignment of the two decoders' feature representations by maximizing the cosine similarity between their intermediate vectors on a hypersphere. Extensive experiments demonstrate that our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions. Moreover, even without the differentiability constraint, our method surpasses baselines with a differentiable noise layer. Our approach is effective and easily implementable across all END architectures, enhancing practicality and generalizability.
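The mechanics of the dual-decoder step are easy to sketch. Below is a minimal PyTorch-style training step under stated assumptions: `encoder`, `teacher_dec`, and `student_dec` are hypothetical modules (each decoder returning message logits plus an intermediate feature vector), `distort` is an arbitrary black-box distortion, and the loss weights are illustrative rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def end2_train_step(encoder, teacher_dec, student_dec, distort,
                    image, message, optimizer,
                    w_img=1.0, w_msg=1.0, w_align=0.1):
    """One END^2 update; `distort` may be non-differentiable (e.g. real JPEG)."""
    watermarked = encoder(image, message)       # differentiable branch
    with torch.no_grad():                       # distortion applied outside autograd
        distorted = distort(watermarked)

    msg_t, feat_t = teacher_dec(watermarked)    # Teacher: pure watermarked image
    msg_s, feat_s = student_dec(distorted)      # Student: distortion-perturbed image

    # Decode the embedded message from both branches.
    loss_msg = (F.binary_cross_entropy_with_logits(msg_t, message)
                + F.binary_cross_entropy_with_logits(msg_s, message))
    # Align the decoders' intermediate features on the unit hypersphere:
    # maximizing cosine similarity == minimizing (1 - cos).
    loss_align = (1.0 - F.cosine_similarity(feat_s, feat_t, dim=-1)).mean()
    # Keep the watermark imperceptible.
    loss_img = F.mse_loss(watermarked, image)

    loss = w_img * loss_img + w_msg * loss_msg + w_align * loss_align
    optimizer.zero_grad()
    loss.backward()     # encoder gradients arrive only via the teacher branch
    optimizer.step()
    return loss.item()
```

Because `distorted` is produced under `torch.no_grad()`, the student branch trains its own decoder but sends no gradient to the encoder; that is the non-differentiability bypass described above.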
Related papers
- How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? [99.87554379608224]
The cross-modal similarity score distribution of the cross-encoder is more concentrated, while that of the dual-encoder is nearly normal.
Only the relative order between hard negatives conveys valid knowledge, while the order information between easy negatives has little significance.
We propose a novel Contrastive Partial Ranking Distillation (CPRD) method that implements the objective of mimicking the relative order between hard negative samples with contrastive learning.
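Read narrowly, that objective can be approximated with a pairwise hinge over teacher-ordered hard negatives. The sketch below is one illustrative reading, not the paper's exact loss; all tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def partial_ranking_loss(student_scores, teacher_scores, margin=0.05):
    """student_scores, teacher_scores: (B, K) scores for K hard negatives."""
    order = teacher_scores.argsort(dim=1, descending=True)
    ranked = student_scores.gather(1, order)   # student scores in teacher order
    # Each higher-ranked negative should outscore the next by at least `margin`;
    # easy-negative order is deliberately ignored (only hard negatives enter).
    gaps = ranked[:, :-1] - ranked[:, 1:]
    return F.relu(margin - gaps).mean()
```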
arXiv Detail & Related papers (2024-07-10T09:10:01Z)
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
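One concrete way to realize such a projection (a hedged sketch; the paper's approximation is more involved) is to take the right singular vectors of the stacked previous-task features whose singular values are near zero, and keep only the gradient component inside their span:

```python
import torch

def project_to_null_space(grad, feats, eps=1e-4):
    """grad: prompt gradient (d,); feats: previous tasks' features (n, d).
    Names and the rank tolerance are illustrative assumptions."""
    _, s, vh = torch.linalg.svd(feats, full_matrices=True)
    rank = int((s > eps * s.max()).sum())
    null_basis = vh[rank:]                     # rows span the feature null space
    # Keep only the component orthogonal to all previous-task features.
    return null_basis.T @ (null_basis @ grad)
```

Updating the prompts with the projected gradient leaves responses on previous tasks' features (approximately) unchanged, which is the anti-forgetting argument.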
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
- Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes.
We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
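The general idea of differentiable masking can be illustrated with a soft attention mask (an assumption-laden sketch, not the paper's exact Transformer): a learnable mask logit is squashed to (0, 1) and added in log space, so gradients reach the mask parameters.

```python
import torch
import torch.nn.functional as F

def soft_masked_attention(q, k, v, mask_logits):
    """mask_logits: learnable, broadcastable to the attention score matrix."""
    scores = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
    soft_mask = torch.sigmoid(mask_logits)         # differentiable gate in (0, 1)
    scores = scores + torch.log(soft_mask + 1e-9)  # ~ -inf where the gate closes
    return F.softmax(scores, dim=-1) @ v
```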
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
- An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement [9.28943772676672]
The code-switching phenomenon remains a major obstacle for automatic speech recognition.
We introduce a novel disentanglement loss to enable the lower layers of the encoder to capture inter-lingual acoustic information.
We verify that our proposed method outperforms the prior-art methods using pretrained dual-encoders.
arXiv Detail & Related papers (2024-02-27T04:08:59Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Map, and Post-Quantization Filtering [15.056672221375104]
Deep learning-based image compression has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC.
Many leading learned schemes cannot maintain a good trade-off between performance and complexity.
We propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art.
arXiv Detail & Related papers (2022-06-21T09:34:29Z)
- Optimally Controllable Perceptual Lossy Compression [16.208548355509127]
Recent studies in lossy compression show that distortion and perceptual quality are at odds with each other.
To attain different perceptual quality, different decoders have to be trained.
We present a nontrivial finding that only two decoders are sufficient for optimally achieving arbitrary distortion-perception (D-P) tradeoffs.
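The construction is simple to state: with a distortion-optimal decoder and a perception-optimal decoder both fixed, intermediate operating points come from combining their outputs. A minimal sketch, assuming hypothetical decoder callables (the paper's exact combination rule may differ):

```python
def dp_reconstruction(decoder_mse, decoder_perceptual, latent, alpha):
    """alpha in [0, 1] selects the distortion-perception operating point."""
    x_d = decoder_mse(latent)         # low-distortion reconstruction
    x_p = decoder_perceptual(latent)  # high-perceptual-quality reconstruction
    return alpha * x_d + (1.0 - alpha) * x_p
```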
arXiv Detail & Related papers (2022-06-21T02:48:35Z)
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
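A rough sketch of that loop follows; every module name and signature here is an assumption (in particular, `cross_enc(one_image, several_texts)` is assumed to return one score per text).

```python
import torch
import torch.nn.functional as F

def loop_step(img_enc, txt_enc, cross_enc, images, texts, k=8, tau=0.05):
    img_emb = F.normalize(img_enc(images), dim=-1)    # (B, d)
    txt_emb = F.normalize(txt_enc(texts), dim=-1)     # (B, d)
    sims = img_emb @ txt_emb.T / tau                  # cheap dual-encoder scores
    B = sims.shape[0]
    eye = torch.eye(B, dtype=torch.bool, device=sims.device)
    # Dual encoder mines the top-k non-matching texts as hard negatives.
    hard = sims.masked_fill(eye, float('-inf')).topk(k, dim=1).indices
    cand = torch.cat([torch.arange(B, device=sims.device)[:, None], hard], dim=1)
    # Expensive cross-encoder rescoring, restricted to positives + hard negatives.
    cross = torch.stack([cross_enc(images[i], texts[cand[i]]) for i in range(B)])
    dual = sims.gather(1, cand)
    # Distill the cross encoder's sharper ranking back into the dual encoder.
    return F.kl_div(F.log_softmax(dual, dim=1),
                    F.softmax(cross.detach() / tau, dim=1),
                    reduction='batchmean')
```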
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
- Universal Rate-Distortion-Perception Representations for Lossy Compression [31.28856752892628]
We consider the notion of universal representations in which one may fix an encoder and vary the decoder to achieve any point within a collection of distortion and perception constraints.
We prove that the corresponding information-theoretic universal rate-distortion-perception function is operationally achievable in an approximate sense.
arXiv Detail & Related papers (2021-06-18T18:52:08Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
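The point-wise combination itself is compact to write down; here is a hedged sketch with assumed shapes (the per-point weights would come from the network, e.g. via a softmax):

```python
import torch

def blend_rigid_transforms(points, rotations, translations, weights):
    """points: (N, 3); rotations: (K, 3, 3); translations: (K, 3);
    weights: (N, K), each row summing to 1."""
    # Apply every rigid transform to every point -> (K, N, 3).
    moved = torch.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # Per-point convex combination over the K candidate motions.
    return torch.einsum('nk,kni->ni', weights, moved)
```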
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Layer-Wise Multi-View Learning for Neural Machine Translation [45.679212203943194]
Traditional neural machine translation is limited to the topmost encoder layer's context representation.
We propose layer-wise multi-view learning to solve this problem.
Our approach yields stable improvements over multiple strong baselines.
arXiv Detail & Related papers (2020-11-03T05:06:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.