Related papers: MT-Mark: Rethinking Image Watermarking via Mutual-Teacher Collaboration with Adaptive Feature Modulation

MT-Mark: Rethinking Image Watermarking via Mutual-Teacher Collaboration with Adaptive Feature Modulation

URL: http://arxiv.org/abs/2512.19438v1
Date: Mon, 22 Dec 2025 14:36:08 GMT
Title: MT-Mark: Rethinking Image Watermarking via Mutual-Teacher Collaboration with Adaptive Feature Modulation
Authors: Fei Ge, Ying Huang, Jie Liu, Guixuan Zhang, Zhi Zeng, Shuwu Zhang, Hu Guan,
Abstract summary: Existing deep image watermarking methods follow a fixed embedding-distortion-extraction pipeline.<n>We introduce a Collaborative Interaction Mechanism (CIM) that establishes direct, bidirectional communication between the embedder and extractor.<n>We also propose an Adaptive Feature Modulation Module (AFMM) to support effective interaction.
Score: 11.139536166110204
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing deep image watermarking methods follow a fixed embedding-distortion-extraction pipeline, where the embedder and extractor are weakly coupled through a final loss and optimized in isolation. This design lacks explicit collaboration, leaving no structured mechanism for the embedder to incorporate decoding-aware cues or for the extractor to guide embedding during training. To address this architectural limitation, we rethink deep image watermarking by reformulating embedding and extraction as explicitly collaborative components. To realize this reformulation, we introduce a Collaborative Interaction Mechanism (CIM) that establishes direct, bidirectional communication between the embedder and extractor, enabling a mutual-teacher training paradigm and coordinated optimization. Built upon this explicitly collaborative architecture, we further propose an Adaptive Feature Modulation Module (AFMM) to support effective interaction. AFMM enables content-aware feature regulation by decoupling modulation structure and strength, guiding watermark embedding toward stable image features while suppressing host interference during extraction. Under CIM, the AFMMs on both sides form a closed-loop collaboration that aligns embedding behavior with extraction objectives. This architecture-level redesign changes how robustness is learned in watermarking systems. Rather than relying on exhaustive distortion simulation, robustness emerges from coordinated representation learning between embedding and extraction. Experiments on real-world and AI-generated datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in watermark extraction accuracy while maintaining high perceptual quality, showing strong robustness and generalization.

Related papers

StructAlign: Structured Cross-Modal Alignment for Continual Text-to-Video Retrieval [75.28673512571449]
A key challenge in Continual Text-to-Video Retrieval is feature drift.<n>We propose StructAlign, a structured cross-modal alignment method for CTVR.<n>Our method consistently outperforms state-of-the-art continual retrieval approaches.
arXiv Detail & Related papers (2026-01-28T13:34:44Z)
CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion [51.060328159429154]
Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities.<n>We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts.<n> Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.
arXiv Detail & Related papers (2026-01-12T13:36:48Z)
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation [104.59740403500132]
Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance.<n>We propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC)<n>Our approach hierarchically bridges representation gaps between complete and incomplete modalities across input, feature and output levels.
arXiv Detail & Related papers (2025-09-19T17:29:25Z)
Masked Feature Modeling Enhances Adaptive Segmentation [9.279607578922683]
Masked Feature Modeling (MFM) is a novel auxiliary task that performs feature masking and reconstruction directly in the feature space.<n>MFM aligns its learning target with the main segmentation task, ensuring compatibility with standard architectures like DeepLab and DAFormer.<n>To facilitate effective reconstruction, we introduce a lightweight auxiliary module, Rebuilder, which is trained jointly but discarded during inference.
arXiv Detail & Related papers (2025-09-17T08:16:05Z)
Hybrid-supervised Hypergraph-enhanced Transformer for Micro-gesture Based Emotion Recognition [30.016692048849226]
Micro-gestures are unconsciously performed body gestures that can convey the emotion states of humans.<n>We propose to recognize the emotion states based on the micro-gestures by reconstructing the behavior patterns with a hypergraph-enhanced Transformer.<n>The proposed method is evaluated on two publicly available datasets, namely iMiGUE and SMG.
arXiv Detail & Related papers (2025-07-20T08:27:56Z)
Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder.<n>Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder.<n> Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z)
WMFormer++: Nested Transformer for Visible Watermark Removal via Implict Joint Learning [68.00975867932331]
Existing watermark removal methods mainly rely on UNet with task-specific decoder branches. We introduce an implicit joint learning paradigm to holistically integrate information from both branches. The results demonstrate our approach's remarkable superiority, surpassing existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-08-20T07:56:34Z)
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes. We show that our scheme substantially enhances the robustness, with gains of 15.3% mIOU, compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z)
A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix.<n>It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module.<n>This work proposes a context aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
Self-supervised Correlation Mining Network for Person Image Generation [9.505343361614928]
Person image generation aims to perform non-rigid deformation on source images. We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space. For improving the fidelity of cross-scale pose transformation, we propose a graph based Body Structure Retaining Loss.
arXiv Detail & Related papers (2021-11-26T03:57:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.