Effective Message Hiding with Order-Preserving Mechanisms
- URL: http://arxiv.org/abs/2402.19160v5
- Date: Fri, 15 Aug 2025 05:32:34 GMT
- Title: Effective Message Hiding with Order-Preserving Mechanisms
- Authors: Gao Yu, Qiu Xuchong, Ye Zihan
- Abstract summary: StegaFormer is a framework designed to preserve bit order and enable global fusion between modalities. StegaFormer surpasses existing state-of-the-art methods in terms of recovery accuracy, message capacity, and imperceptibility.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Message hiding, a technique that conceals secret message bits within a cover image, aims to achieve an optimal balance among message capacity, recovery accuracy, and imperceptibility. While convolutional neural networks have notably improved message capacity and imperceptibility, achieving high recovery accuracy remains challenging. This challenge arises because convolutional operations struggle to preserve the sequential order of message bits and effectively address the discrepancy between these two modalities. To address this, we propose StegaFormer, an innovative MLP-based framework designed to preserve bit order and enable global fusion between modalities. Specifically, StegaFormer incorporates three crucial components: Order-Preserving Message Encoder (OPME), Decoder (OPMD) and Global Message-Image Fusion (GMIF). OPME and OPMD aim to preserve the order of message bits by segmenting the entire sequence into equal-length segments and incorporating sequential information during encoding and decoding. Meanwhile, GMIF employs a cross-modality fusion mechanism to effectively fuse the features from the two uncorrelated modalities. Experimental results on the COCO and DIV2K datasets demonstrate that StegaFormer surpasses existing state-of-the-art methods in terms of recovery accuracy, message capacity, and imperceptibility. We will make our code publicly available.
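The segmentation idea described in the abstract (splitting the message into equal-length segments while keeping sequential information) can be illustrated with a small sketch. This is not the authors' implementation: the function names `segment_bits` and `reassemble`, the segment length of 4, and the use of an explicit integer index as the "sequential information" are all assumptions made for illustration. StegaFormer itself is a learned model, and the abstract does not specify how order is encoded.

```python
from typing import List, Tuple

Segment = Tuple[int, List[int]]  # (position index, bits in that segment)


def segment_bits(bits: List[int], segment_len: int) -> List[Segment]:
    """Split a bit sequence into equal-length segments tagged with their position.

    Hypothetical helper: the position index stands in for the sequential
    information that an order-preserving encoder would attach to each segment.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if len(bits) % segment_len != 0:
        raise ValueError("message length must be a multiple of segment_len")
    return [
        (start // segment_len, bits[start:start + segment_len])
        for start in range(0, len(bits), segment_len)
    ]


def reassemble(segments: List[Segment]) -> List[int]:
    """Restore the original bit order from position-tagged segments."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    return [bit for _, seg_bits in ordered for bit in seg_bits]


message = [1, 0, 1, 1, 0, 0, 1, 0]
segments = segment_bits(message, 4)

# Even if the segments are processed out of order, the position tags
# allow the original sequence to be recovered.
shuffled = list(reversed(segments))
recovered = reassemble(shuffled)
```

The point of the sketch is only that a per-segment position tag makes recovery independent of processing order; how a neural encoder and decoder would represent that tag is left open here.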
Related papers
- Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion [0.0]
Multimodal brain decoding aims to reconstruct semantic information consistent with visual stimuli from brain activity signals such as fMRI.
We propose a BrainROI model and achieve leading-level results in brain-captioning evaluation on the NSD dataset.
Under the cross-subject setting, compared with recent state-of-the-art methods and representative baselines, metrics such as BLEU-4 and CIDEr show clear improvements.
arXiv Detail & Related papers (2025-12-23T11:04:34Z)
- MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation [20.14002849273559]
Unified multimodal models aim to integrate understanding and generation within a single framework.
We present MammothModa2 (Mammoth2), a unified autoregressive-diffusion (AR-Diffusion) framework.
Mammoth2 delivers strong text-to-image and instruction-based editing performance on public benchmarks.
arXiv Detail & Related papers (2025-11-23T03:25:39Z)
- A Content-Preserving Secure Linguistic Steganography [21.247775412166117]
We propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text.
We introduce CLstega, a novel method that embeds secret messages through controllable distribution transformation.
Experimental results show that CLstega can achieve a 100% extraction success rate, and outperforms existing methods in security.
arXiv Detail & Related papers (2025-11-16T11:50:13Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
Multimodal Pretraining DEL-Fusion model (MPDF)
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
arXiv Detail & Related papers (2024-05-31T08:06:05Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition [48.84506301960988]
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people.
Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text.
arXiv Detail & Related papers (2024-01-31T05:20:29Z)
- Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding.
In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- SCMM: Calibrating Cross-modal Representations for Text-Based Person Search [45.24784242117999]
Text-Based Person Search (TBPS) faces critical challenges in cross-modal information fusion.
We propose SCMM (Sew and Masked Modeling), a novel framework addressing these fusion challenges through two complementary mechanisms.
arXiv Detail & Related papers (2023-04-05T07:50:16Z)
- GMF: General Multimodal Fusion Framework for Correspondence Outlier Rejection [36.35090386001373]
We propose General Multimodal Fusion to learn to reject the correspondence outliers.
Our GMF achieves wide generalization ability and consistently improves the point cloud registration accuracy.
arXiv Detail & Related papers (2022-11-01T01:18:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.