Related papers: Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation

Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation

URL: http://arxiv.org/abs/2507.08343v2
Date: Tue, 05 Aug 2025 07:39:11 GMT
Title: Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation
Authors: Junxue Yang, Xin Liao, Weixuan Tang, Jianhua Yang, Zheng Qin,
Abstract summary: We propose a multi-range representations-driven adversarial stego generation framework called MRAG for JPEG image hiding.<n>MRAG integrates the local-range characteristic of the convolution and the global-range modeling of the transformer.<n>It computes the adversarial loss between covers and stegos based on the surrogate steganalyzer's classified features.
Score: 19.5984577708016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image hiding fully explores the hidden potential of deep learning-based models, aiming to conceal image-level messages within cover images and reveal them from stego images to achieve covert communication. Existing hiding schemes are easily detected by the naked eyes or steganalyzers due to the cover type confined to the spatial domain, single-range feature extraction and attacks, and insufficient loss constraints. To address these issues, we propose a multi-range representations-driven adversarial stego generation framework called MRAG for JPEG image hiding. This design stems from the fact that steganalyzers typically combine local-range and global-range information to better capture hidden traces. Specifically, MRAG integrates the local-range characteristic of the convolution and the global-range modeling of the transformer. Meanwhile, a features angle-norm disentanglement loss is designed to launch multi-range representations-driven feature-level adversarial attacks. It computes the adversarial loss between covers and stegos based on the surrogate steganalyzer's classified features, i.e., the features before the last fully connected layer. Under the dual constraints of features angle and norm, MRAG can delicately encode the concatenation of cover and secret into subtle adversarial perturbations from local and global ranges relevant to steganalysis. Therefore, the resulting stego can achieve visual and steganalysis imperceptibility. Moreover, coarse-grained and fine-grained frequency decomposition operations are devised to transform the input, introducing multi-grained information. Extensive experiments demonstrate that MRAG can achieve state-of-the-art performance.

Related papers

NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images.<n>Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
arXiv Detail & Related papers (2025-08-02T07:58:15Z)
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection [11.700935740718675]
LATTE - Latent Trajectory Embedding - is a novel approach that models the evolution of latent embeddings across several denoising timesteps.<n>By modeling the trajectory of such embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images.
arXiv Detail & Related papers (2025-07-03T12:53:47Z)
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
textbfBR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations.<n>textbfNFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z)
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.<n>Although existing approaches mainly capture face forgery patterns using image modality, other modalities like fine-grained noises and texts are not fully explored.<n>We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping [29.678756772610797]
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP)
arXiv Detail & Related papers (2023-12-29T06:46:37Z)
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection [86.97062579515833]
We introduce the concept of Neighboring Pixel Relationships(NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations. A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by tft28 distinct generative models. This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable tft11.6% improvement over existing methods.
arXiv Detail & Related papers (2023-12-16T14:27:06Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Difusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
Latent Space Energy-based Model for Fine-grained Open Set Recognition [46.0388856095674]
Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. As a type of generative model, energy-based models (EBM) are the potential for hybrid modeling of generative and discriminative tasks. In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world.
arXiv Detail & Related papers (2023-09-19T16:00:09Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
Towards Robust GAN-generated Image Detection: a Multi-view Completion Representation [27.483031588071942]
GAN-generated image detection now becomes the first line of defense against the malicious uses of machine-synthesized image manipulations such as deepfakes. We propose a robust detection framework based on a novel multi-view image completion representation. We evaluate the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks.
arXiv Detail & Related papers (2023-06-02T08:38:02Z)
Hierarchical Forgery Classifier On Multi-modality Face Forgery Clues [61.37306431455152]
We propose a novel Hierarchical Forgery for Multi-modality Face Forgery Detection (HFC-MFFD) The HFC-MFFD learns robust patches-based hybrid representation to enhance forgery authentication in multiple-modality scenarios. The specific hierarchical face forgery is proposed to alleviate the class imbalance problem and further boost detection performance.
arXiv Detail & Related papers (2022-12-30T10:54:29Z)
Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to the insufficient identity and insignificant variance. We propose Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation"
arXiv Detail & Related papers (2021-12-01T15:36:59Z)
MC-LCR: Multi-modal contrastive classification by locally correlated representations for effective face forgery detection [11.124150983521158]
We propose a novel framework named Multi-modal Contrastive Classification by Locally Correlated Representations. Our MC-LCR aims to amplify implicit local discrepancies between authentic and forged faces from both spatial and frequency domains. We achieve state-of-the-art performance and demonstrate the robustness and generalization of our method.
arXiv Detail & Related papers (2021-10-07T09:24:12Z)
Label Geometry Aware Discriminator for Conditional Generative Networks [40.89719383597279]
Conditional Generative Adversarial Networks (GANs) can generate highly photo realistic images with desired target classes. These synthetic images have not always been helpful to improve downstream supervised tasks such as image classification.
arXiv Detail & Related papers (2021-05-12T08:17:25Z)
DWDN: Deep Wiener Deconvolution Network for Non-Blind Image Deblurring [66.91879314310842]
We propose an explicit deconvolution process in a feature space by integrating a classical Wiener deconvolution framework with learned deep features. A multi-scale cascaded feature refinement module then predicts the deblurred image from the deconvolved deep features. We show that the proposed deep Wiener deconvolution network facilitates deblurred results with visibly fewer artifacts and quantitatively outperforms state-of-the-art non-blind image deblurring methods by a wide margin.
arXiv Detail & Related papers (2021-03-18T00:38:11Z)
Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications. The visual feature produced by our encoder, termed as Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z)
Progressively Unfreezing Perceptual GAN [28.330940021951438]
Generative adversarial networks (GANs) are widely used in image generation tasks, yet the generated images are usually lack of texture details. We propose a general framework, called Progressively Unfreezing Perceptual GAN (PUPGAN), which can generate images with fine texture details.
arXiv Detail & Related papers (2020-06-18T03:12:41Z)
Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields. To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss. We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.