FreqCross: A Multi-Modal Frequency-Spatial Fusion Network for Robust Detection of Stable Diffusion 3.5 Generated Images
- URL: http://arxiv.org/abs/2507.02995v2
- Date: Tue, 08 Jul 2025 03:56:18 GMT
- Title: FreqCross: A Multi-Modal Frequency-Spatial Fusion Network for Robust Detection of Stable Diffusion 3.5 Generated Images
- Authors: Guang Yang
- Abstract summary: FreqCross is a novel multi-modal fusion network that combines spatial RGB features, frequency domain artifacts, and radial energy distribution patterns. Experiments on a dataset of 10,000 paired real (MS-COCO) and synthetic (Stable Diffusion 3.5) images demonstrate that FreqCross achieves 97.8% accuracy.
- Score: 4.524282351757178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of diffusion models, particularly Stable Diffusion 3.5, has enabled the generation of highly photorealistic synthetic images that pose significant challenges to existing detection methods. This paper presents FreqCross, a novel multi-modal fusion network that combines spatial RGB features, frequency domain artifacts, and radial energy distribution patterns to achieve robust detection of AI-generated images. Our approach leverages a three-branch architecture: (1) a ResNet-18 backbone for spatial feature extraction, (2) a lightweight CNN for processing 2D FFT magnitude spectra, and (3) a multi-layer perceptron for analyzing radial energy profiles. We introduce a novel radial energy distribution analysis that captures characteristic frequency artifacts inherent in diffusion-generated images, and fuse it with spatial and spectral cues via simple feature concatenation followed by a compact classification head. Extensive experiments on a dataset of 10,000 paired real (MS-COCO) and synthetic (Stable Diffusion 3.5) images demonstrate that FreqCross achieves 97.8% accuracy, outperforming state-of-the-art baselines by 5.2%. The frequency analysis further reveals that synthetic images exhibit distinct spectral signatures in the 0.1–0.4 normalised frequency range, providing a theoretical foundation for our approach. Code and pre-trained models are publicly available to facilitate reproducible research.
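The abstract describes the three branches only at block level; the PyTorch sketch below shows one plausible way to wire them together. Everything concrete here (the 64-bin radial profile, the layer widths, the lightweight-CNN layout, and the names `radial_energy_profile` and `FreqCrossSketch`) is an illustrative assumption, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


def radial_energy_profile(gray: torch.Tensor, n_bins: int = 64) -> torch.Tensor:
    """Mean FFT magnitude inside concentric rings of normalised frequency.

    gray: (B, H, W) batch of grayscale images; returns (B, n_bins),
    with bins spanning normalised frequency 0 to ~0.5 (Nyquist).
    """
    B, H, W = gray.shape
    mag = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1)).abs()
    fy = torch.linspace(-0.5, 0.5, H, device=gray.device)
    fx = torch.linspace(-0.5, 0.5, W, device=gray.device)
    r = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)   # radial frequency, (H, W)
    bins = torch.clamp((r / 0.5 * n_bins).long(), max=n_bins - 1).flatten()
    counts = torch.zeros(n_bins, device=gray.device)
    counts.scatter_add_(0, bins, torch.ones_like(bins, dtype=counts.dtype))
    profile = torch.zeros(B, n_bins, device=gray.device)
    for b in range(B):                                     # simple loop keeps the sketch readable
        profile[b].scatter_add_(0, bins, mag[b].flatten())
    return profile / counts.clamp(min=1.0)


class FreqCrossSketch(nn.Module):
    """Hypothetical three-branch fusion model following the abstract's block diagram."""

    def __init__(self, n_bins: int = 64):
        super().__init__()
        self.n_bins = n_bins
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                        # branch 1: 512-d spatial features
        self.spatial = backbone
        self.spectral = nn.Sequential(                     # branch 2: lightweight CNN on log-FFT magnitude
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),         # -> 32 * 4 * 4 = 512-d
        )
        self.radial = nn.Sequential(                       # branch 3: MLP on the radial energy profile
            nn.Linear(n_bins, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU(),
        )
        self.head = nn.Sequential(                         # concatenation + compact classification head
            nn.Linear(512 + 512 + 64, 256), nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        gray = rgb.mean(dim=1)                             # (B, H, W) luminance proxy
        spec = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1))
        log_mag = torch.log1p(spec.abs()).unsqueeze(1)     # (B, 1, H, W)
        fused = torch.cat([
            self.spatial(rgb),
            self.spectral(log_mag),
            self.radial(radial_energy_profile(gray, self.n_bins)),
        ], dim=1)
        return self.head(fused).squeeze(1)                 # logit: synthetic vs. real


if __name__ == "__main__":
    model = FreqCrossSketch()
    logit = model(torch.rand(2, 3, 256, 256))              # two dummy RGB images
    print(torch.sigmoid(logit))                            # P(synthetic) per image
```

In a training setup one would pair the returned logit with `torch.nn.BCEWithLogitsLoss` over real/synthetic labels; the output of `radial_energy_profile` can also be averaged over real and synthetic batches to look for the divergence the paper reports in the 0.1–0.4 normalised frequency band.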
Related papers
- Template-Fitting Meets Deep Learning: Redshift Estimation Using Physics-Guided Neural Networks [0.4416697929169138]
We present a hybrid method that integrates template fitting with deep learning using physics-guided neural networks. We evaluate our model on the publicly available PREML dataset, which includes approximately 400,000 galaxies. Our approach achieves an RMS error of 0.0507, a 3-sigma catastrophic rate of 0.13%, and a bias of 0.0028.
arXiv Detail & Related papers (2025-07-01T15:29:45Z) - Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting [28.147938573798367]
We present an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling. We propose a large-scale power angular spectrum (PAS) dataset containing 50,000 samples ranging from 1 to 100 GHz in 6 indoor environments. Our approach achieves an average Structural Similarity Index Measure (SSIM) of up to 0.72, an improvement of up to 17.8% over current state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2025-05-27T04:48:26Z) - Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency, spatially localized textures and low-frequency, scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z) - FCDM: A Physics-Guided Bidirectional Frequency Aware Convolution and Diffusion-Based Model for Sinogram Inpainting [14.043383277622874]
We propose FCDM, a physics-guided, frequency-aware sinogram inpainting framework. It integrates bidirectional frequency-domain convolutions to disentangle overlapping features while enforcing total absorption and frequency-domain consistency via a physics-informed loss. Experiments on synthetic and real-world datasets show that FCDM outperforms existing methods, achieving SSIM over 0.95 and PSNR above 30 dB, with up to 33% and 29% improvements over baselines.
arXiv Detail & Related papers (2024-08-26T12:31:38Z) - Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft [2.5057561650768814]
The FINCH mission aims to monitor crop residue cover in agricultural fields.
Hyperspectral imaging captures both spectral and spatial information.
It is prone to various types of noise, including random noise, stripe noise, and dead pixels.
arXiv Detail & Related papers (2024-06-15T19:34:18Z) - Diffusion Facial Forgery Detection [56.69763252655695]
This paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images.
We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods.
The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%.
arXiv Detail & Related papers (2024-01-29T03:20:19Z) - Spec-NeRF: Multi-spectral Neural Radiance Fields [9.242830798112855]
We propose Multi-spectral Neural Radiance Fields (Spec-NeRF) for jointly reconstructing a multispectral radiance field and the spectral sensitivity functions (SSFs) of the camera from a set of color images filtered by different filters.
Our experiments on both synthetic and real scenario datasets demonstrate that utilizing filtered RGB images with learnable NeRF and SSFs can achieve high fidelity and promising spectral reconstruction.
arXiv Detail & Related papers (2023-09-14T16:17:55Z) - WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields [149.2296890464997]
We design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF.
WaveNeRF achieves superior generalizable radiance field modeling when only given three images as input.
arXiv Detail & Related papers (2023-08-09T09:24:56Z) - SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation [7.907504142396784]
This study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation.
SAR-NeRF is constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels.
It is found that the SAR-NeRF-augmented dataset can significantly improve SAR target classification performance in a few-shot learning setup.
arXiv Detail & Related papers (2023-07-11T07:37:56Z) - Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
arXiv Detail & Related papers (2023-05-15T20:24:38Z) - Dif-Fusion: Towards High Color Fidelity in Infrared and Visible Image Fusion with Diffusion Models [54.952979335638204]
We propose a novel diffusion-based method, termed Dif-Fusion, to generate the distribution of the multi-channel input data.
Our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity.
arXiv Detail & Related papers (2023-01-19T13:37:19Z) - Hyperspectral Image Super-resolution via Deep Progressive Zero-centric Residual Learning [62.52242684874278]
The cross-modality distribution of spatial and spectral information makes the problem challenging. We propose a novel lightweight deep neural network-based framework, namely PZRes-Net. Our framework learns a high-resolution, zero-centric residual image, which contains high-frequency spatial details of the scene.
arXiv Detail & Related papers (2020-06-18T06:32:11Z)