GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection
- URL: http://arxiv.org/abs/2211.08615v7
- Date: Mon, 4 Sep 2023 22:28:46 GMT
- Title: GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection
- Authors: Yan Ju, Shan Jia, Jialing Cai, Haiying Guan, Siwei Lyu
- Abstract summary: We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI-synthesized image detection.
GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
- Score: 29.118321046339656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of deep generative models (such as Generative
Adversarial Networks and Diffusion models), AI-synthesized images are now of
such high quality that humans can hardly distinguish them from pristine ones.
Although existing detection methods have shown high performance in specific
evaluation settings, e.g., on images from seen models or on images without
real-world post-processing, they tend to suffer serious performance degradation
in real-world scenarios where testing images can be generated by more powerful
generation models or combined with various post-processing operations. To
address this issue, we propose a Global and Local Feature Fusion (GLFF)
framework to learn rich and discriminative representations by combining
multi-scale global features from the whole image with refined local features
from informative patches for AI-synthesized image detection. GLFF fuses
information from two branches: the global branch to extract multi-scale
semantic features and the local branch to select informative patches for
detailed local artifacts extraction. Due to the lack of a synthesized image
dataset simulating real-world applications for evaluation, we further create a
challenging fake image dataset, named DeepFakeFaceForensics (DF^3), which
contains 6 state-of-the-art generation models and a variety of post-processing
techniques to approximate real-world scenarios. Experimental results
demonstrate the superiority of our method over the state-of-the-art methods on
the proposed DF^3 dataset and three other open-source datasets.
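The two-branch idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how features are extracted or how patches are selected, so the sketch assumes average pooling at several grid sizes for the global branch and uses pixel variance as a stand-in score for "informative" patches. All function names (`avg_pool`, `global_branch`, `split_patches`, `local_branch`, `fuse`) and parameters are hypothetical, and the fusion step here is plain concatenation.

```python
from statistics import mean, pvariance


def avg_pool(image, size):
    """Average-pool a square 2D list into a size x size grid (flattened)."""
    step = len(image) // size
    pooled = []
    for by in range(size):
        for bx in range(size):
            block = [image[y][x]
                     for y in range(by * step, (by + 1) * step)
                     for x in range(bx * step, (bx + 1) * step)]
            pooled.append(mean(block))
    return pooled


def global_branch(image, scales=(1, 2)):
    """Assumed global branch: pooled features of the whole image at several scales."""
    feats = []
    for size in scales:
        feats.extend(avg_pool(image, size))
    return feats


def split_patches(image, patch):
    """Cut the image into non-overlapping patch x patch tiles (each flattened)."""
    n = len(image)
    return [[image[yy][xx]
             for yy in range(y, y + patch)
             for xx in range(x, x + patch)]
            for y in range(0, n - patch + 1, patch)
            for x in range(0, n - patch + 1, patch)]


def local_branch(image, patch=2, top_k=2):
    """Assumed local branch: keep the top_k patches by variance (a stand-in
    for a learned selection module) and describe each by mean and variance."""
    ranked = sorted(split_patches(image, patch), key=pvariance, reverse=True)
    feats = []
    for p in ranked[:top_k]:
        feats.extend([mean(p), pvariance(p)])
    return feats


def fuse(image):
    """Fuse both branches by concatenating their feature vectors."""
    return global_branch(image) + local_branch(image)


# Toy 4x4 "image" with one locally unusual pixel in the bottom-left patch.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 9, 5, 5],
         [0, 0, 5, 5]]
features = fuse(image)
```

In this toy setting the fused vector has 5 global entries (a 1x1 and a 2x2 pooling grid) followed by 4 local entries (two selected patches, two statistics each). A real detector would replace both branches with learned networks and feed the fused vector to a classifier.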
Related papers
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [49.93362169016503]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable.
Although a number of face forgery datasets are publicly available, the forged faces are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion [59.19469551774703]
Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performance on various practical tasks.
We propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts.
Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate.
arXiv Detail & Related papers (2023-02-02T20:06:58Z)
- Rethinking Blur Synthesis for Deep Real-World Image Deblurring [4.00114307523959]
We propose a novel realistic blur synthesis pipeline to simulate the camera imaging process.
We develop an effective deblurring model that captures non-local dependencies and local context in the feature domain simultaneously.
A comprehensive experiment on three real-world datasets shows that the proposed deblurring model performs better than state-of-the-art methods.
arXiv Detail & Related papers (2022-09-28T06:50:16Z)
- Fusing Global and Local Features for Generalized AI-Synthesized Image Detection [31.35052580048599]
We design a two-branch model to combine global spatial information from the whole image and local informative features from patches selected by a novel patch selection module.
We collect a highly diverse dataset synthesized by 19 models with various objects and resolutions to evaluate our model.
arXiv Detail & Related papers (2022-03-26T01:55:37Z)
- Reconciliation of Statistical and Spatial Sparsity For Robust Image and Image-Set Classification [27.319334479994787]
We propose a novel Joint Statistical and Spatial Sparse representation, dubbed J3S, to model the image or image-set data for classification.
We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity.
Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.
arXiv Detail & Related papers (2021-06-01T06:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.