Related papers: Generalizable Deepfake Detection via Effective Local-Global Feature Extraction

Related papers

Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba [37.901352525347214]
We propose Frequency-Aware Mamba (FAMamba), a novel framework that integrates frequency guidance with sequence modeling for efficient image restoration. Our architecture consists of two key components: (1) a Dual-Branch Feature Extraction Block (DFEB) that enhances local-global interaction via bidirectional 2D frequency-adaptive scanning, and (2) a Prior-Guided Block (PGB) that refines texture details through wavelet-based high-frequency residual learning.
arXiv Detail & Related papers (2025-12-03T14:50:20Z)
Double Helix Diffusion for Cross-Domain Anomaly Image Generation [47.093354259479234]
This paper introduces Double Helix Diffusion (DH-Diff), a novel cross-domain generative framework designed to simultaneously synthesize high-fidelity anomaly images and their pixel-level annotation masks. DH-Diff employs a unique architecture inspired by a double helix, cycling through distinct modules for feature separation, connection, and merging. Extensive experiments demonstrate that DH-Diff significantly outperforms state-of-the-art methods in diversity and authenticity, leading to significant improvements in downstream anomaly detection performance.
arXiv Detail & Related papers (2025-09-16T08:06:07Z)
Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection [14.7077339945096]
NS-Net is a novel framework that decouples semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Experiments show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy.
arXiv Detail & Related papers (2025-08-02T07:58:15Z)
Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation [19.5984577708016]
We propose a multi-range representations-driven adversarial stego generation framework called MRAG for JPEG image hiding. MRAG integrates the local-range characteristic of the convolution and the global-range modeling of the transformer. It computes the adversarial loss between covers and stegos based on the surrogate steganalyzer's classified features.
arXiv Detail & Related papers (2025-07-11T06:45:07Z)
DSwinIR: Rethinking Window-based Attention for Image Restoration [109.38288333994407]
We propose the Deformable Sliding Window Transformer (DSwinIR) as a new foundational backbone architecture for image restoration. At the heart of DSwinIR is the proposed novel Deformable Sliding Window (DSwin) Attention. Extensive experiments show that DSwinIR sets a new state-of-the-art across a wide spectrum of image restoration tasks.
arXiv Detail & Related papers (2025-04-07T09:24:41Z)
D2Fusion: Dual-domain Fusion with Feature Superposition for Deepfake Detection [5.281969205292727]
Current Deepfake detection methods fail to thoroughly explore artifact information across different domains. We introduce a novel bi-directional attention module to capture the local positional information of artifact clues from the spatial domain. By doing so, we can obtain high-frequency information in the fine-grained features, which contains the global and subtle forgery information.
arXiv Detail & Related papers (2025-03-21T14:31:33Z)
Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions. We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z)
A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures. We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI. Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
SuperGF: Unifying Local and Global Features for Visual Localization [13.869227429939423]
SuperGF is a transformer-based aggregation model that operates directly on image-matching-specific local features. We provide implementations of SuperGF using various types of local features, including dense and sparse learning-based or hand-crafted descriptors.
arXiv Detail & Related papers (2022-12-23T13:48:07Z)
GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection. GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
Cross-Domain Local Characteristic Enhanced Deepfake Video Detection [18.430287055542315]
Deepfake detection has attracted increasing attention due to security concerns. Many detectors cannot achieve accurate results when detecting unseen manipulations. We propose a novel pipeline, Cross-Domain Local Forensics, for more general deepfake video detection.
arXiv Detail & Related papers (2022-11-07T07:44:09Z)
Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition. Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intentions. Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods. We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
arXiv Detail & Related papers (2022-07-06T16:46:30Z)
Federated and Generalized Person Re-identification through Domain and Feature Hallucinating [88.77196261300699]
We study the problem of federated domain generalization (FedDG) for person re-identification (re-ID). We propose a novel method, called "Domain and Feature Hallucinating (DFH)", to produce diverse features for learning generalized local and global models. Our method achieves state-of-the-art performance for FedDG on four large-scale re-ID benchmarks.
arXiv Detail & Related papers (2022-03-05T09:15:13Z)
An Entropy-guided Reinforced Partial Convolutional Network for Zero-Shot Learning [77.72330187258498]
We propose a novel Entropy-guided Reinforced Partial Convolutional Network (ERPCNet). ERPCNet extracts and aggregates localities based on semantic relevance and visual correlations without human-annotated regions. It not only discovers global-cooperative localities dynamically but also converges faster for policy gradient optimization.
arXiv Detail & Related papers (2021-11-03T11:13:13Z)
Local Relation Learning for Face Forgery Detection [73.73130683091154]
We propose a novel perspective of face forgery detection via local relation learning. Specifically, we propose a Multi-scale Patch Similarity Module (MPSM), which measures the similarity between features of local regions. We also propose an RGB-Frequency Attention Module (RFAM) to fuse information in both RGB and frequency domains for more comprehensive local feature representation.
arXiv Detail & Related papers (2021-05-06T10:44:32Z)
Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications. We propose an adaptive local-global refinement framework for VSOD. We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representations.
arXiv Detail & Related papers (2021-04-29T14:14:11Z)
Change Detection in Synthetic Aperture Radar Images Using a Dual-Domain Network [33.50775914682585]
Change detection from synthetic aperture radar (SAR) imagery is a critical yet challenging task. Existing methods mainly focus on feature extraction in the spatial domain, and little attention has been paid to the frequency domain. We propose a Dual-Domain Network to tackle the above two challenges.
arXiv Detail & Related papers (2021-04-14T08:41:48Z)
Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields. Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans. We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
arXiv Detail & Related papers (2020-11-03T04:07:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.