M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
- URL: http://arxiv.org/abs/2506.20922v1
- Date: Thu, 26 Jun 2025 01:06:57 GMT
- Title: M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
- Authors: Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
- Abstract summary: Deep learning methods have recently achieved high accuracy in pixel-level forgery localization, yet still struggle with subtle or complex tampering. We propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture forgery artifacts.
- Score: 0.8090496457850851
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. Unlike approaches that process spatial and frequency cues separately, M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture diverse forgery artifacts. Additionally, our framework addresses the loss of fine detail during upsampling by utilizing a global prior map, a curvature metric indicating the difficulty of forgery localization, which then guides a difficulty-guided attention module to preserve subtle manipulations more effectively. Extensive experiments on multiple benchmark datasets demonstrate that M2SFormer outperforms existing state-of-the-art models, offering superior generalization in detecting and localizing forgeries across unseen domains.
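The difficulty-guided attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: here the curvature metric is approximated by the Laplacian magnitude of a coarse forgery-probability map, and the names `difficulty_map` and `difficulty_guided_attention` are hypothetical.

```python
import numpy as np

def difficulty_map(prior: np.ndarray) -> np.ndarray:
    """Approximate per-pixel localization difficulty as the curvature
    (Laplacian magnitude) of a coarse forgery-probability map."""
    lap = (
        np.roll(prior, 1, axis=0) + np.roll(prior, -1, axis=0)
        + np.roll(prior, 1, axis=1) + np.roll(prior, -1, axis=1)
        - 4.0 * prior
    )
    d = np.abs(lap)
    return d / (d.max() + 1e-8)  # normalize to [0, 1]

def difficulty_guided_attention(feat: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Re-weight decoder features (C, H, W) so that high-difficulty
    (high-curvature) regions receive more emphasis during upsampling."""
    w = 1.0 + difficulty_map(prior)   # weights in [1, 2]
    return feat * w[None, :, :]       # broadcast over channels
```

Regions where the prior map changes sharply (edges of a tampered area) get weights near 2, while flat regions keep weight 1, so subtle boundary details are amplified rather than washed out by upsampling.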
Related papers
- Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement [52.15627062770557]
Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent progress in image manipulation detection has largely been driven by fully supervised approaches. We present a novel weakly supervised framework based on a dual-branch Transformer-CNN architecture.
arXiv Detail & Related papers (2025-03-26T07:35:09Z)
- A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization [22.725542948364357]
We argue that the basic binary forgery mask is inadequate for explaining model predictions. In this study, we generate salient region-focused interpretations for the forgery images. We develop ForgeryTalker, an architecture designed for concurrent forgery localization and interpretation.
arXiv Detail & Related papers (2024-12-27T15:23:39Z)
- Image Forgery Localization via Guided Noise and Multi-Scale Feature Aggregation [13.610095493539397]
We propose a guided and multi-scale feature aggregated network for IFL. In order to learn the noise features that arise under different types of forgery, we develop an effective noise extraction module. Then, we design a Feature Aggregation Module (FAM) that uses dynamic convolution to adaptively aggregate RGB and noise features over multiple scales. Finally, we propose an Atrous Residual Pyramid Module (ARPM) to enhance feature representation and capture both global and local features.
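The adaptive aggregation idea above can be sketched in a toy form. This is not the paper's FAM: per-scale weights from global average pooling stand in for dynamic convolution, the function names are hypothetical, and all scales are assumed to share one spatial size.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_rgb_noise(rgb_feats: list, noise_feats: list) -> np.ndarray:
    """Toy stand-in for dynamic-convolution aggregation: fuse RGB and
    noise features at each scale, then combine scales with weights
    derived from the features themselves."""
    fused = [r + n for r, n in zip(rgb_feats, noise_feats)]
    # one scalar "importance" per scale via global average pooling
    scores = np.array([f.mean() for f in fused])
    weights = softmax(scores)
    return sum(w * f for w, f in zip(weights, fused))
```

The point of making the weights input-dependent is that scales carrying stronger forgery evidence contribute more to the aggregated feature, instead of all scales being averaged with fixed weights.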
arXiv Detail & Related papers (2024-11-17T11:50:09Z)
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia. Although existing approaches mainly capture face forgery patterns using the image modality, other modalities like fine-grained noises and texts are not fully explored. We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy [23.848474219551818]
One of the fundamental challenges in microscopy (MS) image analysis is instance segmentation (IS).
We propose a novel one-stage framework named A2B-IS to address this challenge and enhance the accuracy of IS in MS images.
Our method has been thoroughly validated on two large-scale MS datasets.
arXiv Detail & Related papers (2024-01-18T11:14:32Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
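The sub-band decomposition that this frequency encoder builds on can be illustrated with a single-level 2-D Haar wavelet transform. This is a minimal sketch, not UFAFormer's code; the function name `haar_dwt2` is hypothetical.

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """One level of a 2-D Haar wavelet transform: split an image with
    even dimensions into LL, LH, HL, HH sub-bands of half resolution."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # horizontal-frequency detail
    hl = (a + b - c - d) / 2.0  # vertical-frequency detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh
```

A smooth (e.g. authentic) region concentrates its energy in LL, while splicing boundaries and resampling traces show up in the detail bands, which is why attention within and across sub-bands can expose forgery artifacts.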
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- MC-LCR: Multi-modal contrastive classification by locally correlated representations for effective face forgery detection [11.124150983521158]
We propose a novel framework named Multi-modal Contrastive Classification by Locally Correlated Representations (MC-LCR).
Our MC-LCR aims to amplify implicit local discrepancies between authentic and forged faces from both spatial and frequency domains.
We achieve state-of-the-art performance and demonstrate the robustness and generalization of our method.
arXiv Detail & Related papers (2021-10-07T09:24:12Z)