Can Image Splicing and Copy-Move Forgery Be Detected by the Same Model? Forensim: An Attention-Based State-Space Approach
- URL: http://arxiv.org/abs/2602.10079v1
- Date: Tue, 10 Feb 2026 18:46:04 GMT
- Title: Can Image Splicing and Copy-Move Forgery Be Detected by the Same Model? Forensim: An Attention-Based State-Space Approach
- Authors: Soumyaroop Nandi, Prem Natarajan
- Abstract summary: Forensim is an attention-based state-space framework for image forgery detection. It jointly localizes both manipulated (target) and source regions. Forensim achieves state-of-the-art performance on standard benchmarks.
- Score: 8.024142807011378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Forensim, an attention-based state-space framework for image forgery detection that jointly localizes both manipulated (target) and source regions. Unlike traditional approaches that rely solely on artifact cues to detect spliced or forged areas, Forensim is designed to capture duplication patterns crucial for understanding context. In scenarios such as protest imagery, detecting only the forged region, for example a duplicated act of violence inserted into a peaceful crowd, can mislead interpretation, highlighting the need for joint source-target localization. Forensim outputs three-class masks (pristine, source, target) and supports detection of both splicing and copy-move forgeries within a unified architecture. We propose a visual state-space model that leverages normalized attention maps to identify internal similarities, paired with a region-based block attention module to distinguish manipulated regions. This design enables end-to-end training and precise localization. Forensim achieves state-of-the-art performance on standard benchmarks. We also release CMFD-Anything, a new dataset addressing limitations of existing copy-move forgery datasets.
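The mechanism the abstract describes, a normalized attention map over patch features that surfaces internal (self-)similarities, can be sketched in miniature. Everything below (the cosine-similarity formulation, the toy patch vectors, all function names) is an illustrative assumption, not the paper's actual architecture:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def normalized_attention(patches):
    """Row-wise softmax over pairwise similarities, with the diagonal
    masked out so each patch attends only to *other* patches."""
    n = len(patches)
    attn = []
    for i in range(n):
        scores = [cosine(patches[i], patches[j]) if j != i else float("-inf")
                  for j in range(n)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn

# Toy patch features: patch 3 is a near-copy of patch 0 (a copy-move pair).
patches = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.99, 0.02, 0.0],  # duplicated region
]
attn = normalized_attention(patches)
best_match = max(range(len(patches)), key=lambda j: attn[0][j])
print(best_match)  # 3: patch 0's strongest internal match is its near-copy
```

A real system would presumably feed such a similarity signal into the state-space backbone and a region-based block attention module to emit the three-class pristine/source/target mask; here the row-wise argmax merely shows which patches duplicate which.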
Related papers
- Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement [52.15627062770557]
Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent image manipulation detection methods have largely been fully supervised. We present a novel weakly supervised framework based on a dual-branch Transformer-CNN architecture.
arXiv Detail & Related papers (2025-03-26T07:35:09Z)
- Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining [25.174869954072648]
This paper proposes an Object-level Copy-Move Forgery Image Detection method based on Inconsistency Mining (IMNet).
To obtain complete object-level targets, we customize prototypes for both the source and tampered regions and dynamically update them.
We conduct experiments on three public datasets, which validate the effectiveness and robustness of the proposed IMNet.
arXiv Detail & Related papers (2024-03-31T09:01:17Z)
- Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation.
We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS).
arXiv Detail & Related papers (2024-01-18T10:55:13Z)
- CLIM: Contrastive Language-Image Mosaic for Region Representation [58.05870131126816]
Contrastive Language-Image Mosaic (CLIM) is a novel approach for aligning region and text representations.
CLIM consistently improves different open-vocabulary object detection methods.
It can effectively enhance the region representation of vision-language models.
arXiv Detail & Related papers (2023-12-18T17:39:47Z)
- LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields the state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z)
- RegionCLIP: Region-based Language-Image Pretraining [94.29924084715316]
Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification.
We propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations.
Our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on the COCO and LVIS datasets.
arXiv Detail & Related papers (2021-12-16T18:39:36Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- Character Region Attention For Text Spotting [18.713194210876136]
A scene text spotter is composed of text detection and recognition modules.
A typical architecture places detection and recognition modules into separate branches, and a RoI pooling is commonly used to let the branches share a visual feature.
Tighter coupling is possible since the two modules share a common sub-task: finding the location of character regions. The proposed architecture couples them by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage.
arXiv Detail & Related papers (2020-07-19T09:12:23Z)
- Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)
- Copy Move Source-Target Disambiguation through Multi-Branch CNNs [38.75957215447834]
We propose a method to identify the source and target regions of a copy-move forgery, thus allowing correct localisation of the tampered area.
First, we cast the problem into a hypothesis testing framework whose goal is to decide which of the two nearly-duplicate regions detected by a generic copy-move detector is the original one.
Then we design a multi-branch CNN architecture that solves the hypothesis testing problem by learning a set of features capable of revealing artefacts and boundary inconsistencies in the copy-moved area.
arXiv Detail & Related papers (2019-12-29T11:56:33Z)
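The source-target disambiguation idea above can be framed as a tiny decision rule: of two near-duplicate regions, the one exhibiting more boundary artefacts is more likely the pasted copy. The scoring function below is a crude hypothetical stand-in (a raw intensity jump across the region border), not the paper's learned multi-branch CNN features:

```python
def boundary_inconsistency(image, region):
    """Mean absolute intensity jump across a region's border: a crude,
    hypothetical stand-in for learned artefact features. `image` is a
    2D list of grey values; `region` is a set of (row, col) pixels."""
    jumps, count = 0.0, 0
    for (r, c) in region:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if ((nr, nc) not in region
                    and 0 <= nr < len(image) and 0 <= nc < len(image[0])):
                jumps += abs(image[r][c] - image[nr][nc])
                count += 1
    return jumps / count if count else 0.0

def pick_target(image, region_a, region_b):
    """Hypothesis test: label the region with the larger boundary jump
    as the pasted copy (target); the other is taken as the source."""
    return ("a" if boundary_inconsistency(image, region_a)
                 > boundary_inconsistency(image, region_b) else "b")

# Toy grey image: region_a blends with its surroundings, region_b has a
# hard seam against the background, as a sloppy paste would.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 90, 10],
    [10, 10, 10, 10, 90, 10],
    [10, 10, 10, 10, 10, 10],
]
region_a = {(1, 1), (1, 2)}  # seamless: likely the source
region_b = {(1, 4), (2, 4)}  # sharp boundary: likely the copy
print(pick_target(image, region_a, region_b))  # prints "b"
```

In the paper this decision is made by a multi-branch CNN trained to extract such artefact and boundary cues; the hand-crafted score here only illustrates the hypothesis-testing framing.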
This list is automatically generated from the titles and abstracts of the papers in this site.