Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization
- URL: http://arxiv.org/abs/2508.17976v1
- Date: Mon, 25 Aug 2025 12:43:53 GMT
- Title: Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization
- Authors: Keyang Zhang, Chenqi Kong, Hui Liu, Bo Ding, Xinghao Jiang, Haoliang Li
- Abstract summary: This paper presents a novel Propose-Rectify framework that bridges semantic reasoning with forensic-specific analysis. Our framework ensures that initial semantic proposals are systematically validated and enhanced through concrete technical evidence, resulting in comprehensive detection accuracy and localization precision.
- Score: 49.71303998618939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing sophistication of image manipulation techniques demands robust forensic solutions that can both reliably detect alterations and precisely localize tampered regions. Recent Multimodal Large Language Models (MLLMs) show promise by leveraging world knowledge and semantic understanding for context-aware detection, yet they struggle with perceiving subtle, low-level forensic artifacts crucial for accurate manipulation localization. This paper presents a novel Propose-Rectify framework that effectively bridges semantic reasoning with forensic-specific analysis. In the proposal stage, our approach utilizes a forensic-adapted LLaVA model to generate initial manipulation analysis and preliminary localization of suspicious regions based on semantic understanding and contextual reasoning. In the rectification stage, we introduce a Forensics Rectification Module that systematically validates and refines these initial proposals through multi-scale forensic feature analysis, integrating technical evidence from several specialized filters. Additionally, we present an Enhanced Segmentation Module that incorporates critical forensic cues into SAM's encoded image embeddings, thereby overcoming inherent semantic biases to achieve precise delineation of manipulated regions. By synergistically combining advanced multimodal reasoning with established forensic methodologies, our framework ensures that initial semantic proposals are systematically validated and enhanced through concrete technical evidence, resulting in comprehensive detection accuracy and localization precision. Extensive experimental validation demonstrates state-of-the-art performance across diverse datasets with exceptional robustness and generalization capabilities.
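The two-stage flow described in the abstract (semantic proposal, then forensic rectification) can be illustrated with a minimal toy sketch. All function names here are hypothetical stand-ins: the paper's actual components are a forensic-adapted LLaVA model, a multi-scale Forensics Rectification Module, and a SAM-based Enhanced Segmentation Module, none of which are reproduced below. The sketch only conveys the control flow: a coarse semantic proposal is kept per pixel only where low-level forensic evidence agrees.

```python
# Hypothetical sketch of the Propose-Rectify control flow. Stand-ins:
# propose_regions() mimics the MLLM's semantic proposal stage, and
# forensic_score() mimics a high-pass forensic filter response.

def propose_regions(image):
    """Proposal stage (stand-in for the forensic-adapted LLaVA pass):
    flag pixels whose intensity deviates strongly from the image mean."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[abs(p - mean) > 50 for p in row] for row in image]

def forensic_score(image, r, c):
    """Stand-in for multi-scale forensic filters: a 1-D high-pass
    residual measuring how much a pixel differs from its neighbors."""
    row = image[r]
    left = row[c - 1] if c > 0 else row[c]
    right = row[c + 1] if c < len(row) - 1 else row[c]
    return abs(2 * row[c] - left - right)

def rectify(image, proposal, threshold=10):
    """Rectification stage: retain a proposed pixel only if the
    low-level forensic evidence corroborates the semantic proposal."""
    return [[proposal[r][c] and forensic_score(image, r, c) >= threshold
             for c in range(len(image[0]))]
            for r in range(len(image))]

def propose_and_rectify(image):
    """Full pipeline: semantic proposal validated by forensic evidence."""
    return rectify(image, propose_regions(image))
```

On a 3x4 toy image with a single spliced-in bright pixel, the proposal flags that pixel and the rectification step confirms it, since the high-pass residual there is large; a semantically flagged pixel with no low-frequency anomaly would instead be suppressed.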
Related papers
- REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection [30.963994372913092]
We introduce REVEAL-Bench, the first reasoning-enhanced multimodal benchmark for AI-generated image detection. Our framework integrates detection with a novel expert-grounded reinforcement learning approach. REVEAL significantly enhances detection accuracy, explanation fidelity, and robust cross-model generalization.
arXiv Detail & Related papers (2025-11-28T13:11:08Z) - From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection [19.240335260177382]
We introduce AIFo (Agent-based Image Forensics), a training-free framework that emulates human forensic investigation through multi-agent collaboration. Unlike conventional methods, our framework employs a set of forensic tools, including reverse image search, metadata extraction, pre-trained classifiers, and VLM analysis. Our comprehensive evaluation spans 6,000 images and challenging real-world scenarios, including images from modern generative platforms and diverse online sources.
arXiv Detail & Related papers (2025-10-31T18:36:49Z) - Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations [56.816929931908824]
We pioneer the detection of semantically-coordinated manipulations in multimodal data. We propose a Retrieval-Augmented Manipulation Detection and Grounding (RamDG) framework. Our framework significantly outperforms existing methods, achieving 2.06% higher detection accuracy on SAMM compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-16T04:18:48Z) - AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization [43.86757207244911]
We propose a comprehensive framework addressing these limitations through two synergistic innovations. First, we introduce a multi-stage deliberative reasoning process that guides models from region identification to focused examination. Second, we develop a fine-grained reward mechanism incorporating classification accuracy and localization supervision.
arXiv Detail & Related papers (2025-08-06T08:00:27Z) - Chances and Challenges of the Model Context Protocol in Digital Forensics and Incident Response [0.0]
Large language models hold considerable promise for supporting forensic investigations, but their widespread adoption is hindered by a lack of transparency. This paper explores how the emerging Model Context Protocol can address these challenges and support the meaningful use of LLMs in digital forensics.
arXiv Detail & Related papers (2025-05-30T22:15:48Z) - FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, a large multimodal expert model (LMM) tailored for AI-generated image forensics. FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights. FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z) - ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection [107.86009509291581]
We propose ForgerySleuth to perform comprehensive clue fusion and generate segmentation outputs indicating regions that are tampered with. Our experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in robustness, generalization, and explainability.
arXiv Detail & Related papers (2024-11-29T04:35:18Z) - Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z) - Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates the approach that leverages the sampling and fusion techniques to harness the features of diffusion models efficiently.
By leveraging the strength of text-to-image generation capability, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z) - Cross-target Stance Detection by Exploiting Target Analytical Perspectives [22.320628580895164]
Cross-target stance detection (CTSD) is an important task that infers the attitude toward a destination target by utilizing annotated data from a source target.
One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets.
We propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.
arXiv Detail & Related papers (2024-01-03T14:28:55Z) - Metrics reloaded: Recommendations for image analysis validation [59.60445111432934]
Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
arXiv Detail & Related papers (2022-06-03T15:56:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.