CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization
- URL: http://arxiv.org/abs/2508.07413v1
- Date: Sun, 10 Aug 2025 16:22:30 GMT
- Title: CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization
- Authors: Youqi Wang, Shunquan Tan, Rongxuan Peng, Bin Li, Jiwu Huang
- Abstract summary: The increasing accessibility of image editing tools and generative AI has led to a proliferation of visually convincing forgeries. In this paper, we repurpose the mechanism of a state-of-the-art (SOTA) text-to-image synthesis model by exploiting its internal generative process. We propose CLUE, a framework that employs Low-Rank Adaptation (LoRA) to parameter-efficiently reconfigure Stable Diffusion 3 (SD3) as a forensic feature extractor.
- Score: 35.73353140683283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing accessibility of image editing tools and generative AI has led to a proliferation of visually convincing forgeries, compromising the authenticity of digital media. In this paper, in addition to leveraging distortions from conventional forgeries, we repurpose the mechanism of a state-of-the-art (SOTA) text-to-image synthesis model by exploiting its internal generative process, turning it into a high-fidelity forgery localization tool. To this end, we propose CLUE (Capture Latent Uncovered Evidence), a framework that employs Low-Rank Adaptation (LoRA) to parameter-efficiently reconfigure Stable Diffusion 3 (SD3) as a forensic feature extractor. Our approach begins with the strategic use of SD3's Rectified Flow (RF) mechanism to inject noise at varying intensities into the latent representation, thereby steering the LoRA-tuned denoising process to amplify subtle statistical inconsistencies indicative of a forgery. To complement the latent analysis with high-level semantic context and precise spatial details, our method incorporates contextual features from the image encoder of the Segment Anything Model (SAM), which is parameter-efficiently adapted to better trace the boundaries of forged regions. Extensive evaluations demonstrate CLUE's SOTA generalization performance, significantly outperforming prior methods. Furthermore, CLUE shows superior robustness against common post-processing attacks and Online Social Networks (OSNs). Code is publicly available at https://github.com/SZAISEC/CLUE.
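The Rectified Flow noise-injection step described in the abstract can be illustrated with a minimal plain-Python sketch. All names here (`rf_inject_noise`, `multi_intensity_features`) are hypothetical for illustration; the actual CLUE implementation in the linked repository operates on SD3 latent tensors and differs in detail.

```python
import random

def rf_inject_noise(latent, t):
    """Rectified-flow forward interpolation: blend a latent toward
    Gaussian noise with intensity t in [0, 1] (t=0 keeps the latent
    unchanged, t=1 yields pure noise). Hypothetical helper."""
    noise = [random.gauss(0.0, 1.0) for _ in latent]
    return [(1.0 - t) * x + t * n for x, n in zip(latent, noise)]

def multi_intensity_features(latent, intensities=(0.1, 0.3, 0.5)):
    """Produce one noised copy of the latent per intensity; a LoRA-tuned
    denoiser would then process each copy, and the discrepancies between
    its predictions and the original latent serve as forensic features."""
    return [rf_inject_noise(latent, t) for t in intensities]

latent = [0.2, -1.1, 0.7, 0.0]
variants = multi_intensity_features(latent)
print(len(variants))  # one noised latent per intensity
```

Varying the intensity `t` matters because subtle statistical traces of a forgery survive light noise but are washed out by heavy noise, so comparing the denoiser's behavior across several intensities exposes inconsistencies a single noise level would miss.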
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z) - OSCAR: Optical-aware Semantic Control for Aleatoric Refinement in Sar-to-Optical Translation [12.055938312320402]
A novel SAR-to-Optical (S2O) translation framework is proposed, integrating three core technical contributions. Experiments demonstrate that the proposed method achieves superior perceptual quality and semantic consistency compared to state-of-the-art approaches.
arXiv Detail & Related papers (2026-01-11T09:57:04Z) - AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation [56.399153019429605]
This work shows that ignoring source dynamics yields inconsistent trajectories that suppress or merge semantic cues. We reformulate text-to-3D optimization as mapping a dynamically evolving source distribution to a fixed target distribution. We introduce AnchorDS, an improved score distillation mechanism that provides state-anchored guidance with image conditions.
arXiv Detail & Related papers (2025-11-12T09:51:23Z) - Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z) - SDiFL: Stable Diffusion-Driven Framework for Image Forgery Localization [46.258797633731746]
Existing image forgery localization methods rely on labor-intensive and costly annotated data. We are the first to integrate both the image generation and powerful perceptual capabilities of SD into an image forensic framework. Our framework achieves up to 12% improvements in performance on widely used benchmarking datasets.
arXiv Detail & Related papers (2025-08-27T18:02:09Z) - Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection [30.77558600436759]
ARAS is a language-conditioned, auto-regressive anomaly synthesis approach. It injects local, text-specified defects into normal images via token-anchored latent editing. It significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies.
arXiv Detail & Related papers (2025-08-05T15:07:32Z) - Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
BR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations. NFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z) - Locally-Focused Face Representation for Sketch-to-Image Generation Using Noise-Induced Refinement [1.7409266903306055]
This paper presents a novel deep-learning framework that significantly enhances the transformation of rudimentary face sketches into high-fidelity colour images. Our approach effectively captures and enhances critical facial features through a block attention mechanism within an encoder-decoder architecture. The model sets a new state-of-the-art in sketch-to-image generation, can generalize across sketch types, and offers a robust solution for applications such as criminal identification in law enforcement.
arXiv Detail & Related papers (2024-11-28T09:12:56Z) - OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System [7.1083241462091165]
We introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images.
A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network.
Our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
arXiv Detail & Related papers (2024-03-18T07:41:39Z) - DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model [18.25548360119976]
This paper endeavors to advance the precision of snapshot compressive imaging (SCI) reconstruction for multispectral images (MSI).
We propose a novel structured zero-shot diffusion model, dubbed DiffSCI.
We present extensive testing to show that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches.
arXiv Detail & Related papers (2023-11-19T20:27:14Z) - You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment [45.62136459502005]
We propose a single network that performs both full-reference (FR) and no-reference (NR) image quality assessment (IQA).
We first employ an encoder to extract multi-level features from input images.
A Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs.
A Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder.
arXiv Detail & Related papers (2023-10-14T11:03:04Z) - PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes.
We show that our scheme substantially enhances the robustness, with gains of 15.3% mIOU, compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.