Related papers: ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement

URL: http://arxiv.org/abs/2507.16397v1
Date: Tue, 22 Jul 2025 09:48:23 GMT
Title: ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
Authors: Kahim Wong, Jicheng Zhou, Haiwei Wu, Yain-Whar Si, Jiantao Zhou
Abstract summary: ADCD-Net is a robust document forgery localization model that adaptively leverages the RGB/DCT forensic traces. Our proposed ADCD-Net demonstrates superior forgery localization performance, consistently outperforming state-of-the-art methods by 20.79% averaged over 5 types of distortions.
Score: 18.283496080974924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advancement of image editing tools has enabled malicious manipulation of sensitive document images, underscoring the need for robust document image forgery detection. Though forgery detectors for natural images have been extensively studied, they struggle with document images, as the tampered regions can be seamlessly blended into the uniform document background (BG) and structured text. On the other hand, existing document-specific methods lack sufficient robustness against various degradations, which limits their practical deployment. This paper presents ADCD-Net, a robust document forgery localization model that adaptively leverages the RGB/DCT forensic traces and integrates key characteristics of document images. Specifically, to address the DCT traces' sensitivity to block misalignment, we adaptively modulate the DCT feature contribution based on a predicted alignment score, resulting in much improved resilience to various distortions, including resizing and cropping. Also, a hierarchical content disentanglement approach is proposed to boost the localization performance via mitigating the text-BG disparities. Furthermore, noticing the predominantly pristine nature of BG regions, we construct a pristine prototype capturing traces of untampered regions, and eventually enhance both the localization accuracy and robustness. Our proposed ADCD-Net demonstrates superior forgery localization performance, consistently outperforming state-of-the-art methods by 20.79% averaged over 5 types of distortions. The code is available at https://github.com/KAHIMWONG/ACDC-Net.
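
The two ideas in the abstract that lend themselves to a small illustration are the alignment-gated DCT contribution and the pristine background prototype. The sketch below is only a toy rendering of those ideas in NumPy and is not the authors' implementation: the function names `fuse_features` and `pristine_similarity`, the additive fusion rule, the scalar score, and the cosine-similarity comparison are all assumptions made for illustration.

```python
import numpy as np


def fuse_features(rgb_feat, dct_feat, align_score):
    """Blend RGB and DCT feature maps (hypothetical additive gating).

    align_score is assumed to be a scalar in [0, 1]: 1 means the 8x8 DCT
    block grid is intact, 0 means it is misaligned (e.g. after cropping
    or resizing), so the DCT branch is suppressed.
    """
    score = float(np.clip(align_score, 0.0, 1.0))
    return rgb_feat + score * dct_feat


def pristine_similarity(feats, bg_mask):
    """Cosine similarity of every pixel feature to a background prototype.

    feats: (H, W, C) feature map; bg_mask: (H, W) boolean background mask.
    The prototype is assumed here to be the mean background feature vector.
    """
    proto = feats[bg_mask].mean(axis=0)                  # (C,)
    dots = feats @ proto                                 # (H, W)
    norms = np.linalg.norm(feats, axis=-1) * np.linalg.norm(proto)
    return dots / np.maximum(norms, 1e-12)               # avoid divide-by-zero


# Toy usage: a 2x2 feature map whose bottom-right pixel differs from the rest.
feats = np.array([[[1.0, 0.0], [1.0, 0.0]],
                  [[1.0, 0.0], [0.0, 1.0]]])
bg_mask = np.array([[True, True],
                    [True, False]])
sim = pristine_similarity(feats, bg_mask)

rgb = np.ones((4, 4))
dct = np.full((4, 4), 2.0)
aligned = fuse_features(rgb, dct, 1.0)      # DCT branch fully used
misaligned = fuse_features(rgb, dct, 0.0)   # DCT branch switched off
```

In this toy setup, pixels whose features resemble the background prototype get a similarity near 1, while the differing pixel gets a similarity near 0, which is the kind of contrast a localization head could exploit.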

Related papers

DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinates-based Diffusion Model [25.504170988714783]
Document dewarping aims to rectify deformations in photographic document images, thus improving text readability. We propose DvD, the first generative model to tackle document Dewarping via a Diffusion framework.
arXiv Detail & Related papers (2025-05-28T05:05:51Z)
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
BR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations. NFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z)
LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models. LIME does not require user-specified regions of interest (RoI) or additional text input, but rather employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask. We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
arXiv Detail & Related papers (2023-12-14T18:59:59Z)
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CUT) for latent diffusion models (LDM). Our approach enables zero-shot image-to-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
CTP-Net: Character Texture Perception Network for Document Image Forgery Localization [28.48117743313255]
We propose a Character Texture Perception Network (CTP-Net) to localize the forged regions in document images. Considering that characters carrying semantics in a document image are highly vulnerable, capturing the forgery traces is the key to localizing the forged regions. The proposed CTP-Net is able to localize multi-scale forged areas in document images, and outperforms the state-of-the-art forgery localization methods.
arXiv Detail & Related papers (2023-08-04T06:37:28Z)
iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing. It generates images conditioned on a source image and a textual edit prompt. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification. We first mask random patches of the background-excluded document images and then reconstruct the missing pixels. With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification. We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested with different versions of DIBCO dataset having challenges like holes, erased or smudged ink, dust, and misplaced fibres. The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z)
ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations. We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework for both rectifying distorted document images and finely removing the background, using a fully convolutional network (FCN). The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization. Experiments proved that our approach can dewarp document images effectively under various geometric distortions, and has achieved state-of-the-art performance in terms of local details and overall effect.
arXiv Detail & Related papers (2021-04-14T12:32:36Z)
DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement [4.073826298938431]
We propose an end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) to restore severely degraded document images. We demonstrate that, in different tasks (document clean up, binarization, deblurring and watermark removal), DE-GAN can produce an enhanced version of the degraded document with a high quality.
arXiv Detail & Related papers (2020-10-17T10:54:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.