EdgeDoc: Hybrid CNN-Transformer Model for Accurate Forgery Detection and Localization in ID Documents
- URL: http://arxiv.org/abs/2508.16284v1
- Date: Fri, 22 Aug 2025 10:45:14 GMT
- Title: EdgeDoc: Hybrid CNN-Transformer Model for Accurate Forgery Detection and Localization in ID Documents
- Authors: Anjith George, Sebastien Marcel
- Abstract summary: EdgeDoc is a novel approach for the detection and localization of document forgeries. Our architecture combines a lightweight convolutional transformer with auxiliary noiseprint features extracted from the images.
- Score: 6.690084812573466
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The widespread availability of tools for manipulating images and documents has made it increasingly easy to forge digital documents, posing a serious threat to Know Your Customer (KYC) processes and remote onboarding systems. Detecting such forgeries is essential to preserving the integrity and security of these services. In this work, we present EdgeDoc, a novel approach for the detection and localization of document forgeries. Our architecture combines a lightweight convolutional transformer with auxiliary noiseprint features extracted from the images, enhancing its ability to detect subtle manipulations. EdgeDoc achieved third place in the ICCV 2025 DeepID Challenge, demonstrating its competitiveness. Experimental results on the FantasyID dataset show that our method outperforms baseline approaches, highlighting its effectiveness in real-world scenarios. Project page: https://www.idiap.ch/paper/edgedoc/
Related papers
- AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation [35.07704681580893]
We introduce AgenticOCR, a dynamic parsing paradigm that transforms optical character recognition (OCR) into a query-driven, on-demand extraction system. By autonomously analyzing document layout in a "thinking with images" manner, AgenticOCR identifies and selectively recognizes regions of interest. AgenticOCR has the potential to serve as the "third building block" of the visual document RAG stack.
arXiv Detail & Related papers (2026-02-27T16:09:38Z)
- SynthID-Image: Image watermarking at internet scale [55.5714762895087]
We introduce SynthID-Image, a deep learning-based system for invisibly watermarking AI-generated imagery. This paper documents the technical desiderata, threat models, and practical challenges of deploying such a system at internet scale.
arXiv Detail & Related papers (2025-10-10T11:03:31Z)
- SynID: Passport Synthetic Dataset for Presentation Attack Detection [7.1212970088491385]
Increase is driven by several factors, including the rise of remote work, online purchasing, migration, and advancements in synthetic images. This work proposes a new passport dataset generated from a hybrid method that combines synthetic data and open-access information.
arXiv Detail & Related papers (2025-05-12T13:24:54Z)
- Geometry Restoration and Dewarping of Camera-Captured Document Images [0.0]
This research focuses on developing a method for restoring the topology of digital images of paper documents captured by a camera. Our methodology employs deep learning (DL) for document outline detection, followed by computer vision (CV) to create a topological 2D grid.
arXiv Detail & Related papers (2025-01-06T17:12:19Z)
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable.
Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Controllable Fake Document Infilling for Cyber Deception [31.734574811062053]
We propose a novel model, Fake Document Infilling (FDI), by converting the problem to a controllable mask-then-infill procedure.
FDI outperforms the baselines in generating highly believable fakes with moderate modification to protect critical information and deceive adversaries.
arXiv Detail & Related papers (2022-10-18T14:59:38Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- A Fast Fully Octave Convolutional Neural Network for Document Image Segmentation [1.8426817621478804]
We investigate a method based on U-Net to detect the document edges and text regions in ID images.
We propose a model optimization based on Octave Convolutions to qualify the method to situations where storage, processing, and time resources are limited.
Our results showed that the proposed models are efficient for document segmentation tasks and portable.
arXiv Detail & Related papers (2020-04-03T00:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.