Related papers: OCR Graph Features for Manipulation Detection in Documents

URL: http://arxiv.org/abs/2009.05158v2
Date: Mon, 14 Sep 2020 15:52:09 GMT
Title: OCR Graph Features for Manipulation Detection in Documents
Authors: Hailey James, Otkrist Gupta, Dan Raviv
Abstract summary: We propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections.
Score: 11.193867567895353
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all approaches in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driven and generalizable approach. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.
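Illustration: the abstract describes building a graph over OCR character bounding boxes and classifying graph-derived features. The sketch below is not the authors' feature set, which the abstract does not specify. It assumes a hypothetical box layout of (x_min, y_min, x_max, y_max), a k-nearest-neighbour graph over box centers, and three illustrative summary features (mean edge length, largest vertical offset, largest log height ratio). The names center, knn_edges and graph_features are invented for this example. In a full pipeline, vectors like these could be passed to a random forest classifier, which is omitted here.

```python
import math
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_edges(boxes: List[Box], k: int = 2) -> List[Tuple[int, int]]:
    """Link every box to its k nearest neighbours (by center distance).

    Edges are undirected and returned as sorted (low_index, high_index) pairs.
    """
    centers = [center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        nearest = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def graph_features(boxes: List[Box], k: int = 2) -> Tuple[float, float, float]:
    """Summarise the graph as (mean edge length, max |dy|, max |log height ratio|).

    Assumes every box has positive height. These three features are
    illustrative choices, not the ones used in the paper.
    """
    edges = knn_edges(boxes, k)
    if not edges:
        return (0.0, 0.0, 0.0)
    lengths, offsets, ratios = [], [], []
    for i, j in edges:
        (xi, yi), (xj, yj) = center(boxes[i]), center(boxes[j])
        lengths.append(math.dist((xi, yi), (xj, yj)))
        offsets.append(abs(yj - yi))
        height_i = boxes[i][3] - boxes[i][1]
        height_j = boxes[j][3] - boxes[j][1]
        ratios.append(abs(math.log(height_j / height_i)))
    return (sum(lengths) / len(lengths), max(offsets), max(ratios))


# Four evenly spaced characters on one baseline (made-up coordinates).
genuine = [(0, 0, 8, 10), (10, 0, 18, 10), (20, 0, 28, 10), (30, 0, 38, 10)]
# Same line, but the third character is shifted down and slightly taller,
# mimicking a pasted-in glyph.
tampered = [(0, 0, 8, 10), (10, 0, 18, 10), (20, 3, 28, 15), (30, 0, 38, 10)]

print(graph_features(genuine))
print(graph_features(tampered))
```

On the clean line the vertical-offset and height-ratio features are zero, while the shifted character makes both positive. Small misalignments of this kind are what a learned classifier could pick up on.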

Related papers

Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline [6.066442015301665]
We propose a novel method for generating high-quality tampered document images. We first train an auxiliary network to compare text crops, leveraging contrastive learning with a novel strategy for defining positive pairs and their corresponding negatives. Using a carefully designed generation pipeline, we introduce a framework capable of producing diverse, high-quality tampered document images.
arXiv Detail & Related papers (2026-02-19T12:39:38Z)
Exploring Light-Weight Object Recognition for Real-Time Document Detection [1.623310884498926]
Real-time document detection and rectification is a niche that is largely unexplored by the literature. We adapt IWPOD-Net, a license plate detection network, and train it for detection on NBID, a synthetic ID card dataset. We show that our model is smaller and more efficient than current state-of-the-art solutions while retaining a competitive OCR quality metric.
arXiv Detail & Related papers (2025-09-07T23:58:28Z)
Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation [6.385732495789276]
Document alignment plays a crucial role in numerous real-world applications, such as automated form processing, anomaly detection, and workflow automation. Traditional methods for document alignment rely on image-based features like keypoints, edges, and textures to estimate geometric transformations, such as homographies. This paper introduces a novel approach that leverages Optical Character Recognition (OCR) outputs as features for homography estimation.
arXiv Detail & Related papers (2025-05-25T01:20:32Z)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective [54.605073936695575]
Graph anomaly detection aims to identify unusual patterns in graph-based data, with wide applications in fields such as web security and financial fraud detection. Existing methods rely on contrastive learning, assuming that a lower similarity between a node and its local subgraph indicates abnormality. The presence of interfering edges invalidates this assumption, since it introduces disruptive noise that compromises the contrastive learning process. We propose a Clean-View Enhanced Graph Anomaly Detection framework (CVGAD), which includes a multi-scale anomaly awareness module to identify key sources of interference in the contrastive learning process.
arXiv Detail & Related papers (2025-05-23T15:05:56Z)
Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR). The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II). Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
Similar Document Template Matching Algorithm [0.0]
This study outlines a comprehensive methodology for verifying medical documents. It integrates advanced techniques in template extraction, comparison, and fraud detection. This methodology provides a robust approach to medical document verification.
arXiv Detail & Related papers (2023-11-21T15:13:18Z)
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models [3.3813766129849845]
We propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation.
arXiv Detail & Related papers (2023-11-16T07:16:02Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a known problem to the documents research community. With growing internet connectivity to personal life, an enormous number of documents has become available in the public domain. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms. We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance. Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
Augraphy: A Data Augmentation Library for Document Images [59.457999432618614]
Augraphy is a Python library for constructing data augmentation pipelines. It provides strategies to produce augmented versions of clean document images that appear to have been altered by standard office operations.
arXiv Detail & Related papers (2022-08-30T22:36:19Z)
ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations. We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification. DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios. Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents. We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
Key Information Extraction From Documents: Evaluation And Generator [3.878105750489656]
This research project compares state-of-the-art models for information extraction from documents. The results have shown that NLP based pre-processing is beneficial for model performance. The use of a bounding box regression decoder increases the model performance only for fields that do not follow a rectangular shape.
arXiv Detail & Related papers (2021-06-09T16:12:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.