Similar Document Template Matching Algorithm
- URL: http://arxiv.org/abs/2311.12663v1
- Date: Tue, 21 Nov 2023 15:13:18 GMT
- Title: Similar Document Template Matching Algorithm
- Authors: Harshitha Yenigalla, Bommareddy Revanth Srinivasa Reddy, Batta Venkata
Rahul and Nannapuraju Hemanth Raju
- Abstract summary: This study outlines a comprehensive methodology for verifying medical documents.
It integrates advanced techniques in template extraction, comparison, and fraud detection.
This methodology provides a robust approach to medical document verification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study outlines a comprehensive methodology for verifying medical
documents, integrating advanced techniques in template extraction, comparison,
and fraud detection. It begins with template extraction using sophisticated
region-of-interest (ROI) methods, incorporating contour analysis and edge
identification. Pre-processing steps ensure template clarity through
morphological operations and adaptive thresholding. The template comparison
algorithm utilizes advanced feature matching with key points and descriptors,
enhancing robustness through histogram-based analysis that accounts for
variations. Fraud detection involves SSIM computation and OCR for textual
information extraction. The SSIM quantifies structural similarity, aiding in
potential match identification. OCR focuses on critical areas like patient
details, provider information, and billing amounts. Extracted information is
compared with a reference dataset, and confidence thresholding ensures reliable
fraud detection. Adaptive parameters enhance system flexibility for dynamic
adjustments to varying document layouts. This methodology provides a robust
approach to medical document verification, addressing complexities in template
extraction, comparison, fraud detection, and adaptability to diverse document
structures.
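The fraud-detection stage described in the abstract (an SSIM score, followed by comparing OCR-extracted fields with a reference and applying a confidence threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the images are already aligned grayscale grids of equal size, computes SSIM over a single global window rather than the sliding windows typically used in practice, and assumes OCR has already produced plain strings. The function names, the use of `difflib.SequenceMatcher` as the string similarity, and both threshold values are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM for two equally sized grayscale images (lists of rows)."""
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard stabilising constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def field_confidence(extracted, reference):
    """Similarity in [0, 1] between an OCR string and its reference value."""
    left = extracted.strip().lower()
    right = reference.strip().lower()
    return SequenceMatcher(None, left, right).ratio()


def verify_document(ssim_score, extracted_fields, reference_fields,
                    ssim_threshold=0.8, field_threshold=0.9):
    """Flag a document when the template or any field falls below its threshold.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    flagged = {}
    for name, ref_value in reference_fields.items():
        conf = field_confidence(extracted_fields.get(name, ""), ref_value)
        if conf < field_threshold:
            flagged[name] = round(conf, 3)
    template_match = ssim_score >= ssim_threshold
    return {
        "template_match": template_match,
        "flagged_fields": flagged,
        "suspicious": (not template_match) or bool(flagged),
    }


# Toy 4x4 "template" and a slightly noisy copy standing in for a scanned form.
template = [[10, 10, 200, 200], [10, 10, 200, 200],
            [50, 50, 120, 120], [50, 50, 120, 120]]
candidate = [[12, 9, 198, 201], [11, 10, 199, 200],
             [49, 52, 121, 118], [50, 51, 119, 120]]

score = global_ssim(template, candidate)
report = verify_document(
    score,
    extracted_fields={"patient": "John Doe", "amount": "1500.00"},
    reference_fields={"patient": "John Doe", "amount": "1200.00"},
)
print(score, report)
```

In this toy run the template check passes because the candidate differs from the template only by small pixel noise, while the altered billing amount falls below the field threshold and marks the document as suspicious. A practical system would replace the nested lists with real image arrays and a windowed SSIM from an imaging library.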
Related papers
- Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods [0.0]
In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies.
A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs.
The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance.
arXiv Detail & Related papers (2024-09-13T02:08:47Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting 142 anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z)
- Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction using a PDF parser, followed by classification into such categories as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- Identity Documents Authentication based on Forgery Detection of Guilloche Pattern [2.606834301724095]
An authentication model for identity documents based on forgery detection of guilloche patterns is proposed.
Experiments are conducted in order to analyze and identify the most proper parameters to achieve higher authentication performance.
arXiv Detail & Related papers (2022-06-22T11:37:10Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
arXiv Detail & Related papers (2020-12-09T22:45:40Z)
- Extracting Procedural Knowledge from Technical Documents [1.0773368566852943]
Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question-answering or driving a conversation.
It is a challenging problem to parse large, dense documents such as product manuals and user guides, automatically identify which parts describe procedures, and then extract them.
arXiv Detail & Related papers (2020-10-20T09:47:52Z)
- OCR Graph Features for Manipulation Detection in Documents [11.193867567895353]
We propose a model that leverages graph features using OCR (Optical Character Recognition).
Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features.
We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections.
arXiv Detail & Related papers (2020-09-10T21:50:45Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)