A Fast Fully Octave Convolutional Neural Network for Document Image
Segmentation
- URL: http://arxiv.org/abs/2004.01317v1
- Date: Fri, 3 Apr 2020 00:57:33 GMT
- Title: A Fast Fully Octave Convolutional Neural Network for Document Image
Segmentation
- Authors: Ricardo Batista das Neves Junior, Luiz Felipe Verçosa, David
Macêdo, Byron Leite Dantas Bezerra, Cleber Zanchettin
- Abstract summary: We investigate a method based on U-Net to detect the document edges and text regions in ID images.
We propose a model optimization based on Octave Convolutions to make the method suitable for situations where storage, processing, and time resources are limited.
Our results show that the proposed models are efficient for document segmentation tasks and portable.
- Score: 1.8426817621478804
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Know Your Customer (KYC) and Anti-Money Laundering (AML) are worldwide
practices for online customer identification based on personal identification
documents, similarity and liveness checking, and proof of address. To answer
the basic regulatory question, "are you who you say you are?", the customer needs
to upload valid identification documents (ID). This task imposes some
computational challenges, since these documents are diverse and may present
different and complex backgrounds, some occlusion, partial rotation, poor
quality, or damage. Advanced text and document segmentation algorithms were
used to process the ID images. In this context, we investigated a method based
on U-Net to detect the document edges and text regions in ID images. Despite
its promising results on image segmentation, the U-Net based approach is
computationally expensive for a real application, since the image segmentation
runs on the customer's device. We propose a model optimization based on Octave
Convolutions to make the method suitable for situations where storage, processing, and
time resources are limited, such as in mobile and robotic applications. We
conducted the evaluation experiments on two new datasets, CDPhotoDataset and
DTDDataset, which are composed of real ID images of Brazilian documents. Our
results showed that the proposed models are efficient for document segmentation
tasks and portable.
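To make the resource argument concrete, the sketch below estimates how much computation and feature-map storage one octave convolution needs relative to a vanilla convolution of the same size. It assumes the standard octave convolution formulation (a fraction `alpha` of the channels is kept at half the height and width, with the same `alpha` on input and output); the function name and the `alpha` values are illustrative and are not taken from the paper.

```python
def octave_conv_relative_cost(alpha: float) -> dict:
    """Relative cost of one octave convolution vs. a vanilla convolution
    with the same total channel count and kernel size.

    alpha is the (assumed) fraction of channels in the low-frequency
    branch, stored at half height and half width (1/4 of the pixels).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    high = 1.0 - alpha
    # Four information paths. Any path whose convolution runs on the
    # low-resolution grid touches 1/4 of the spatial positions.
    paths = {
        "high_to_high": high * high,      # full resolution
        "high_to_low": high * alpha / 4,  # pool first, then convolve
        "low_to_high": alpha * high / 4,  # convolve, then upsample
        "low_to_low": alpha * alpha / 4,  # low resolution throughout
    }
    # Feature-map storage: low-frequency channels hold 1/4 of the pixels.
    memory = high + alpha / 4
    return {"flops": sum(paths.values()), "memory": memory, "paths": paths}


if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5, 0.75):
        cost = octave_conv_relative_cost(a)
        print(f"alpha={a:.2f}  flops={cost['flops']:.4f}  memory={cost['memory']:.4f}")
```

Under these assumptions, `alpha = 0.5` needs about 44% of the multiply-adds and 62.5% of the feature-map storage of the vanilla layer, which is the kind of saving that motivates using octave convolutions on mobile devices.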
Related papers
- LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification [15.616232457341097]
We call this "image-based automated fact verification," a name that originated from a text-based fact-checking system used by journalists.
We present a large-scale dataset tailored for this new task that features various hand-crafted image edits and machine learning-driven manipulations.
arXiv Detail & Related papers (2024-07-26T09:15:29Z)
- A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement [13.27528507177775]
We propose T2T-BinFormer, which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models [37.36999826208225]
In this paper, we propose a novel problem setting called zero-shot in-distribution (ID) detection.
We identify images containing ID objects as ID images (even if they contain OOD objects) and images lacking ID objects as OOD images without any training.
We present a simple and effective approach, Global-Local Concept Matching, based on both global and local visual-text alignments of CLIP features.
arXiv Detail & Related papers (2023-04-10T11:35:42Z)
- Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
arXiv Detail & Related papers (2022-10-18T12:49:04Z)
- ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval [51.588385824875886]
Cross-modal retrieval consists in finding images related to a given query text or vice-versa.
Many recent methods proposed effective solutions to the image-text matching problem, mostly using recent large vision-language (VL) Transformer networks.
This paper proposes an ALign And DIstill Network (ALADIN) to fill in the gap between effectiveness and efficiency.
arXiv Detail & Related papers (2022-07-29T16:01:48Z)
- DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer [16.03084865625318]
Business intelligence processes often require the extraction of useful semantic content from documents.
We present a transformer-based model for end-to-end segmentation of complex layouts in document images.
Our model achieved comparable or better segmentation performance than the existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-27T10:50:22Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- ICDAR 2021 Competition on Components Segmentation Task of Document Photos [63.289361617237944]
Three challenge tasks were proposed entailing different segmentation assignments to be performed on a provided dataset.
The collected data are from several types of Brazilian ID documents, whose personal information was conveniently replaced.
Different Deep Learning models were applied by the entrants with diverse strategies to achieve the best results in each of the tasks.
arXiv Detail & Related papers (2021-06-16T00:49:58Z)
- Unsupervised Neural Domain Adaptation for Document Image Binarization [13.848843012433187]
This paper proposes a method that combines neural networks and Domain Adaptation (DA) in order to carry out unsupervised document binarization.
Results show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
arXiv Detail & Related papers (2020-12-02T13:42:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.