DocAligner: Annotating Real-world Photographic Document Images by Simply
Taking Pictures
- URL: http://arxiv.org/abs/2306.05749v2
- Date: Mon, 12 Jun 2023 04:03:23 GMT
- Authors: Jiaxin Zhang, Bangdong Chen, Hiuyi Cheng, Fengjun Guo, Kai Ding,
Lianwen Jin
- Abstract summary: We present DocAligner, a novel method that streamlines the manual annotation process to a simple step of taking pictures.
It achieves this by establishing dense correspondence between photographic document images and their clean counterparts.
Considering the distinctive characteristics of document images, DocAligner incorporates several innovative features.
- Score: 24.76258692552673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been a growing interest in research concerning document
image analysis and recognition in photographic scenarios. However, the lack of
labeled datasets for this emerging challenge poses a significant obstacle, as
manual annotation can be time-consuming and impractical. To tackle this issue,
we present DocAligner, a novel method that streamlines the manual annotation
process to a simple step of taking pictures. DocAligner achieves this by
establishing dense correspondence between photographic document images and
their clean counterparts. It enables the automatic transfer of existing
annotations in clean document images to photographic ones and helps to
automatically acquire labels that are unavailable through manual labeling.
Considering the distinctive characteristics of document images, DocAligner
incorporates several innovative features. First, we propose a non-rigid
pre-alignment technique based on the document's edges, which effectively
eliminates interference caused by significant global shifts and repetitive
patterns present in document images. Second, to handle large shifts and ensure
high accuracy, we introduce a hierarchical aligning approach that combines
global and local correlation layers. Furthermore, considering the importance of
fine-grained elements in document images, we present a details recurrent
refinement module to enhance the output in a high-resolution space. To train
DocAligner, we construct a synthetic dataset and introduce a self-supervised
learning approach to enhance its robustness for real-world data. Through
extensive experiments, we demonstrate the effectiveness of DocAligner and the
acquired dataset. Datasets and code will be publicly available.
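The annotation-transfer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the dense correspondence is already available as a per-pixel offset field (`flow`, a hypothetical name) mapping each clean-image pixel to its location in the photograph, and the helper names `transfer_points` and `transfer_box` are made up for illustration.

```python
import numpy as np

def transfer_points(points_xy, flow):
    """Map (x, y) points from the clean image into the photograph.

    Assumption: flow[y, x] = (dx, dy) means the clean-image pixel at (x, y)
    appears at (x + dx, y + dy) in the photograph.
    """
    pts = np.asarray(points_xy, dtype=np.float64)
    h, w = flow.shape[:2]
    # Nearest-pixel lookup of the offset, clamped to the image bounds.
    xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
    return pts + flow[ys, xs]

def transfer_box(box_xyxy, flow):
    """Transfer an axis-aligned box by mapping its four corners.

    The result is a (4, 2) quadrilateral, since a warped box is generally
    no longer axis-aligned in the photograph.
    """
    x0, y0, x1, y1 = box_xyxy
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return transfer_points(corners, flow)

# Toy field: every pixel shifts 5 px right and 3 px up.
flow = np.zeros((100, 200, 2))
flow[..., 0] = 5.0
flow[..., 1] = -3.0
quad = transfer_box((10, 20, 50, 40), flow)
```

With a real, spatially varying field the four corners would move by different amounts, which is why the sketch returns a quadrilateral rather than a box.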
Related papers
- DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification [5.247930659596986]
This paper introduces DocXplain, a novel model-agnostic explainability method specifically designed for generating highly interpretable feature attribution maps.
We extensively evaluate our proposed approach in the context of document image classification, utilizing 4 different evaluation metrics.
To the best of the authors' knowledge, this work presents the first model-agnostic attribution-based explainability method specifically tailored for document images.
arXiv Detail & Related papers (2024-07-04T10:59:15Z)
- SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community.
With growing internet connectivity in personal life, an enormous number of documents have become available in the public domain.
We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches, take a vision-based approach.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Augraphy: A Data Augmentation Library for Document Images [59.457999432618614]
Augraphy is a Python library for constructing data augmentation pipelines.
It provides strategies to produce augmented versions of clean document images that appear to have been altered by standard office operations.
arXiv Detail & Related papers (2022-08-30T22:36:19Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to robust and superior performance, and the lightweight recurrent architecture keeps it efficient at run time.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis [16.284895792639137]
This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout.
In this work, given a spatial layout (bounding boxes with object categories) provided by the user as a reference, our proposed DocSynth model learns to generate a set of realistic document images.
The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.
arXiv Detail & Related papers (2021-07-06T14:24:30Z)