Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints
under Polar Representation
- URL: http://arxiv.org/abs/2312.07925v1
- Date: Wed, 13 Dec 2023 06:50:30 GMT
- Title: Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints
under Polar Representation
- Authors: Weiguang Zhang, Qiufeng Wang, Kaizhu Huang
- Abstract summary: Document dewarping aims to eliminate geometric deformation in photographed documents to benefit text recognition.
In this work, we explore a Polar coordinate representation for each point in document dewarping, namely Polar-Doc.
We propose a novel multi-scope Polar-Doc-IOU loss to constrain the relationship among control points as a grid-based regularization.
- Score: 26.050987382098107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document dewarping, aiming to eliminate geometric deformation in photographed
documents to benefit text recognition, has made great progress in recent years
but is still far from being solved. While Cartesian coordinates are typically
leveraged by state-of-the-art approaches to learn a group of deformation
control points, such a representation is not efficient for a dewarping model to
learn the deformation information. In this work, we explore a Polar coordinate
representation for each point in document dewarping, namely Polar-Doc. In
contrast to most current works, which typically adopt a two-stage pipeline, the Polar
representation enables a unified point regression framework for both the
segmentation and dewarping networks in a single stage. Such unification makes
the whole model more efficient to learn under an end-to-end optimization
pipeline, and also obtains a compact representation. Furthermore, we propose a
novel multi-scope Polar-Doc-IOU loss to constrain the relationship among
control points as a grid-based regularization under the Polar representation.
Visual comparisons and quantitative experiments on two benchmarks show that,
with far fewer parameters than other mainstream counterparts, our
one-stage model with multi-scope constraints achieves new state-of-the-art
performance on both pixel alignment metrics and OCR metrics. Source code will
be available at \url{*****}.
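As a rough illustration of what a Polar representation of control points can look like, the sketch below converts a small grid of Cartesian control points to (radius, angle) pairs about a reference center and back again. It is a minimal sketch, not the authors' implementation: the function names, the 2x2 grid, and the choice of center are all hypothetical. The loss function follows the PolarMask-style Polar IoU formulation (log of summed maxima over summed minima of ray lengths); the abstract does not define the Polar-Doc-IOU loss, so this is only a stand-in for the general idea.

```python
import math


def to_polar(points, center):
    """Convert (x, y) control points to (radius, angle) pairs about `center`."""
    cx, cy = center
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        polar.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return polar


def to_cartesian(polar, center):
    """Invert to_polar: map (radius, angle) pairs back to (x, y) points."""
    cx, cy = center
    return [(cx + r * math.cos(t), cy + r * math.sin(t)) for r, t in polar]


def polar_iou_loss(pred_radii, target_radii, eps=1e-9):
    """PolarMask-style Polar IoU loss: log(sum of max radii / sum of min radii).

    Assumption: this is NOT the Polar-Doc-IOU loss from the paper, only a
    well-known polar IoU formulation used here for illustration.
    """
    num = sum(max(p, t) for p, t in zip(pred_radii, target_radii))
    den = sum(min(p, t) for p, t in zip(pred_radii, target_radii))
    return math.log((num + eps) / (den + eps))


# Hypothetical 2x2 grid of control points and its geometric center.
grid = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = (1.0, 1.0)

polar = to_polar(grid, center)
restored = to_cartesian(polar, center)
```

In this toy grid every control point lies at the same distance from the center, so only the angle distinguishes the points. The loss is zero when predicted and target radii agree and grows as they diverge.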
Related papers
- PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers [7.4774909520731425]
We show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of constraints.
In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work.
arXiv Detail & Related papers (2024-07-05T14:24:37Z)
- Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond [84.56978780892783]
We propose CoupledTPS, which iteratively couples multiple TPS with limited control points into a more flexible and powerful transformation.
In light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data.
Experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art solutions for rotation correction.
arXiv Detail & Related papers (2024-01-24T13:03:28Z)
- PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection [81.16859686137435]
We present PARTNER, a novel 3D object detector in the polar coordinate.
Our method outperforms previous polar-based works by remarkable margins of 3.68% and 9.15% on the Waymo and ONCE validation sets.
arXiv Detail & Related papers (2023-08-08T01:59:20Z)
- Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation [38.09501948846373]
We introduce the topological adjacency relationship among the tokens, emphasizing their relative position information.
We incorporate these graphs into the pre-trained language model by adding graph neural network layers on top of the language model embeddings.
Experiments on two benchmark datasets show that LAGER significantly outperforms strong baselines under different few-shot settings.
arXiv Detail & Related papers (2023-05-24T07:34:33Z)
- Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond [47.518550130850755]
PolarMask reformulates the instance segmentation problem as predicting the contours of objects in the polar coordinate.
Two modules are carefully designed (i.e. soft polar centerness and polar IoU loss) to sample high-quality center examples.
PolarMask is fully convolutional and can be easily embedded into most off-the-shelf detection methods.
arXiv Detail & Related papers (2021-05-05T16:55:53Z)
- Community Detection in General Hypergraph via Graph Embedding [1.4213973379473654]
We propose a novel method for detecting community structure in general hypergraph networks, uniform or non-uniform.
The proposed method introduces a null vertex to augment a non-uniform hypergraph into a uniform multi-hypergraph, and then embeds the multi-hypergraph in a low-dimensional vector space.
arXiv Detail & Related papers (2021-03-28T03:23:03Z)
- Learning multiview 3D point cloud registration [74.39499501822682]
We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm.
Our approach outperforms the state-of-the-art by a significant margin, while being end-to-end trainable and computationally less costly.
arXiv Detail & Related papers (2020-01-15T03:42:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.