GDB: Gated convolutions-based Document Binarization
- URL: http://arxiv.org/abs/2302.02073v1
- Date: Sat, 4 Feb 2023 02:56:40 GMT
- Title: GDB: Gated convolutions-based Document Binarization
- Authors: Zongyuan Yang, Yongping Xiong, Guibin Wu
- Abstract summary: We formulate text extraction as the learning of gating values and propose an end-to-end gated convolutions-based network (GDB) to solve the problem of imprecise stroke edge extraction.
Our proposed framework consists of two stages. Firstly, a coarse sub-network with an extra edge branch is trained to get more precise feature maps by feeding a priori mask and edge.
Secondly, a refinement sub-network is cascaded to refine the output of the first stage by gated convolutions based on the sharp edge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document binarization is a key pre-processing step for many document analysis
tasks. However, existing methods cannot extract stroke edges finely, mainly
due to the fair-treatment nature of vanilla convolutions and the extraction of
stroke edges without adequate supervision by boundary-related information. In
this paper, we formulate text extraction as the learning of gating values and
propose an end-to-end gated convolutions-based network (GDB) to solve the
problem of imprecise stroke edge extraction. The gated convolutions are applied
to selectively extract the features of strokes with different attention. Our
proposed framework consists of two stages. Firstly, a coarse sub-network with
an extra edge branch is trained to get more precise feature maps by feeding a
priori mask and edge. Secondly, a refinement sub-network is cascaded to refine
the output of the first stage by gated convolutions based on the sharp edge.
For global information, GDB also contains a multi-scale operation to combine
local and global features. We conduct comprehensive experiments on ten Document
Image Binarization Contest (DIBCO) datasets from 2009 to 2019. Experimental
results show that our proposed methods outperform the state-of-the-art methods
in terms of all metrics on average and achieve top ranking on six benchmark
datasets.
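The gating idea in the abstract can be illustrated with a minimal single-channel sketch. It assumes the common gated-convolution form, in which one convolution produces features and a second produces gating values that are squashed to (0, 1) by a sigmoid and multiplied in element-wise. This is not the authors' implementation: the function names, the ReLU feature activation, the zero padding, and the single-channel layout are all assumptions made for illustration.

```python
import math


def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding (output keeps the input size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Pixels outside the image count as zero.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_conv2d(image, feature_kernel, gate_kernel, feature_bias=0.0, gate_bias=0.0):
    """Hypothetical gated convolution: relu(features) * sigmoid(gates), element-wise.

    A vanilla convolution would return only the feature map and so treat every
    pixel alike; here the learned gate decides per pixel how much gets through.
    """
    features = conv2d_same(image, feature_kernel, feature_bias)
    gates = conv2d_same(image, gate_kernel, gate_bias)
    return [
        [max(0.0, f) * sigmoid(g) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(features, gates)
    ]
```

With a gate kernel of zeros and zero gate bias every gate equals 0.5, so the output is half the rectified feature map. A strongly negative gate bias closes the gate and suppresses the response, which is the mechanism that would let a trained network attenuate background pixels while passing stroke pixels.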
Related papers
- SuperEdge: Towards a Generalization Model for Self-Supervised Edge Detection [2.912976132828368]
State-of-the-art pixel-wise annotations are labor-intensive and subject to inconsistencies when acquired manually.
We propose a novel self-supervised approach for edge detection that employs a multi-level, multi-homography technique to transfer annotations from synthetic to real-world datasets.
Our method eliminates the dependency on manual annotated edge labels, thereby enhancing its generalizability across diverse datasets.
(2024-01-04T15:21:53Z)
- Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation [34.26170741722835]
We propose an end-to-end architecture that compensates for and identifies partial point clouds on the fly.
Hierarchical self-distillation (HSD) can be applied to arbitrary hierarchy-based point cloud methods.
(2023-12-28T08:51:04Z)
- Morphologically-Aware Consensus Computation via Heuristics-based IterATive Optimization (MACCHIatO) [1.8749305679160362]
We propose a new method to construct a binary or a probabilistic consensus segmentation based on the Fréchet means of carefully chosen distances.
We show that it leads to binary consensus masks of intermediate size between Majority Voting and STAPLE and to different posterior probabilities than Mask Averaging and STAPLE methods.
(2023-09-14T23:28:58Z)
- BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex Prediction [43.61580149432732]
BiSVP is a refinement-free and end-to-end building footprint extraction method.
We propose a cross-scale feature fusion (CSFF) module to facilitate high resolution and rich semantic feature learning.
Our BiSVP outperforms state-of-the-art methods by considerable margins on three building instance segmentation benchmarks.
(2023-03-01T07:50:34Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
(2022-11-12T09:21:49Z)
- General Cutting Planes for Bound-Propagation-Based Neural Network Verification [144.7290035694459]
We generalize the bound propagation procedure to allow the addition of arbitrary cutting plane constraints.
We find that MIP solvers can generate high-quality cutting planes for strengthening bound-propagation-based verifiers.
Our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark.
(2022-08-11T10:31:28Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
(2022-04-21T06:21:14Z)
- Copy-Move Image Forgery Detection Based on Evolving Circular Domains Coverage [5.716030416222748]
The proposed scheme integrates both block-based and keypoint-based forgery detection methods.
The experimental results indicate that the proposed CMFD scheme can achieve better detection performance under various attacks.
(2021-09-09T16:08:03Z)
- ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2).
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation.
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the
(2021-05-08T07:46:55Z)
- Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
(2020-08-01T09:20:11Z)
- Boundary-assisted Region Proposal Networks for Nucleus Segmentation [89.69059532088129]
Machine learning models cannot perform well because of the large number of crowded nuclei.
We devise a Boundary-assisted Region Proposal Network (BRP-Net) that achieves robust instance-level nucleus segmentation.
(2020-06-04T08:26:38Z)