Related papers: GDB: Gated convolutions-based Document Binarization

URL: http://arxiv.org/abs/2302.02073v1
Date: Sat, 4 Feb 2023 02:56:40 GMT
Title: GDB: Gated convolutions-based Document Binarization
Authors: Zongyuan Yang, Yongping Xiong, Guibin Wu
Abstract summary: We formulate text extraction as the learning of gating values and propose an end-to-end gated convolutions-based network (GDB) to solve the problem of imprecise stroke edge extraction. Our proposed framework consists of two stages. Firstly, a coarse sub-network with an extra edge branch is trained to get more precise feature maps by feeding a priori mask and edge. Secondly, a refinement sub-network is cascaded to refine the output of the first stage by gated convolutions based on the sharp edge.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document binarization is a key pre-processing step for many document analysis tasks. However, existing methods cannot extract stroke edges finely, mainly due to the fair-treatment nature of vanilla convolutions and the extraction of stroke edges without adequate supervision by boundary-related information. In this paper, we formulate text extraction as the learning of gating values and propose an end-to-end gated convolutions-based network (GDB) to solve the problem of imprecise stroke edge extraction. The gated convolutions are applied to selectively extract the features of strokes with different attention. Our proposed framework consists of two stages. Firstly, a coarse sub-network with an extra edge branch is trained to get more precise feature maps by feeding a priori mask and edge. Secondly, a refinement sub-network is cascaded to refine the output of the first stage by gated convolutions based on the sharp edge. For global information, GDB also contains a multi-scale operation to combine local and global features. We conduct comprehensive experiments on ten Document Image Binarization Contest (DIBCO) datasets from 2009 to 2019. Experimental results show that our proposed methods outperform the state-of-the-art methods in terms of all metrics on average and achieve top ranking on six benchmark datasets.
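
The abstract's idea of "learning gating values" can be illustrated with a minimal sketch of a single gated convolution. It assumes the commonly used formulation in which a feature branch is multiplied element-wise by a sigmoid gate computed from the same input; the exact layer used in GDB is not given here, so the ReLU activation, the kernel values, and the function names `conv2d_valid` and `gated_conv2d` are illustrative assumptions rather than the authors' implementation.

```python
import math


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv2d(image, feature_kernel, gate_kernel):
    """Gated convolution: relu(feature branch) * sigmoid(gate branch).

    Assumption: ReLU on the feature branch is an illustrative choice;
    the gate branch yields a per-pixel weight in (0, 1).
    """
    features = conv2d_valid(image, feature_kernel)
    gates = conv2d_valid(image, gate_kernel)
    return [
        [max(f, 0.0) * sigmoid(g) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(features, gates)
    ]


# Toy 4x4 "page": background (0) on the left, a stroke (1) on the right.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

# Hypothetical kernels: a box filter for features, an all-zero gate.
feature_kernel = [[1.0] * 3 for _ in range(3)]
gate_kernel = [[0.0] * 3 for _ in range(3)]

output = gated_conv2d(image, feature_kernel, gate_kernel)
```

With an all-zero gate kernel every gate equals 0.5, so each position is treated equally, as in a vanilla convolution scaled by a constant. In a trained network the gate kernel would be learned, letting the layer weight stroke and background pixels differently.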

Related papers

P2Object: Single Point Supervised Object Detection and Instance Segmentation [58.778288785355]
We introduce Point-to-Box Network (P2BNet), which constructs balanced instance-level proposal bags. P2MNet can generate more precise bounding boxes and generalize to segmentation tasks. Our method largely surpasses the previous methods in terms of the mean average precision on COCO, VOC, and Cityscapes.
arXiv Detail & Related papers (2025-04-10T14:51:08Z)
SuperEdge: Towards a Generalization Model for Self-Supervised Edge Detection [2.912976132828368]
State-of-the-art pixel-wise annotations are labor-intensive and subject to inconsistencies when acquired manually. We propose a novel self-supervised approach for edge detection that employs a multi-level, multi-homography technique to transfer annotations from synthetic to real-world datasets. Our method eliminates the dependency on manually annotated edge labels, thereby enhancing its generalizability across diverse datasets.
arXiv Detail & Related papers (2024-01-04T15:21:53Z)
Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation [34.26170741722835]
We propose an end-to-end architecture that compensates for and identifies partial point clouds on the fly. Hierarchical self-distillation (HSD) can be applied to arbitrary hierarchy-based point cloud methods.
arXiv Detail & Related papers (2023-12-28T08:51:04Z)
Morphologically-Aware Consensus Computation via Heuristics-based IterATive Optimization (MACCHIatO) [1.8749305679160362]
We propose a new method to construct a binary or a probabilistic consensus segmentation based on the Fréchet means of carefully chosen distances. We show that it leads to binary consensus masks of intermediate size between Majority Voting and STAPLE and to different posterior probabilities than Mask Averaging and STAPLE methods.
arXiv Detail & Related papers (2023-09-14T23:28:58Z)
BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex Prediction [43.61580149432732]
BiSVP is a refinement-free and end-to-end building footprint extraction method. We propose a cross-scale feature fusion (CSFF) module to facilitate high resolution and rich semantic feature learning. Our BiSVP outperforms state-of-the-art methods by considerable margins on three building instance segmentation benchmarks.
arXiv Detail & Related papers (2023-03-01T07:50:34Z)
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations. DaC divides the target data into source-like and target-specific samples, where each group of samples is treated with tailored goals. We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
General Cutting Planes for Bound-Propagation-Based Neural Network Verification [144.7290035694459]
We generalize the bound propagation procedure to allow the addition of arbitrary cutting plane constraints. We find that MIP solvers can generate high-quality cutting planes for strengthening bound-propagation-based verifiers. Our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark.
arXiv Detail & Related papers (2022-08-11T10:31:28Z)
Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples. We propose a simple yet versatile framework in the spirit of divide-and-conquer. Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
Copy-Move Image Forgery Detection Based on Evolving Circular Domains Coverage [5.716030416222748]
The proposed scheme integrates both block-based and keypoint-based forgery detection methods. The experimental results indicate that the proposed CMFD scheme can achieve better detection performance under various attacks.
arXiv Detail & Related papers (2021-09-09T16:08:03Z)
ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting [108.93803186429017]
End-to-end text spotting aims to integrate detection and recognition in a unified framework. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the
arXiv Detail & Related papers (2021-05-08T07:46:55Z)
Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
Boundary-assisted Region Proposal Networks for Nucleus Segmentation [89.69059532088129]
Machine learning models cannot perform well because of the large number of crowded nuclei. We devise a Boundary-assisted Region Proposal Network (BRP-Net) that achieves robust instance-level nucleus segmentation.
arXiv Detail & Related papers (2020-06-04T08:26:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.