CPN: Complementary Proposal Network for Unconstrained Text Detection
- URL: http://arxiv.org/abs/2402.11540v1
- Date: Sun, 18 Feb 2024 10:43:53 GMT
- Title: CPN: Complementary Proposal Network for Unconstrained Text Detection
- Authors: Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong
- Abstract summary: We propose a Complementary Proposal Network that seamlessly integrates semantic and geometric information for superior performance.
By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches with significant margins under comparable cost.
- Score: 7.524080426954018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for scene text detection can be divided into two paradigms:
segmentation-based and anchor-based. While segmentation-based methods are
well-suited for irregular shapes, they struggle with compact or overlapping
layouts. Conversely, anchor-based approaches excel at complex layouts but
struggle with irregular shapes. To strengthen their merits and overcome their
respective demerits, we propose a Complementary Proposal Network (CPN) that
seamlessly integrates semantic and geometric information in parallel for
superior performance. The CPN comprises two efficient networks for proposal
generation: the Deformable Morphology Semantic Network, which generates
semantic proposals employing an innovative deformable morphological operator,
and the Balanced Region Proposal Network, which produces geometric proposals
with pre-defined anchors. To further enhance the complementarity, we introduce
an Interleaved Feature Attention module that enables semantic and geometric
features to interact deeply before proposal generation. By leveraging both
complementary proposals and features, CPN outperforms state-of-the-art
approaches with significant margins under comparable computation cost.
Specifically, our approach achieves improvements of 3.6%, 1.3% and 1.0% on
challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively. Code
for our method will be released.
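As a rough illustration of what pooling two complementary proposal sets can look like, the sketch below merges axis-aligned boxes from two hypothetical sources using greedy non-maximum suppression. This is not the CPN method: the abstract does not say how the semantic and geometric proposals are combined, and the names `iou` and `merge_proposals`, the `(x1, y1, x2, y2, score)` box format, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_proposals(semantic: List[Box], geometric: List[Box],
                    iou_thresh: float = 0.5) -> List[Box]:
    """Pool both sets, then keep the highest-scoring box of each overlapping group."""
    kept: List[Box] = []
    # Visit boxes from highest to lowest score (greedy NMS).
    for box in sorted(semantic + geometric, key=lambda b: b[4], reverse=True):
        # Keep a box only if it does not overlap heavily with any kept box.
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


# Toy inputs: the first box of each set overlaps heavily; the others are disjoint.
semantic = [(0, 0, 10, 10, 0.9), (50, 50, 60, 60, 0.6)]
geometric = [(1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
merged = merge_proposals(semantic, geometric)
```

In this toy input, the overlapping pair collapses to the higher-scoring box, while boxes found by only one source survive, which is the sense in which two proposal sources can complement each other.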
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework - Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175]
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
arXiv Detail & Related papers (2024-03-08T15:32:18Z)
- ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection [114.54835359657707]
ProposalContrast is an unsupervised point cloud pre-training framework.
It learns robust 3D representations by contrasting region proposals.
ProposalContrast is verified on various 3D detectors.
arXiv Detail & Related papers (2022-07-26T04:45:49Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Contrastive Proposal Extension with LSTM Network for Weakly Supervised Object Detection [52.86681130880647]
Weakly supervised object detection (WSOD) has attracted growing attention because it uses only image-level labels, which greatly reduces annotation cost.
We propose a new method that optimizes the initial proposals by comparing them with their extended counterparts.
Experiments on the PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2021-10-14T16:31:57Z)
- Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain the segment-level interaction while improving efficiency.
arXiv Detail & Related papers (2021-09-14T02:02:36Z)
- U-mesh: Human Correspondence Matching with Mesh Convolutional Networks [15.828285556159026]
We propose an elegant fusion of regression (bottom-up) and generative (top-down) methods to fit a parametric template model to raw scan meshes.
Our first major contribution is an intrinsic convolutional mesh U-net architecture that predicts pointwise correspondence to a template surface.
We evaluate the proposed method on the FAUST correspondence challenge, where we achieve a 20% (33%) improvement over state-of-the-art methods for inter- (intra-) subject correspondence.
arXiv Detail & Related papers (2021-08-15T08:58:45Z)
- Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We extend a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights into our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z)
- Deep-3DAligner: Unsupervised 3D Point Set Registration Network With Optimizable Latent Vector [15.900382629390297]
We propose a novel model that integrates optimization into learning, aiming to address the technical challenges in 3D registration.
In addition to the deep transformation decoding network, our framework introduces an optimizable deep Spatial Correlation Representation.
arXiv Detail & Related papers (2020-09-29T22:44:38Z)
- Spatial-Scale Aligned Network for Fine-Grained Recognition [42.71878867504503]
Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations.
We propose the spatial-scale aligned network (SSANET) and implicitly address misalignments during the recognition process.
arXiv Detail & Related papers (2020-01-05T11:12:08Z)