DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
- URL: http://arxiv.org/abs/2208.09878v1
- Date: Sun, 21 Aug 2022 12:58:45 GMT
- Title: DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
- Authors: Jingyu Lin, Jie Jiang, Yan Yan, Chunchao Guo, Hongfa Wang, Wei Liu,
Hanzi Wang
- Abstract summary: We propose DPTNet, a simple yet effective architecture to model the global and local information for the scene text detection task.
We propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and convolutional path.
Our DPTNet achieves state-of-the-art results on the MSRA-TD500 dataset, and provides competitive results on other standard benchmarks in terms of both detection accuracy and speed.
- Score: 34.42038300372715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prosperity of deep learning contributes to the rapid progress in scene
text detection. Among all the methods with convolutional networks,
segmentation-based ones have drawn extensive attention due to their superiority
in detecting text instances of arbitrary shapes and extreme aspect ratios.
However, the bottom-up methods are limited to the performance of their
segmentation models. In this paper, we propose DPTNet (Dual-Path Transformer
Network), a simple yet effective architecture to model the global and local
information for the scene text detection task. We further propose a parallel
design that integrates the convolutional network with a powerful self-attention
mechanism to provide complementary clues between the attention path and
convolutional path. Moreover, a bi-directional interaction module across the
two paths is developed to provide complementary clues in the channel and
spatial dimensions. We also upgrade the concentration operation by adding an
extra multi-head attention layer to it. Our DPTNet achieves state-of-the-art
results on the MSRA-TD500 dataset, and provides competitive results on other
standard benchmarks in terms of both detection accuracy and speed.
Related papers
- Image Captioning via Dynamic Path Customization [100.15412641586525]
We propose a novel Dynamic Transformer Network (DTNet) for image captioning, which dynamically assigns customized paths to different samples, leading to discriminative yet accurate captions.
To validate the effectiveness of our proposed DTNet, we conduct extensive experiments on the MS-COCO dataset and achieve new state-of-the-art performance.
arXiv Detail & Related papers (2024-06-01T07:23:21Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for
Mobile Robots [17.90723909170376]
We introduce Mobile-Seed, a lightweight framework for simultaneous semantic segmentation and boundary detection.
Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach.
Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2023-11-21T14:53:02Z) - PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z) - Road detection via a dual-task network based on cross-layer graph fusion
modules [2.8197257696982287]
We propose a dual-task network (DTnet) for road detection and cross-layer graph fusion module (CGM)
CGM improves the cross-layer fusion effect by a complex feature stream graph, and four graph patterns are evaluated.
arXiv Detail & Related papers (2022-08-17T07:16:55Z) - Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion [62.269219152425556]
segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z) - Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full- Strategy Network (FSNet) is a novel framework for video object segmentation (VOS)
Our FSNet performs the crossmodal feature-passing (i.e., transmission and receiving) simultaneously before fusion decoding stage.
We show that our FSNet outperforms other state-of-the-arts for both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z) - Attentional Local Contrast Networks for Infrared Small Target Detection [15.882749652217653]
We propose a novel model-driven deep network for infrared small target detection.
We modularize a conventional local contrast measure method as a depth-wise parameterless nonlinear feature refinement layer in an end-to-end network.
We conduct detailed ablation studies with varying network depths to empirically verify the effectiveness and efficiency of each component in our network architecture.
arXiv Detail & Related papers (2020-12-15T19:33:09Z) - Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.