Character Region Attention For Text Spotting
- URL: http://arxiv.org/abs/2007.09629v1
- Date: Sun, 19 Jul 2020 09:12:23 GMT
- Title: Character Region Attention For Text Spotting
- Authors: Youngmin Baek, Seung Shin, Jeonghun Baek, Sungrae Park, Junyeop Lee,
Daehyun Nam, Hwalsuk Lee
- Abstract summary: A scene text spotter is composed of text detection and recognition modules.
A typical architecture places detection and recognition modules into separate branches, and RoI pooling is commonly used to let the branches share a visual feature.
This is possible since the two modules share a common sub-task which is to find the location of the character regions.
This architecture is formed by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage.
- Score: 18.713194210876136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A scene text spotter is composed of text detection and recognition modules.
Many studies have been conducted to unify these modules into an end-to-end
trainable model to achieve better performance. A typical architecture places
detection and recognition modules into separate branches, and RoI pooling is
commonly used to let the branches share a visual feature. However, there still
exists a chance of establishing a more complementary connection between the
modules when adopting a recognizer that uses an attention-based decoder and a
detector that represents spatial information of the character regions. This is possible
since the two modules share a common sub-task which is to find the location of
the character regions. Based on this insight, we construct a tightly coupled
single pipeline model. This architecture is formed by utilizing detection
outputs in the recognizer and propagating the recognition loss through the
detection stage. The use of a character score map helps the recognizer attend
better to the character center points, and the recognition loss propagation to
the detector module enhances the localization of the character regions. Also, a
strengthened sharing stage allows feature rectification and boundary
localization of arbitrary-shaped text regions. Extensive experiments
demonstrate state-of-the-art performance on publicly available straight and
curved benchmark datasets.
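As a rough illustration of the idea that a character score map can guide where a recognizer looks, the sketch below weights a visual feature map by a character score map and pools the result. This is a minimal NumPy sketch under assumptions, not the authors' implementation: the function name `character_attention_pool`, the array shapes, and the sum-to-one normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def character_attention_pool(features, char_score, eps=1e-8):
    """Pool features using a character score map as spatial attention weights.

    Hypothetical helper for illustration only.
    features:   array of shape (C, H, W), a visual feature map.
    char_score: array of shape (H, W), non-negative scores that are
                high near character centers.
    Returns an array of shape (C,): the score-weighted average feature.
    """
    features = np.asarray(features, dtype=np.float64)
    char_score = np.asarray(char_score, dtype=np.float64)
    if features.shape[1:] != char_score.shape:
        raise ValueError("feature map and score map must share H and W")
    # Normalize the scores so they act as attention weights summing to ~1.
    weights = char_score / (char_score.sum() + eps)
    # Broadcast the (H, W) weights over the channel axis, then sum spatially.
    return (features * weights[None, :, :]).sum(axis=(1, 2))
```

With a score map that is non-zero at a single location, the pooled vector is approximately the feature vector at that location, which mirrors the intuition of attending to character center points.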
Related papers
- LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model [20.007650672107566]
Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos.
Recent methods track the zero-shot results of state-of-the-art image text spotters directly.
Fine-tuning transformer-based text spotters on specific datasets could yield performance enhancements.
arXiv Detail & Related papers (2024-05-29T15:35:09Z)
- Local Feature Matching Using Deep Learning: A Survey [19.322545965903608]
Local feature matching enjoys wide-ranging applications in the realm of computer vision, encompassing domains such as image retrieval, 3D reconstruction, and object recognition.
In recent years, the introduction of deep learning models has sparked widespread exploration into local feature matching techniques.
The paper also explores the practical application of local feature matching in diverse domains such as Structure from Motion, Remote Sensing Image Registration, and Medical Image Registration.
arXiv Detail & Related papers (2024-01-31T04:32:41Z)
- Weakly-supervised deepfake localization in diffusion-generated images [4.548755617115687]
We propose a weakly-supervised localization problem based on the Xception network as the backbone architecture.
We show that the best performing detection method (based on local scores) is less sensitive to the looser supervision than to the mismatch in terms of dataset or generator.
arXiv Detail & Related papers (2023-11-08T10:27:36Z)
- From Global to Local: Multi-scale Out-of-distribution Detection [129.37607313927458]
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Recent progress in representation learning gives rise to distance-based OOD detection.
We propose Multi-scale OOD DEtection (MODE), a first framework leveraging both global visual information and local region details.
arXiv Detail & Related papers (2023-08-20T11:56:25Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- DQnet: Cross-Model Detail Querying for Camouflaged Object Detection [54.82390534024954]
A convolutional neural network (CNN) for camouflaged object detection tends to activate local discriminative regions while ignoring complete object extent.
In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN.
In order to obtain feature maps that could activate full object extent, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed.
arXiv Detail & Related papers (2022-12-16T06:23:58Z)
- PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs [34.145110544546114]
We present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion.
Our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control.
arXiv Detail & Related papers (2022-05-31T18:28:39Z)
- Guide Local Feature Matching by Overlap Estimation [9.387323456222823]
We introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR.
OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression.
Experiments show that OETR can boost state-of-the-art local feature matching performance substantially.
arXiv Detail & Related papers (2022-02-18T07:11:36Z)
- Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806]
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
arXiv Detail & Related papers (2022-02-09T18:56:41Z)
- Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
- Local Relation Learning for Face Forgery Detection [73.73130683091154]
We propose a novel perspective of face forgery detection via local relation learning.
Specifically, we propose a Multi-scale Patch Similarity Module (MPSM), which measures the similarity between features of local regions.
We also propose an RGB-Frequency Attention Module (RFAM) to fuse information in both RGB and frequency domains for more comprehensive local feature representation.
arXiv Detail & Related papers (2021-05-06T10:44:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.