Zoom Text Detector
- URL: http://arxiv.org/abs/2209.03014v1
- Date: Wed, 7 Sep 2022 09:19:21 GMT
- Title: Zoom Text Detector
- Authors: Chuang Yang, Mulin Chen, Yuan Yuan, and Qi Wang
- Abstract summary: Text detectors adopt shrink-mask based representation strategies.
Unfortunately, three disadvantages cause unreliable shrink-masks.
We propose a Zoom Text Detector inspired by the zoom process of the camera.
- Score: 26.761735112547953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To pursue comprehensive performance, recent text detectors improve detection
speed at the expense of accuracy. They adopt shrink-mask based text
representation strategies, which leads to a high dependency of detection
accuracy on shrink-masks. Unfortunately, three disadvantages cause unreliable
shrink-masks. Specifically, these methods try to strengthen the discrimination
of shrink-masks from the background by semantic information. However, feature
defocusing, in which coarse layers are optimized by fine-grained objectives,
limits the extraction of semantic features. Meanwhile, since both shrink-masks
and the margins belong to text, detail loss, in which the margins are ignored,
makes it hard to distinguish shrink-masks from the margins and causes ambiguous
shrink-mask edges. Moreover, false-positive samples share similar visual
features with shrink-masks, which further degrades shrink-mask recognition. To
avoid the above problems, we propose a
Zoom Text Detector (ZTD) inspired by the zoom process of the camera.
Specifically, a Zoom Out Module (ZOM) is introduced to provide coarse-grained
optimization objectives for coarse layers to avoid feature defocusing.
Meanwhile, a Zoom In Module (ZIM) is presented to enhance margin recognition
and prevent detail loss. Furthermore, a Sequential-Visual Discriminator (SVD) is
designed to suppress false-positive samples using sequential and visual features.
Experiments verify the superior comprehensive performance of ZTD.
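For readers unfamiliar with the terms, the relationship between a text region, its shrink-mask, and the margin can be sketched as follows. This is only an illustration, not the authors' method: it assumes the inward offset rule D = A(1 - r^2)/L used by earlier shrink-mask detectors such as PSENet and DBNet (the abstract does not say which rule ZTD uses), it handles only axis-aligned rectangles, and the function names and the ratio r = 0.4 are hypothetical choices.

```python
def shrink_offset(area, perimeter, ratio):
    """Inward offset D = A * (1 - r^2) / L (rule assumed from PSENet/DBNet-style detectors)."""
    return area * (1.0 - ratio ** 2) / perimeter


def shrink_rectangle(x0, y0, x1, y1, ratio=0.4):
    """Shrink an axis-aligned text box; returns the shrink-mask box and the offset used."""
    width, height = x1 - x0, y1 - y0
    d = shrink_offset(width * height, 2 * (width + height), ratio)
    # Clamp the offset so the shrunk box never inverts.
    d = min(d, width / 2, height / 2)
    return (x0 + d, y0 + d, x1 - d, y1 - d), d


def margin_area(x0, y0, x1, y1, ratio=0.4):
    """Area of the margin: the part of the text box outside its shrink-mask."""
    (sx0, sy0, sx1, sy1), _ = shrink_rectangle(x0, y0, x1, y1, ratio)
    return (x1 - x0) * (y1 - y0) - (sx1 - sx0) * (sy1 - sy0)


if __name__ == "__main__":
    box, d = shrink_rectangle(0, 0, 100, 20)
    print("shrink-mask box:", box, "offset:", d)
    print("margin area:", margin_area(0, 0, 100, 20))
```

Under these assumptions, the margin of a long, thin text box can cover a large share of the text region, which gives some intuition for why ignoring the margins would blur shrink-mask edges.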
Related papers
- Multitask Learning for SAR Ship Detection with Gaussian-Mask Joint Segmentation [20.540873039361102]
This paper proposes a multitask learning framework for SAR ship detection, consisting of object detection, speckle suppression, and target segmentation tasks.
An angle classification loss with aspect ratio weighting is introduced to improve detection accuracy by addressing angular periodicity and object proportions.
The speckle suppression task uses a dual-feature fusion attention mechanism to reduce noise and fuse shallow and denoising features, enhancing robustness.
The target segmentation task, leveraging a rotated Gaussian-mask, aids the network in extracting target regions from cluttered backgrounds and improves detection efficiency with pixel-level predictions.
arXiv Detail & Related papers (2024-11-21T05:10:41Z)
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- Unmasking Anomalies in Road-Scene Segmentation [18.253109627901566]
Anomaly segmentation is a critical task for driving applications.
We propose a paradigm change by shifting from a per-pixel classification to a mask classification.
Mask2Anomaly demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture.
arXiv Detail & Related papers (2023-07-25T08:23:10Z)
- Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition [0.0]
The proposed method can detect occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy.
It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches.
Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN).
Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
arXiv Detail & Related papers (2023-07-21T07:56:32Z)
- Adaptive Shrink-Mask for Text Detection [91.34459257409104]
Existing real-time text detectors reconstruct text contours by shrink-masks directly.
The dependence on predicted shrink-masks leads to unstable detection results.
Super-pixel Window (SPW) is designed to supervise the network.
arXiv Detail & Related papers (2021-11-18T07:38:57Z)
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
- Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing [61.82466976737915]
Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing.
We propose a new approach to detect presentation attacks from multiple frames based on two insights.
The proposed approach achieves state-of-the-art results on five benchmark datasets.
arXiv Detail & Related papers (2020-03-18T06:11:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.