SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily
Oriented Scene Text Recognition
- URL: http://arxiv.org/abs/2207.10256v1
- Date: Thu, 21 Jul 2022 01:41:53 GMT
- Authors: Dajian Zhong and Shujing Lyu and Palaiahnakote Shivakumara and Bing
Yin and Jiajia Wu and Umapada Pal and Yue Lu
- Abstract summary: We propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize text in scene images.
The proposed method first generates simple semantic features using the Semantic GAN and then recognizes the scene text with the Balanced Attention Module.
- Score: 26.571128345615108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text recognition is a challenging task due to complex backgrounds
and the diverse variations of text instances. In this paper, we propose a novel
Semantic GAN and Balanced Attention Network (SGBANet) to recognize text in
scene images. The proposed method first generates simple semantic features
using the Semantic GAN and then recognizes the scene text with the Balanced
Attention Module. The Semantic GAN aims to align the semantic feature
distribution between the support domain and the target domain. Unlike
conventional image-to-image translation methods, which operate at the image
level, the Semantic GAN performs generation and discrimination at the
semantic level with the Semantic Generator Module (SGM) and the Semantic
Discriminator Module (SDM). For target images (scene text images), the Semantic
Generator Module generates simple semantic features that share the same feature
distribution as support images (clear text images). The Semantic
Discriminator Module is used to distinguish the semantic features of the
support domain from those of the target domain. In addition, a Balanced Attention Module is
designed to alleviate the problem of attention drift. The Balanced Attention
Module first learns a balancing parameter based on the visual glimpse vector
and the semantic glimpse vector, and then performs the balancing operation to
obtain a balanced glimpse vector. Experiments on six benchmarks, including
regular datasets (IIIT5K, SVT, and ICDAR2013) and irregular datasets
(ICDAR2015, SVTP, and CUTE80), validate the effectiveness of the proposed method.
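To make the feature-level adversarial training concrete, here is a minimal PyTorch sketch of the idea. Everything below is an assumption for illustration: the MLP architectures and the standard BCE GAN losses stand in for whatever the paper actually uses; only the roles (the SGM generates support-like semantic features from target features, the SDM discriminates at the semantic level rather than the image level) come from the abstract.

```python
import torch
import torch.nn as nn

class SemanticGenerator(nn.Module):
    """Hypothetical stand-in for the SGM: maps target-domain (scene text)
    features to simple semantic features intended to match the
    support-domain (clear text) distribution."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, target_feat: torch.Tensor) -> torch.Tensor:
        return self.net(target_feat)


class SemanticDiscriminator(nn.Module):
    """Hypothetical stand-in for the SDM: classifies whether a semantic
    feature comes from the support domain or was generated from the
    target domain -- discrimination on features, not on pixels."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.LeakyReLU(0.2), nn.Linear(dim // 2, 1)
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)  # one real/fake logit per feature vector


bce = nn.BCEWithLogitsLoss()

def discriminator_loss(sdm, support_feat, generated_feat):
    # Support-domain semantic features count as "real".
    real = sdm(support_feat)
    fake = sdm(generated_feat.detach())  # do not backprop into the generator
    return bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))

def generator_loss(sdm, generated_feat):
    # The generator is rewarded when its semantic features look support-like.
    fake = sdm(generated_feat)
    return bce(fake, torch.ones_like(fake))
```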
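The Balanced Attention Module admits a similarly small sketch. The abstract specifies only that a balancing parameter is learned from the visual and semantic glimpse vectors and then used in a balancing operation; the sigmoid gate and convex combination below are one plausible reading, not the paper's confirmed formulation.

```python
import torch
import torch.nn as nn

class BalancedAttention(nn.Module):
    """Sketch of the Balanced Attention Module: learn a balancing
    parameter from the two glimpse vectors, then blend them. The
    sigmoid gate and convex combination are illustrative assumptions."""
    def __init__(self, dim: int):
        super().__init__()
        self.balance = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, g_visual: torch.Tensor, g_semantic: torch.Tensor) -> torch.Tensor:
        # g_visual, g_semantic: (batch, dim) glimpses from the visual and
        # semantic branches at the current decoding step.
        alpha = self.balance(torch.cat([g_visual, g_semantic], dim=-1))
        # The convex combination keeps the result between the two glimpses,
        # mitigating drift toward either branch.
        return alpha * g_visual + (1.0 - alpha) * g_semantic
```

A decoder step would then consume the balanced glimpse, e.g. `BalancedAttention(256)(g_visual, g_semantic)`, in place of either glimpse alone.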
Related papers
- Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension [40.21084218601082]
This paper focuses on a challenging setup where target localization is learned directly from image-text pairs.
We propose a novel Progressive Comprehension Network (PCNet) to leverage target-related textual cues and progressively localize the target object.
Our method outperforms SOTA methods on three common benchmarks.
arXiv Detail & Related papers (2024-10-02T13:30:32Z)
- Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation [27.95875467352853]
We propose a new referring remote sensing image segmentation method, FIANet, that fully exploits the visual and linguistic representations.
The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and the corresponding texts.
We evaluate the effectiveness of the proposed method on two public referring remote sensing datasets, RefSegRS and RRSIS-D.
arXiv Detail & Related papers (2024-09-20T16:45:32Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-step strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation [15.773845409601389]
Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain.
Existing solutions focus on excavating image features to detect unknown samples, ignoring the abundant information contained in textual semantics.
We propose an Adaptive Prompt learning method with Negative textual semantics and uncertainty modeling for UniMDA classification tasks.
arXiv Detail & Related papers (2024-04-23T02:54:12Z)
- RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
arXiv Detail & Related papers (2023-05-31T06:59:21Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling unmatched semantic-visual pairs to be recast as matched ones.
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Unsupervised Domain Adaptation for Semantic Segmentation using One-shot Image-to-Image Translation via Latent Representation Mixing [9.118706387430883]
We propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images.
An image-to-image translation paradigm is proposed, based on an encoder-decoder principle where latent content representations are mixed across domains.
Cross-city comparative experiments have shown that the proposed method outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2022-12-07T18:16:17Z)
- Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph [27.77392307623526]
Multi-modal aspect-based sentiment classification (MABSC) is the task of classifying the sentiment of a target entity mentioned in a sentence and an image.
Previous methods failed to account for the fine-grained semantic association between the image and the text.
We propose a new approach called SeqCSG, which enhances the encoder-decoder sentiment classification framework using sequential cross-modal semantic graphs.
arXiv Detail & Related papers (2022-08-19T16:04:29Z)
- Weakly-supervised segmentation of referring expressions [81.73850439141374]
Text grounded semantic SEGmentation (TSEG) learns segmentation masks directly from image-level referring expressions, without pixel-level annotations.
Our approach demonstrates promising results for weakly-supervised referring expression segmentation on the PhraseCut and RefCOCO datasets.
arXiv Detail & Related papers (2022-05-10T07:52:24Z)
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments conducted on the MS-COCO dataset demonstrate the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)