Portmanteauing Features for Scene Text Recognition
- URL: http://arxiv.org/abs/2211.05036v1
- Date: Wed, 9 Nov 2022 17:14:14 GMT
- Title: Portmanteauing Features for Scene Text Recognition
- Authors: Yew Lee Tan, Ernest Yu Kai Chew, Adams Wai-Kin Kong, Jung-Jae Kim, Joo Hwee Lim
- Abstract summary: State-of-the-art methods rely on a rectification network, which is connected to the text recognition network.
A portmanteau feature, inspired by the portmanteau word, is a feature containing information from both the original text image and the rectified image.
The proposed method is examined on 6 benchmarks and compared with 13 state-of-the-art methods.
- Score: 15.961450585164144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text images have different shapes and are subjected to various
distortions, e.g. perspective distortions. To handle these challenges, the
state-of-the-art methods rely on a rectification network, which is connected to
the text recognition network. They form a linear pipeline which uses text
rectification on all input images, even for images that can be recognized
without it. Undoubtedly, the rectification network improves the overall text
recognition performance. However, in some cases, the rectification network
generates unnecessary distortions on images, resulting in incorrect predictions
in images that would have otherwise been correct without it. In order to
alleviate the unnecessary distortions, the portmanteauing of features is
proposed. The portmanteau feature, inspired by the portmanteau word, is a
feature containing information from both the original text image and the
rectified image. To generate the portmanteau feature, a non-linear input
pipeline with a block matrix initialization is presented. In this work, the
transformer is chosen as the recognition network due to its utilization of
attention and inherent parallelism, which can effectively handle the
portmanteau feature. The proposed method is examined on 6 benchmarks and
compared with 13 state-of-the-art methods. The experimental results show that
the proposed method outperforms the state-of-the-art methods on various
benchmarks.
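The abstract does not give implementation details, but the portmanteau idea can be illustrated with a minimal sketch: features from the original and rectified images are concatenated and projected back to the original dimension by a linear layer whose weight is initialized as a block matrix. The class name, the averaging initialization `[I | I] / 2`, and the tensor shapes below are all assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PortmanteauFusion(nn.Module):
    """Sketch of fusing features from the original and rectified images.

    The concatenated 2d-dim feature is projected back to d dims by a linear
    layer whose weight starts as the block matrix [I | I] / 2, so the fusion
    initially behaves as a plain average of the two branches and learns to
    deviate from it during training. (Assumed initialization; the paper's
    exact block-matrix scheme may differ.)
    """

    def __init__(self, d: int):
        super().__init__()
        self.proj = nn.Linear(2 * d, d, bias=False)
        with torch.no_grad():
            eye = torch.eye(d)
            # Block matrix [I | I] / 2: average the two feature branches.
            self.proj.weight.copy_(torch.cat([eye, eye], dim=1) / 2)

    def forward(self, feat_orig: torch.Tensor, feat_rect: torch.Tensor) -> torch.Tensor:
        # feat_orig, feat_rect: (batch, seq_len, d) features from the
        # original-image and rectified-image branches, respectively.
        return self.proj(torch.cat([feat_orig, feat_rect], dim=-1))
```

At initialization the module outputs the exact mean of the two branch features, which makes the rectified branch a soft refinement rather than a mandatory step in the pipeline; the fused sequence can then be fed to a transformer recognition network, as the paper proposes.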
Related papers
- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between them, which risk obtaining error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z) - Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models [68.47333676663312]
We show a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models.
The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens.
We illustrate its benefits in three scenarios: (1) to guide domain-specific diffusion models trained on an object class, (2) to gain continuous, rig-like controls for text-to-image generation, and (3) to improve the performance of zero-shot image editors.
arXiv Detail & Related papers (2024-02-21T03:01:17Z) - Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z) - Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z) - Saliency Constrained Arbitrary Image Style Transfer using SIFT and DCNN [22.57205921266602]
When common neural style transfer methods are used, the textures and colors in the style image are usually transferred imperfectly to the content image.
This paper proposes a novel saliency constrained method to reduce or avoid such effects.
The experiments show that the saliency maps of source images can help find the correct matching and avoid artifacts.
arXiv Detail & Related papers (2022-01-14T09:00:55Z) - Image Inpainting with Edge-guided Learnable Bidirectional Attention Maps [85.67745220834718]
We present an edge-guided learnable bidirectional attention map (Edge-LBAM) for improving image inpainting of irregular holes.
Our Edge-LBAM method contains dual procedures, including structure-aware mask-updating guided by predicted edges.
Extensive experiments show that our Edge-LBAM is effective in generating coherent image structures and preventing color discrepancy and blurriness.
arXiv Detail & Related papers (2021-04-25T07:25:16Z) - Generative and Discriminative Learning for Distorted Image Restoration [22.230017059874445]
Liquify is a technique for image editing, which can be used for image distortion.
We propose a novel generative and discriminative learning method based on deep neural networks.
arXiv Detail & Related papers (2020-11-11T14:01:29Z) - Scene Text Recognition via Transformer [36.55457990615167]
Scene text recognition with arbitrary shape is very challenging due to large variations in text shapes, fonts, colors, backgrounds, etc.
Most state-of-the-art algorithms rectify the input image into the normalized image, then treat the recognition as a sequence prediction task.
We propose a simple but extremely effective scene text recognition method based on transformer [50].
arXiv Detail & Related papers (2020-03-18T07:38:02Z) - Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single image deblurring is feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z) - Learning Transformation-Aware Embeddings for Image Forensics [15.484408315588569]
Image Provenance Analysis aims at discovering relationships among different manipulated image versions that share content.
One of the main sub-problems for provenance analysis that has not yet been addressed directly is the edit ordering of images that share full content or are near-duplicates.
This paper introduces a novel deep learning-based approach to provide a plausible ordering to images that have been generated from a single image through transformations.
arXiv Detail & Related papers (2020-01-13T22:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.