Text-Aware Single Image Specular Highlight Removal
- URL: http://arxiv.org/abs/2108.06881v1
- Date: Mon, 16 Aug 2021 03:51:53 GMT
- Title: Text-Aware Single Image Specular Highlight Removal
- Authors: Shiyu Hou, Chaoqun Wang, Weize Quan, Jingen Jiang, Dong-Ming Yan
- Abstract summary: Existing methods typically remove specular highlight for medical images and specific-object images; however, they cannot handle images with text.
In this paper, we first raise and study the text-aware single image specular highlight removal problem.
The core goal is to improve the accuracy of text detection and recognition by removing the highlight from text images.
- Score: 14.624958411229862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Removing undesirable specular highlight from a single input image is of
crucial importance to many computer vision and graphics tasks. Existing methods
typically remove specular highlight for medical images and specific-object
images; however, they cannot handle images with text. In addition, the
impact of specular highlight on text recognition is rarely studied by the text
detection and recognition community. Therefore, in this paper, we first raise
and study the text-aware single image specular highlight removal problem. The
core goal is to improve the accuracy of text detection and recognition by
removing the highlight from text images. To tackle this challenging problem, we
first collect three high-quality datasets with fine-grained annotations, which
will be appropriately released to facilitate the relevant research. Then, we
design a novel two-stage network, which contains a highlight detection network
and a highlight removal network. The output of the highlight detection network
provides additional information about highlight regions to guide the subsequent
highlight removal network. Moreover, we suggest a measurement set including the
end-to-end text detection and recognition evaluation and auxiliary visual
quality evaluation. Extensive experiments on our collected datasets demonstrate
the superior performance of the proposed method.
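
The detect-then-remove data flow described in the abstract can be illustrated with a toy sketch. This is not the paper's method: both stages below are hand-written stand-ins (a brightness threshold and a neighbour average) for what the paper implements as learned networks, and the names `detect_highlight`, `remove_highlight`, `two_stage`, and the `threshold` parameter are hypothetical. It only shows how a stage-one mask can guide a stage-two restoration.

```python
from typing import List, Tuple

# Grayscale image as nested lists, pixel values assumed to lie in [0, 1].
Image = List[List[float]]


def detect_highlight(image: Image, threshold: float = 0.9) -> Image:
    """Stage 1 stand-in: flag near-saturated pixels as highlight (1.0), else 0.0.

    Assumption: a fixed threshold replaces the learned detection network.
    """
    return [[1.0 if px >= threshold else 0.0 for px in row] for row in image]


def remove_highlight(image: Image, mask: Image) -> Image:
    """Stage 2 stand-in: fill masked pixels with the mean of unmasked 4-neighbours.

    Assumption: simple neighbour averaging replaces the learned removal network.
    Pixels outside the mask are copied through unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is not modified
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0.0:
                continue
            neighbours = [
                image[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] == 0.0
            ]
            if neighbours:  # leave the pixel as-is if no clean neighbour exists
                out[y][x] = sum(neighbours) / len(neighbours)
    return out


def two_stage(image: Image) -> Tuple[Image, Image]:
    """Run detection first, then pass its mask to removal as guidance."""
    mask = detect_highlight(image)
    return mask, remove_highlight(image, mask)


if __name__ == "__main__":
    toy = [
        [0.25, 0.25, 0.25],
        [0.25, 1.00, 0.25],  # a single saturated "highlight" pixel
        [0.25, 0.25, 0.25],
    ]
    mask, restored = two_stage(toy)
    print(mask[1][1], restored[1][1])
```

The point of the split is that the second stage only alters pixels the first stage flagged, so text strokes outside the highlight region pass through untouched.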
Related papers
- See then Tell: Enhancing Key Information Extraction with Vision Grounding [54.061203106565706]
We introduce STNet (See then Tell Net), a novel end-to-end model designed to deliver precise answers with relevant vision grounding.
To enhance the model's seeing capabilities, we collect extensive structured table recognition datasets.
arXiv Detail & Related papers (2024-09-29T06:21:05Z)
- Task-driven single-image super-resolution reconstruction of document scans [2.8391355909797644]
We investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans.
To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection.
arXiv Detail & Related papers (2024-07-12T05:18:26Z)
- Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing [49.419619882284906]
Ground-A-Score is a powerful model-agnostic image editing method that incorporates grounding during score distillation.
The selective application with a new penalty coefficient and contrastive loss helps to precisely target editing areas.
Both qualitative assessments and quantitative analyses confirm that Ground-A-Score successfully adheres to the intricate details of extended and multifaceted prompts.
arXiv Detail & Related papers (2024-03-20T12:40:32Z)
- Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions.
We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z)
- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification.
Thirdly, we utilize Transformer to learn the global feature on image-level and model the global relationship of the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z)
- M2-Net: Multi-stages Specular Highlight Detection and Removal in Multi-scenes [3.312427167335527]
The framework consists of three main components: a highlight feature extractor module, a highlight coarse removal module, and a highlight refine removal module.
Our algorithm is applied for the first time in video highlight removal with promising results.
arXiv Detail & Related papers (2022-07-20T15:18:43Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [8.317191999275536]
This paper focuses on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval.
We employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image.
arXiv Detail & Related papers (2020-09-21T12:31:42Z)
- Text Detection and Recognition in the Wild: A Review [7.43788469020627]
State-of-the-art scene text detection and/or recognition methods have exploited the advancement in deep learning architectures.
The paper presents a review of recent advances in scene text detection and recognition.
It also identifies several existing challenges for detecting or recognizing text in images captured in the wild.
arXiv Detail & Related papers (2020-06-08T01:08:04Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.