M2-Net: Multi-stages Specular Highlight Detection and Removal in
Multi-scenes
- URL: http://arxiv.org/abs/2207.09965v1
- Date: Wed, 20 Jul 2022 15:18:43 GMT
- Authors: Zhaoyangfan Huang and Kun Hu and Xingjun Wang
- Abstract summary: The framework consists of three main components: a highlight feature extractor module, a coarse highlight removal module, and a refined highlight removal module.
To our knowledge, our algorithm is the first applied to video highlight removal, with promising results.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel unified framework for highlight
detection and removal in multiple scenes, including synthetic images, face
images, natural images, and text images. The framework consists of three main
components: a highlight feature extractor module, a coarse highlight removal
module, and a refined highlight removal module. First, the highlight feature
extractor module directly separates the highlight and non-highlight features
from the original highlight image. A coarse highlight-removal image is then
obtained using the coarse highlight removal network. To further improve the
removal quality, a refined highlight-removal image is finally produced by the
refinement module, which is based on a contextual highlight attention
mechanism. Extensive experiments in multiple scenes show that the proposed
framework produces excellent visual results of highlight removal and achieves
state-of-the-art scores on several quantitative evaluation metrics. To our
knowledge, our algorithm is the first applied to video highlight removal, with
promising results.
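The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative mock-up, not the paper's method: the real stages are learned networks, and every function name, threshold, and heuristic below is a hypothetical placeholder chosen only to show how the stages compose.

```python
import numpy as np

# Hypothetical sketch of a three-stage highlight-removal pipeline:
# (1) highlight feature extraction, (2) coarse removal, (3) refinement.
# All internals are stand-ins for the learned modules in the paper.

def extract_highlight_features(image):
    # Stage 1: separate highlight and non-highlight components.
    # Placeholder heuristic: treat very bright pixels as highlights.
    threshold = 0.8  # assumed value, not from the paper
    highlight = np.where(image > threshold, image, 0.0)
    non_highlight = image - highlight
    return highlight, non_highlight

def coarse_removal(image, highlight):
    # Stage 2: coarse removal network, approximated here by
    # subtracting the highlight component and clipping to [0, 1].
    return np.clip(image - highlight, 0.0, 1.0)

def refine_removal(coarse, highlight):
    # Stage 3: refinement with contextual attention, crudely
    # approximated by filling highlight regions with the global mean
    # of the coarse result (a stand-in for contextual information).
    mask = (highlight > 0).astype(float)
    context = coarse.mean()
    return coarse * (1.0 - mask) + context * mask

def pipeline(image):
    highlight, _ = extract_highlight_features(image)
    coarse = coarse_removal(image, highlight)
    return refine_removal(coarse, highlight)
```

For example, running `pipeline` on a small array with one saturated pixel replaces that pixel with contextual (mean) intensity while leaving non-highlight pixels untouched; the learned modules in the paper play the same roles with far more capacity.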
Related papers
- Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework [24.97672212363703]
We propose ExtAbs, which jointly and seamlessly performs extractive and abstractive summarization within a single encoder-decoder model.
In ExtAbs, the vanilla encoder is augmented to extract salient content, and the vanilla decoder is modified with the proposed saliency mask to generate summaries.
Experiments show that ExtAbs outperforms baselines on the extractive task and performs comparably to, or even better than, the vanilla models on the abstractive task.
arXiv Detail & Related papers (2024-09-18T09:21:25Z) - VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked part of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z) - Towards High-Quality Specular Highlight Removal by Leveraging
Large-Scale Synthetic Data [45.30068102110486]
This paper aims to remove specular highlights from a single object-level image.
We propose a three-stage network to address them.
We present a large-scale synthetic dataset of object-level images.
arXiv Detail & Related papers (2023-09-12T15:10:23Z) - SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object
Segmentation Network [70.89436857471887]
We propose an interactive distractor selection method that is optimized to achieve the task with just a single click.
Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation.
Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups.
arXiv Detail & Related papers (2023-05-28T04:05:24Z) - Good Visual Guidance Makes A Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction [88.6585431949086]
We propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction.
We regard the visual representation as a pluggable visual prefix that guides the textual representation toward error-insensitive predictions.
Experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T02:10:55Z) - Two-stage Visual Cues Enhancement Network for Referring Image
Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object in an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z) - Text-Aware Single Image Specular Highlight Removal [14.624958411229862]
Existing methods typically remove specular highlights from medical images and specific-object images; however, they cannot handle images with text.
In this paper, we first raise and study the text-aware single image specular highlight removal problem.
The core goal is to improve the accuracy of text detection and recognition by removing the highlight from text images.
arXiv Detail & Related papers (2021-08-16T03:51:53Z) - Disassembling Object Representations without Labels [75.2215716328001]
We study a new representation-learning task, which we term disassembling object representations.
Disassembling enables category-specific modularity in the learned representations.
We propose an unsupervised approach to achieving disassembling, named Unsupervised Disassembling Object Representation (UDOR).
arXiv Detail & Related papers (2020-04-03T08:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.