Dual-Hybrid Attention Network for Specular Highlight Removal
- URL: http://arxiv.org/abs/2407.12255v1
- Date: Wed, 17 Jul 2024 01:52:41 GMT
- Title: Dual-Hybrid Attention Network for Specular Highlight Removal
- Authors: Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun
- Abstract summary: Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos.
Current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability.
We propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms.
- Score: 34.99543751199565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos, ultimately improving the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. Despite significant advances in deep learning-based methods, current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability. In this paper, we propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms to effectively capture and process information across different scales and domains without relying on additional priors or supervision. DHAN-SHR consists of two key components: the Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT) and the Adaptive Global Dual Attention Transformer (G-DAT). The L-HD-DAT captures local inter-channel and inter-pixel dependencies while incorporating spectral domain features, enabling the network to effectively model the complex interactions between specular highlights and the underlying surface properties. The G-DAT models global inter-channel relationships and long-distance pixel dependencies, allowing the network to propagate contextual information across the entire image and generate more coherent and consistent highlight-free results. To evaluate the performance of DHAN-SHR and facilitate future research in this area, we compile a large-scale benchmark dataset comprising a diverse range of images with varying levels of specular highlights. Through extensive experiments, we demonstrate that DHAN-SHR outperforms 18 state-of-the-art methods both quantitatively and qualitatively, setting a new standard for specular highlight removal in multimedia applications.
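The abstract distinguishes inter-channel from inter-pixel dependencies and mentions spectral domain features. The following is a minimal NumPy sketch of that distinction only, not the DHAN-SHR implementation: the function names are hypothetical, the feature map itself stands in for queries, keys, and values (no learned projections), there is no local windowing, and a per-channel 2D FFT is assumed as one possible way to obtain spectral features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(feat):
    """Inter-channel attention (illustrative): each channel is one token.

    feat has shape (C, H, W); the attention map has shape (C, C).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)
    attn = softmax(x @ x.T / np.sqrt(h * w))  # (C, C)
    return (attn @ x).reshape(c, h, w)


def pixel_attention(feat):
    """Inter-pixel attention (illustrative): each pixel is one token.

    feat has shape (C, H, W); the attention map has shape (H*W, H*W).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T              # (N, C)
    attn = softmax(x @ x.T / np.sqrt(c))      # (N, N)
    return (attn @ x).T.reshape(c, h, w)


def spectral_features(feat):
    """Amplitude and phase of the per-channel 2D FFT (an assumed choice)."""
    spec = np.fft.fft2(feat, axes=(-2, -1))
    return np.abs(spec), np.angle(spec)


# Toy feature map: 4 channels, 8x8 pixels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out_c = channel_attention(feat)
out_p = pixel_attention(feat)
amp, phase = spectral_features(feat)
```

The channel variant builds a C x C map whose cost grows with the number of channels, while the pixel variant builds an (H*W) x (H*W) map whose cost grows quadratically with image size. This is one common reason to restrict pixel attention to local windows, although the abstract does not say how the paper handles it.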
Related papers
- iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency [0.0]
We introduce iiANET (Inception Inspired Attention Network), an efficient hybrid model designed to capture long-range dependencies in complex images.
The fundamental building block, iiABlock, integrates global 2D-MHSA (Multi-Head Self-Attention) with Registers, MBConv2 (MobileNetV2-based convolution), and dilated convolution in parallel.
We serially integrate an ECANET (Efficient Channel Attention Network) at the end of each iiABlock to calibrate channel-wise attention for enhanced model performance.
arXiv Detail & Related papers (2024-07-10T12:39:02Z) - Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z) - ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion [54.668445421149364]
Deep learning-based hyperspectral image (HSI) super-resolution aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs).
In this letter, we propose a novel adversarial automatic data augmentation framework ADASR that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion.
arXiv Detail & Related papers (2023-10-11T07:30:37Z) - Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z) - One for All: An End-to-End Compact Solution for Hand Gesture Recognition [8.321276216978637]
This paper proposes an end-to-end compact CNN framework: fine grained feature attentive network for hand gesture recognition (Fit-Hand).
The pipeline of the proposed architecture consists of two main units: FineFeat module and dilated convolutional (Conv) layer.
The effectiveness of Fit-Hand is evaluated by using subject dependent (SD) and subject independent (SI) validation setup over seven benchmark datasets.
arXiv Detail & Related papers (2021-05-15T05:10:47Z) - Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
arXiv Detail & Related papers (2021-04-29T14:14:11Z) - Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images [10.835342317692884]
The accuracy of semantic segmentation in remote sensing images has been increased significantly by deep convolutional neural networks.
This paper proposes a Multi-Attention-Network (MANet) to address these issues.
A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention.
arXiv Detail & Related papers (2020-09-03T09:08:02Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z) - Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.