Related papers: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal

URL: http://arxiv.org/abs/2512.04496v1
Date: Thu, 04 Dec 2025 06:02:37 GMT
Title: Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal
Authors: Tianci Huo, Lingfeng Qi, Yuhan Chen, Qihong Xue, Jinyuan Shao, Hai Yu, Jie Li, Zhanhua Zhang, Guofa Li
Abstract summary: We propose a multi-model architecture for specular highlight removal (MM-SHR). We employ convolution operations to extract local details in the shallow layers of MM-SHR, and utilize the attention mechanism to capture global features in the deep layers. MM-SHR outperforms state-of-the-art methods in both accuracy and efficiency for specular highlight removal.
Score: 14.771301170089174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unavoidable specular highlights in practical environments severely impair visual performance, degrading task effectiveness and efficiency. Although many methods focus on local information from convolutional neural network models or on global information from transformer models, a single-type model faces a modeling dilemma between local fine-grained details and global long-range dependencies, and so performs worse on specular highlights of different scales. Therefore, to accommodate specular highlights of all scales, we propose a multi-model architecture for specular highlight removal (MM-SHR) that effectively captures fine-grained features in highlight regions and models long-range dependencies between highlight and highlight-free areas. Specifically, we employ convolution operations to extract local details in the shallow layers of MM-SHR, and utilize the attention mechanism to capture global features in the deep layers, ensuring both operational efficiency and removal accuracy. To model long-range dependencies without increasing computational complexity, we adopt a coarse-to-fine approach and propose the Omni-Directional Attention Integration Block (OAIBlock) and the Adaptive Region-Aware Hybrid-Domain Dual Attention Convolutional Network (HDDAConv), which apply omni-directional pixel-shifting and window-dividing operations to the raw features to achieve specular highlight removal. Extensive experimental results on three benchmark tasks and six types of surface materials demonstrate that MM-SHR outperforms state-of-the-art methods in both accuracy and efficiency for specular highlight removal. The implementation will be made publicly available at https://github.com/Htcicv/MM-SHR.
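
Illustration (not from the paper): the abstract mentions pixel-shifting and window-dividing operations but gives no details. The sketch below shows only the generic idea of cyclically shifting a feature grid and then splitting it into non-overlapping windows, in the style of shifted-window attention. It is not the OAIBlock or HDDAConv, and the function names cyclic_shift and window_partition are hypothetical.

```python
def cyclic_shift(grid, dy, dx):
    """Roll a 2D list by (dy, dx) with wrap-around (a simple pixel shift)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def window_partition(grid, size):
    """Split a 2D list into non-overlapping size x size windows, row-major."""
    h, w = len(grid), len(grid[0])
    if h % size or w % size:
        raise ValueError("grid dimensions must be divisible by the window size")
    windows = []
    for top in range(0, h, size):
        for left in range(0, w, size):
            windows.append([row[left:left + size]
                            for row in grid[top:top + size]])
    return windows


# Toy 4x4 "feature map" whose entries are just their flat indices (assumed data).
grid = [[4 * y + x for x in range(4)] for y in range(4)]

plain = window_partition(grid, 2)                         # unshifted windows
shifted = window_partition(cyclic_shift(grid, 1, 1), 2)   # windows after a shift

print(plain[0])    # first window before shifting
print(shifted[0])  # first window after shifting; it now mixes pixels
                   # that sat in different windows before the shift
```

Shifting before partitioning means each window groups pixels that were separated by a window boundary in the unshifted layout, which is one common way to let window-local operations exchange information across regions at low cost.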

Related papers

MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy [15.270952880303533]
We propose a novel 3D multi-modal image segmentation framework, MSD-KMamba. It integrates bidirectional spatial perception with multi-scale self-distillation. Our framework consistently outperforms state-of-the-art methods in segmentation accuracy, robustness, and generalization.
arXiv Detail & Related papers (2025-09-28T06:34:01Z)
SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection [12.964308630328688]
Infrared small target detection (ISTD) is vital for long-range surveillance in military, maritime, and early warning applications. ISTD is challenged by targets occupying less than 0.15% of the image and low distinguishability from complex backgrounds. This paper presents SAMamba, a novel framework integrating SAM2's hierarchical feature learning with Mamba's selective sequence modeling.
arXiv Detail & Related papers (2025-05-29T07:55:23Z)
VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning [7.06787067270941]
We propose a novel framework that significantly reduces data volume while enhancing classification accuracy. Our model employs a bidirectional reversed convolutional neural network (CNN) to efficiently extract spectral features, complemented by a specialized block for spatial feature analysis.
arXiv Detail & Related papers (2024-11-29T23:32:26Z)
MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution [14.265237560766268]
We introduce Multi-Range Attention Transformer (MAT) for image super-resolution (SR) tasks. MAT facilitates both multi-range attention (MA) and sparse multi-range attention (SMA), enabling efficient capture of both regional and sparse global features. We also introduce the MSConvStar module, which augments the model's ability for multi-range representation learning.
arXiv Detail & Related papers (2024-11-26T08:30:31Z)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations. We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
Integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Dual-Hybrid Attention Network for Specular Highlight Removal [34.99543751199565]
Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos. Current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability. We propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms.
arXiv Detail & Related papers (2024-07-17T01:52:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.