TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
- URL: http://arxiv.org/abs/2404.07846v1
- Date: Thu, 11 Apr 2024 15:39:10 GMT
- Title: TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
- Authors: Junyi Li, Zhilu Zhang, Wangmeng Zuo
- Abstract summary: Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID).
We present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement.
For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution.
For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than spatial size in the deep layers of multi-scale architectures.
- Score: 94.09442506816724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID). Existing BSNs are mostly conducted with convolution layers. Although transformers offer potential solutions to the limitations of convolutions and have demonstrated success in various image restoration tasks, their attention mechanisms may violate the blind-spot requirement, thus restricting their applicability in SSID. In this paper, we present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement. Specifically, TBSN follows the architectural principles of dilated BSNs, and incorporates spatial as well as channel self-attention layers to enhance the network capability. For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution. For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than spatial size in the deep layers of multi-scale architectures. To eliminate this effect, we divide the channel into several groups and perform channel attention separately. Furthermore, we introduce a knowledge distillation strategy that distills TBSN into smaller denoisers to improve computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN largely extends the receptive field and exhibits favorable performance against state-of-the-art SSID methods. The code and pre-trained models will be publicly available at https://github.com/nagejacob/TBSN.
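The two attention redesigns described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation (the official code is at the repository linked above). It assumes a dilation step of 2, a single attention window, plain scaled dot-product scores, and hypothetical function names; the exact mask pattern, the scaling, and the number of channel groups used in TBSN may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; scores of -inf get exactly zero weight."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dilated_window_mask(window, dilation=2):
    """Additive mask for the tokens of one window x window patch.

    Entry [p, q] is 0 where query p may attend to key q, -inf otherwise.
    Assumption (not taken from the paper): a key is visible only when its
    row and column offsets from the query are both multiples of `dilation`,
    which resembles the sampling grid of a dilated convolution.
    """
    rows, cols = np.divmod(np.arange(window * window), window)
    d_row = rows[:, None] - rows[None, :]
    d_col = cols[:, None] - cols[None, :]
    allowed = (d_row % dilation == 0) & (d_col % dilation == 0)
    return np.where(allowed, 0.0, -np.inf)


def masked_spatial_attention(q, k, v, mask):
    """Spatial self-attention over tokens; q, k, v have shape (tokens, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    return softmax(scores) @ v


def grouped_channel_attention(q, k, v, groups):
    """Channel self-attention computed separately inside each channel group.

    q, k, v have shape (channels, pixels). Each group builds its own
    (channels / groups) x (channels / groups) attention matrix, so channels
    in different groups never mix. The sqrt(pixels) scaling is an assumption.
    """
    channels, pixels = q.shape
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    qg, kg, vg = (t.reshape(groups, channels // groups, pixels) for t in (q, k, v))
    attn = softmax(qg @ kg.transpose(0, 2, 1) / np.sqrt(pixels))
    return (attn @ vg).reshape(channels, pixels)


# Keys visible to the top-left token of a 4 x 4 window with dilation 2.
mask = dilated_window_mask(window=4, dilation=2)
print(np.flatnonzero(np.isfinite(mask[0])).tolist())  # [0, 2, 8, 10]
```

In this sketch the top-left token only sees tokens at even row and column offsets, so changing any other token leaves its output untouched. The grouped variant keeps each channel attention matrix small relative to the spatial size, which is the regime the abstract identifies as safe.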
Related papers
- Hyperspectral Image Denoising via Self-Modulating Convolutional Neural Networks [15.700048595212051]
We introduce a self-modulating convolutional neural network which utilizes correlated spectral and spatial information.
At the core of the model lies a novel block, which allows the network to transform the features in an adaptive manner based on the adjacent spectral data.
Experimental analysis on both synthetic and real data shows that the proposed SM-CNN outperforms other state-of-the-art HSI denoising methods.
(2023-09-15T06:57:43Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
(2023-06-08T06:15:17Z)
- Attention Aided CSI Wireless Localization [19.50869817974852]
We propose attention-based CSI for robust feature learning in deep neural networks (DNNs).
We evaluate the performance of attended features in centralized and distributed massive MIMO systems for ray-tracing channels in two non-stationary railway track environments.
(2022-03-20T09:38:01Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
(2020-07-18T14:00:12Z)
- Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game [134.88020946767404]
We study a decentralized channel allocation problem in an ad-hoc Internet of Things network underlaying on the spectrum licensed to a primary cellular network.
Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error.
(2020-03-30T10:05:35Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
(2020-03-04T08:36:27Z)
- Channel Equilibrium Networks for Learning Deep Representation [63.76618960820138]
This work shows that the combination of normalization and rectified linear function leads to inhibited channels.
Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block.
Channel Equilibrium (CE) block enables channels at the same layer to contribute equally to the learned representation.
(2020-02-29T09:02:31Z)
- Channel-Attention Dense U-Net for Multichannel Speech Enhancement [21.94418736688929]
We introduce a channel-attention mechanism inside the deep architecture to mimic beamforming.
We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
(2020-01-30T19:56:52Z)