DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer
- URL: http://arxiv.org/abs/2209.06040v1
- Date: Tue, 13 Sep 2022 14:47:09 GMT
- Title: DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer
- Authors: Dafeng Zhang and Xiaobing Wang
- Abstract summary: Recent works achieve excellent results on the defocus deblurring task with dual-pixel data using convolutional neural networks (CNNs).
We propose a dynamic multi-scale network, named DMTNet, for defocus deblurring of dual-pixel images.
- Score: 1.408706290287121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works achieve excellent results on the defocus deblurring task with dual-pixel data using convolutional neural networks (CNNs), while the scarcity of such data limits the exploration of vision transformers for this task. In addition, existing works use fixed parameters and a fixed network architecture to deblur images with different blur distributions and content, which also limits the generalization ability of the model. In this paper, we propose a dynamic multi-scale network, named DMTNet, for defocus deblurring of dual-pixel images. DMTNet mainly contains two modules: a feature extraction module and a reconstruction module. The feature extraction module is composed of several vision transformer blocks, whose powerful feature extraction capability yields richer features and improves the robustness of the model. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Modules (DMSSRMs). DMSSRM restores images by adaptively assigning weights to features from different scales according to the blur distribution and content of the input images. DMTNet combines the advantages of transformers and CNNs: the vision transformer raises the performance ceiling of the CNN, while the inductive bias of the CNN enables the transformer to extract robust features without relying on large amounts of data. DMTNet may be the first attempt to use a vision transformer to restore blurred images to clarity; by combining it with a CNN, the vision transformer can achieve better performance on small datasets. Experimental results on popular benchmarks demonstrate that DMTNet significantly outperforms state-of-the-art methods.
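Since the abstract only outlines the architecture, the following is a minimal PyTorch sketch of the described pipeline under stated assumptions: a transformer-based feature extraction module followed by a stack of DMSSRMs whose per-scale fusion weights are predicted from the input features. The layer sizes, number of scales, gating design, and dual-pixel input handling are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerFeatureExtractor(nn.Module):
    """Feature extraction module: a stack of transformer blocks over pixel tokens
    (layer configuration assumed; the abstract does not specify it)."""
    def __init__(self, dim=64, depth=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W, C)
        return self.encoder(tokens).transpose(1, 2).reshape(b, c, h, w)

class DMSSRM(nn.Module):
    """Dynamic Multi-scale Sub-reconstruction Module (sketch): convolves features
    at several scales and fuses them with weights predicted from the input itself."""
    def __init__(self, dim=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, 3, padding=1) for _ in scales)
        # Lightweight gate (assumed design): one weight per scale from global
        # statistics, so fusion adapts to the input's blur distribution and content.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, len(scales)), nn.Softmax(dim=1))

    def forward(self, x):
        b, _, h, w = x.shape
        weights = self.gate(x)                        # (B, num_scales)
        out = torch.zeros_like(x)
        for i, (s, conv) in enumerate(zip(self.scales, self.branches)):
            f = F.avg_pool2d(x, s) if s > 1 else x    # downsample to scale s
            f = conv(f)
            if s > 1:
                f = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
            out = out + weights[:, i].view(b, 1, 1, 1) * f
        return x + out                                # residual connection (assumed)

class DMTNet(nn.Module):
    """Overall pipeline: two dual-pixel views in, one sharp image out (sketch)."""
    def __init__(self, dim=64, num_dmssrm=3):
        super().__init__()
        self.embed = nn.Conv2d(6, dim, 3, padding=1)  # two RGB views stacked on channels
        self.extract = TransformerFeatureExtractor(dim)
        self.reconstruct = nn.Sequential(*(DMSSRM(dim) for _ in range(num_dmssrm)))
        self.head = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, left, right):                   # dual-pixel sub-aperture images
        feats = self.extract(self.embed(torch.cat([left, right], dim=1)))
        return self.head(self.reconstruct(feats))
```

A quick shape check, assuming 64x64 dual-pixel crops: `DMTNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))` returns a (1, 3, 64, 64) tensor.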
Related papers
- CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction [14.377544481394013]
CTA-Net combines CNNs and ViTs, with transformers capturing long-range dependencies and CNNs extracting localized features.
This integration enables efficient processing of detailed local and broader contextual information; a generic sketch of this hybrid CNN-Transformer pattern appears after the related papers list below.
Experiments on small-scale datasets with fewer than 100,000 samples show that CTA-Net achieves superior performance.
arXiv Detail & Related papers (2024-10-15T09:27:26Z)
- Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks [13.815116154370834]
We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
arXiv Detail & Related papers (2024-03-13T00:48:41Z)
- FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack [11.433602615992516]
We present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder.
By incorporating the LSTM, FocDepthFormer can be pre-trained on large-scale monocular RGB depth estimation datasets.
Our model outperforms state-of-the-art approaches across multiple evaluation metrics.
arXiv Detail & Related papers (2023-10-17T11:53:32Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model that combines the merits of deformable CNNs and query-based Transformers with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z) - MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in
Optical Remote Sensing Images [7.764449276074902]
We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet.
We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
arXiv Detail & Related papers (2022-10-14T07:54:28Z) - Cross-receptive Focused Inference Network for Lightweight Image
Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need for Transformers to incorporate contextual information in order to extract features dynamically has been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
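A recurring pattern across these related works is to pair a convolutional branch that extracts localized features with an attention branch that captures long-range dependencies, then fuse the two. The PyTorch sketch below is a generic, hypothetical illustration of that shared pattern; it does not reproduce any specific paper's block design, and the branch and fusion layers are assumptions.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Dual-branch block: convolutions capture local detail while self-attention
    captures long-range dependencies; the outputs are fused by a 1x1 convolution."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)                     # CNN branch: localized features
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, glob], dim=1))
```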