Related papers: Efficient Multi-disparity Transformer for Light Field Image Super-resolution

URL: http://arxiv.org/abs/2407.15329v1
Date: Mon, 22 Jul 2024 02:23:09 GMT
Title: Efficient Multi-disparity Transformer for Light Field Image Super-resolution
Authors: Zeke Zexi Hu, Haodong Chen, Yuk Ying Chung, Xiaoming Chen
Abstract summary: This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR). MDT addresses the issues of computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images. Building on this architecture, we present LF-MDTNet, an efficient LFSR network.
Score: 6.814658355110824
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR) that addresses the issues of computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images inherent in conventional methods. MDT features a multi-branch structure, with each branch utilising independent disparity self-attention (DSA) to target specific disparity ranges, effectively reducing computational complexity and disentangling disparities. Building on this architecture, we present LF-MDTNet, an efficient LFSR network. Experimental results demonstrate that LF-MDTNet outperforms existing state-of-the-art methods by 0.37 dB and 0.41 dB PSNR at the 2x and 4x scales, achieving superior performance with fewer parameters and higher speed.

Related papers

Compressive Imaging Reconstruction via Tensor Decomposed Multi-Resolution Grid Encoding [50.54887630778593]
Compressive imaging (CI) reconstruction aims to recover high-dimensional images from compressed low-dimensional measurements. Existing unsupervised representations may struggle to achieve a desired balance between representation ability and efficiency. We propose Decomposed multi-resolution Grid encoding (GridTD), an unsupervised continuous representation framework for CI reconstruction.
arXiv Detail & Related papers (2025-07-10T12:36:20Z)
QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution [54.67891514843853]
We propose the Quadtree Diffusion Model (QDM), a region-adaptive diffusion framework. By guiding the diffusion with a quadtree derived from the low-quality input, QDM identifies key regions (represented by leaf nodes) where fine detail is essential. Experiments demonstrate QDM's effectiveness in high-resolution SR tasks across diverse image types, particularly in medical imaging.
arXiv Detail & Related papers (2025-03-15T06:50:30Z)
Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR). In practice, DiT-SR leverages an overall U-shaped architecture and adopts a uniform isotropic design for all the transformer blocks. We analyze the limitation of the widely used AdaLN and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
Dual-Domain Deep D-bar Method for Solving Electrical Impedance Tomography [5.112764609048122]
The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems. D-bar images often present low contrast and low resolution due to the absence of accurate high-frequency information. We propose a dual-domain neural network architecture to retrieve high-contrast D-bar image sequences from low-contrast D-bar images.
arXiv Detail & Related papers (2024-05-12T21:55:02Z)
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring [25.36888929483233]
We propose a multi-scale network based on single-input and multiple-outputs (SIMO) for motion deblurring. We combine the characteristics of real-world trajectories with a learnable wavelet transform module to focus on the directional continuity and frequency features of the step-by-step transitions from blurred images to sharp images.
arXiv Detail & Related papers (2023-12-29T02:59:40Z)
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images. Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
DDT: Dual-branch Deformable Transformer for Image Denoising [6.596462333804802]
Transformers are beneficial for image denoising tasks since they can model long-range dependencies to overcome limitations presented by inductive convolutional biases. We propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel.
arXiv Detail & Related papers (2023-04-13T08:54:44Z)
DCS-RISR: Dynamic Channel Splitting for Efficient Real-world Image Super-Resolution [15.694407977871341]
Real-world image super-resolution (RISR) has received increased focus for improving the quality of SR images under unknown complex degradation. Existing methods rely on heavy SR models to enhance low-resolution (LR) images of different degradation levels. We propose a novel Dynamic Channel Splitting scheme for efficient Real-world Image Super-Resolution, termed DCS-RISR.
arXiv Detail & Related papers (2022-12-15T04:34:57Z)
FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring [72.43250555622254]
We propose a lightweight and real-time unsupervised BID baseline, termed Frequency-domain Contrastive Loss Constrained Lightweight CycleGAN. FCL-GAN has attractive properties, i.e., no image domain limitation, no image resolution limitation, 25x lighter than SOTA, and 5x faster than SOTA. Experiments on several image datasets demonstrate the effectiveness of FCL-GAN in terms of performance, model size and reference time.
arXiv Detail & Related papers (2022-04-16T15:08:03Z)
Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution [28.00231586840797]
Real-world image super-resolution (Real-ISR) is a challenging task due to the unknown complex degradation of real-world images. Recent research on Real-ISR has achieved significant progress by modeling the image degradation space. We propose an efficient degradation-adaptive super-resolution (DASR) network, whose parameters are adaptively specified by estimating the degradation of each input image.
arXiv Detail & Related papers (2022-03-27T05:59:13Z)
Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks. We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers. Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage. We derive a novel Hessian-based structural pruning criterion comparable across all layers and structures, with latency-aware regularization for direct latency reduction. Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z)
Boosting Image Super-Resolution Via Fusion of Complementary Information Captured by Multi-Modal Sensors [21.264746234523678]
Image Super-Resolution (SR) provides a promising technique to enhance the image quality of low-resolution optical sensors. In this paper, we attempt to leverage complementary information from a low-cost channel (visible/depth) to boost image quality of an expensive channel (thermal) using fewer parameters.
arXiv Detail & Related papers (2020-12-07T02:15:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.