Beyond Subspace Isolation: Many-to-Many Transformer for Light Field
Image Super-resolution
- URL: http://arxiv.org/abs/2401.00740v1
- Date: Mon, 1 Jan 2024 12:48:23 GMT
- Title: Beyond Subspace Isolation: Many-to-Many Transformer for Light Field
Image Super-resolution
- Authors: Zeke Zexi Hu, Xiaoming Chen, Vera Yuk Ying Chung, Yiran Shen
- Abstract summary: We introduce a novel Many-to-Many Transformer (M2MT) for light field image super-resolution tasks.
M2MT aggregates angular information in the spatial subspace before performing the self-attention mechanism.
It enables complete access to all information across all sub-aperture images in a light field image.
- Score: 5.277207972856879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective extraction of spatial-angular features plays a crucial role in
light field image super-resolution (LFSR) tasks, and the introduction of convolutions
and Transformers has led to significant improvements in this area. Nevertheless, due to
the large 4D data volume of light field images, many existing methods opt to decompose
the data into a number of lower-dimensional subspaces and apply Transformers to each
subspace individually. As a side effect, these methods inadvertently restrict the
self-attention mechanism to a One-to-One scheme that accesses only a limited subset of
the LF data, preventing comprehensive optimization of all spatial and angular cues. In
this paper, we identify this limitation as subspace isolation and introduce a novel
Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular information in
the spatial subspace before performing the self-attention mechanism, enabling complete
access to all information across all sub-aperture images (SAIs) in a light field image.
Consequently, M2MT can comprehensively capture long-range correlation dependencies. With
M2MT as the pivotal component, we develop a simple yet effective M2MT network for LFSR.
Our experimental results demonstrate that M2MT achieves state-of-the-art performance
across various public datasets. We further conduct an in-depth analysis using local
attribution maps (LAM) to obtain visual interpretability, and the results validate that
M2MT attains a truly non-local context in both the spatial and angular subspaces,
mitigating subspace isolation and acquiring an effective spatial-angular representation.
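The contrast between subspace-isolated attention and the Many-to-Many scheme can be made concrete. Below is a minimal sketch (not the authors' implementation) of one plausible reading of the abstract: in a One-to-One scheme each sub-aperture image (SAI) forms its own attention sequence, whereas a Many-to-Many scheme folds the angular positions into the token sequence so that self-attention spans all SAIs at once. All tensor shapes and dimension names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Toy sizes: batch B, angular grid U x V, spatial grid H x W, channels C.
B, U, V, H, W, C = 1, 5, 5, 8, 8, 32
lf = torch.randn(B, U, V, H, W, C)  # 4D light field features

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)

# One-to-One (subspace isolation): each SAI is its own sequence, so a
# spatial token can only attend within its own angular view.
tokens_o2o = lf.reshape(B * U * V, H * W, C)
out_o2o, _ = attn(tokens_o2o, tokens_o2o, tokens_o2o)

# Many-to-Many: angular positions are merged into the token sequence, so
# one self-attention pass has access to all SAIs simultaneously.
tokens_m2m = lf.reshape(B, U * V * H * W, C)
out_m2m, _ = attn(tokens_m2m, tokens_m2m, tokens_m2m)

print(out_o2o.shape)  # torch.Size([25, 64, 32])
print(out_m2m.shape)  # torch.Size([1, 1600, 32])
```

Note the cost gap this naive flattening implies: the Many-to-Many pass scores 1600^2 = 2,560,000 token pairs versus 25 x 64^2 = 102,400 for the isolated version, which is presumably why the paper aggregates angular information in the spatial subspace rather than simply flattening the full 4D volume.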
Related papers
- Empowering Snapshot Compressive Imaging: Spatial-Spectral State Space Model with Across-Scanning and Local Enhancement [51.557804095896174]
We introduce a State Space Model with Across-Scanning and Local Enhancement, named ASLE-SSM, which employs a Spatial-Spectral SSM for globally and locally balanced context encoding and promotes cross-channel interaction.
Experimental results illustrate ASLE-SSM's superiority over existing state-of-the-art methods, with an inference speed 2.4 times faster than the Transformer-based MST while saving 0.12M parameters.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model [71.50973774576431]
We propose a novel MLLM, INF-LLaVA, designed for effective high-resolution image perception.
First, we introduce a Dual-perspective Cropping Module (DCM), which ensures that each sub-image contains continuous details from a local perspective.
Second, we introduce a Dual-perspective Enhancement Module (DEM) to enable the mutual enhancement of global and local features.
arXiv Detail & Related papers (2024-07-23T06:02:30Z)
- Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning [48.99361249764921]
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution.
However, their quadratic complexity hinders the efficient processing of high-resolution 4D inputs.
We propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy.
arXiv Detail & Related papers (2024-06-23T11:28:08Z)
- AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis [98.3959800235485]
Several recent methods explore multiple modalities within a single field, aiming to share implicit features across modalities to enhance reconstruction performance.
In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing the underlying issue lies in the misalignment of different sensors.
We introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI).
arXiv Detail & Related papers (2024-02-27T13:08:47Z)
- SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification [35.52272615695294]
We propose a spatial-spectral masked auto-encoder (SS-MAE) for HSI and LiDAR/SAR data joint classification.
Our SS-MAE fully exploits the spatial and spectral representations of the input data.
To complement local features in the training stage, we add two lightweight CNNs for feature extraction.
arXiv Detail & Related papers (2023-11-08T03:54:44Z)
- Dual Aggregation Transformer for Image Super-Resolution [92.41781921611646]
We propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR.
DAT aggregates features across spatial and channel dimensions in an inter-block and intra-block dual manner.
Our experiments show that DAT surpasses current methods.
arXiv Detail & Related papers (2023-08-07T07:39:39Z)
- OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation [48.828453331724965]
We propose an Omni-Aperture Fusion model (OAFuser) to extract angular information from sub-aperture images to generate semantically consistent results.
The proposed OAFuser achieves state-of-the-art performance on four UrbanLF datasets in terms of all evaluation metrics.
arXiv Detail & Related papers (2023-07-28T14:43:27Z)
- Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution [36.69391399634076]
Exploiting spatial-angular correlation is crucial to light field (LF) image super-resolution (SR).
We propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR.
Our method can fully incorporate the information from all angular views while achieving a global receptive field along the epipolar line (a toy sketch of this epipolar structure follows after the list).
arXiv Detail & Related papers (2023-02-16T03:40:40Z)
- Stereo Superpixel Segmentation Via Decoupled Dynamic Spatial-Embedding Fusion Network [17.05076034398913]
In this work, we propose a stereo superpixel segmentation method with a decoupling mechanism for spatial information.
To decouple stereo disparity information and spatial information, the spatial information is temporarily removed before fusing the features of stereo image pairs.
Our method achieves state-of-the-art performance on the KITTI2015 and Cityscapes datasets, and its efficiency is also verified when applied to salient object detection on the NJU2K dataset.
arXiv Detail & Related papers (2022-08-17T08:22:50Z)
- Efficient Light Field Reconstruction via Spatio-Angular Dense Network [14.568586050271357]
We propose an end-to-end Spatio-Angular Dense Network (SADenseNet) for light field reconstruction.
We show that the proposed SADenseNet achieves state-of-the-art performance at significantly reduced memory and computation costs.
Results show that the reconstructed light field images are sharp with correct details and can serve as pre-processing to improve the accuracy of measurement-related applications.
arXiv Detail & Related papers (2021-08-08T13:50:51Z)
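As a complement to the "Learning Non-Local Spatial-Angular Correlation" entry above, the following hypothetical sketch (not from that paper) shows how a 4D light field can be sliced into epipolar-plane images (EPIs): each EPI lines up one angular and one spatial dimension, so a token sequence along an EPI already covers every angular view, matching the "global receptive field along the epipolar line" in the summary. Shapes and names are assumptions.

```python
import torch

# Toy light field: angular grid U x V, spatial grid H x W, channels C.
U, V, H, W, C = 5, 5, 32, 32, 16
lf = torch.randn(U, V, H, W, C)

# Horizontal EPIs: fix the angular row u and spatial row h; each remaining
# (V, W) slice mixes the horizontal angular and spatial dimensions.
epis_h = lf.permute(0, 2, 1, 3, 4).reshape(U * H, V, W, C)

# A token sequence along one EPI spans all V angular views at once.
tokens = epis_h.reshape(U * H, V * W, C)  # (num_EPIs, seq_len, channels)
print(epis_h.shape)  # torch.Size([160, 5, 32, 16])
print(tokens.shape)  # torch.Size([160, 160, 16])
```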