Beyond Subspace Isolation: Many-to-Many Transformer for Light Field
Image Super-resolution
- URL: http://arxiv.org/abs/2401.00740v1
- Date: Mon, 1 Jan 2024 12:48:23 GMT
- Title: Beyond Subspace Isolation: Many-to-Many Transformer for Light Field
Image Super-resolution
- Authors: Zeke Zexi Hu, Xiaoming Chen, Vera Yuk Ying Chung, Yiran Shen
- Abstract summary: We introduce a novel Many-to-Many Transformer (M2MT) for light field image super-resolution tasks.
M2MT aggregates angular information in the spatial subspace before performing the self-attention mechanism.
It enables complete access to all information across all sub-aperture images in a light field image.
- Score: 5.277207972856879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effective extraction of spatial-angular features plays a crucial role in
light field image super-resolution (LFSR) tasks, and the introduction of convolutions
and Transformers has led to significant improvements in this area. Nevertheless, due to
the large 4D data volume of light field images, many existing methods opt to decompose
the data into a number of lower-dimensional subspaces and apply Transformers to each
subspace individually. As a side effect, these methods inadvertently restrict the
self-attention mechanism to a One-to-One scheme that accesses only a limited subset of
the LF data, preventing comprehensive optimization of all spatial and angular cues. In
this paper, we identify this limitation as subspace isolation and introduce a novel
Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular information in
the spatial subspace before performing the self-attention mechanism, enabling complete
access to all information across all sub-aperture images (SAIs) in a light field image.
Consequently, M2MT can comprehensively capture long-range correlation dependencies. With
M2MT as the pivotal component, we develop a simple yet effective M2MT network for LFSR.
Our experimental results demonstrate that M2MT achieves state-of-the-art performance
across various public datasets. We further conduct an in-depth analysis using local
attribution maps (LAM) to obtain visual interpretability, and the results validate that
M2MT attains a truly non-local context in both the spatial and angular subspaces,
mitigating subspace isolation and acquiring an effective spatial-angular representation.
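The contrast between subspace-isolated attention and the Many-to-Many scheme can be made concrete. Below is a minimal sketch (not the authors' implementation) of one plausible reading of the abstract: in a One-to-One scheme each sub-aperture image (SAI) forms its own attention sequence, whereas a Many-to-Many scheme folds the angular positions into the token sequence so that self-attention spans all SAIs at once. All tensor shapes and dimension names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Toy sizes: batch B, angular grid U x V, spatial grid H x W, channels C.
B, U, V, H, W, C = 1, 5, 5, 8, 8, 32
lf = torch.randn(B, U, V, H, W, C)  # 4D light field features

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)

# One-to-One (subspace isolation): each SAI is its own sequence, so a
# spatial token can only attend within its own angular view.
tokens_o2o = lf.reshape(B * U * V, H * W, C)
out_o2o, _ = attn(tokens_o2o, tokens_o2o, tokens_o2o)

# Many-to-Many: angular positions are merged into the token sequence, so
# one self-attention pass has access to all SAIs simultaneously.
tokens_m2m = lf.reshape(B, U * V * H * W, C)
out_m2m, _ = attn(tokens_m2m, tokens_m2m, tokens_m2m)

print(out_o2o.shape)  # torch.Size([25, 64, 32])
print(out_m2m.shape)  # torch.Size([1, 1600, 32])
```

Note the cost gap this naive flattening implies: the Many-to-Many pass scores 1600^2 = 2,560,000 token pairs versus 25 x 64^2 = 102,400 for the isolated version, which is presumably why the paper aggregates angular information in the spatial subspace rather than simply flattening the full 4D volume.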
Related papers
- Empowering Snapshot Compressive Imaging: Spatial-Spectral State Space Model with Across-Scanning and Local Enhancement [51.557804095896174]
We introduce a State Space Model with Across-Scanning and Local Enhancement, named ASLE-SSM, which employs a Spatial-Spectral SSM for globally and locally balanced context encoding and promotes cross-channel interaction.
Experimental results illustrate ASLE-SSM's superiority over existing state-of-the-art methods, with an inference speed 2.4 times faster than the Transformer-based MST while saving 0.12M parameters.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model [71.50973774576431]
We propose a novel MLLM, INF-LLaVA, designed for effective high-resolution image perception.
First, we introduce a Dual-perspective Cropping Module (DCM), which ensures that each sub-image contains continuous details from a local perspective.
Second, we introduce a Dual-perspective Enhancement Module (DEM) to enable the mutual enhancement of global and local features.
arXiv Detail & Related papers (2024-07-23T06:02:30Z)
- Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning [48.99361249764921]
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution.
However, their quadratic complexity hinders the efficient processing of high-resolution 4D inputs.
We propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy.
arXiv Detail & Related papers (2024-06-23T11:28:08Z)
- AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis [98.3959800235485]
Several recent methods explore multiple modalities within a single field, aiming to share implicit features across modalities to enhance reconstruction performance.
In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing the underlying issue lies in the misalignment of different sensors.
We introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI).
arXiv Detail & Related papers (2024-02-27T13:08:47Z)
- SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification [35.52272615695294]
We propose a spatial-spectral masked auto-encoder (SS-MAE) for HSI and LiDAR/SAR data joint classification.
Our SS-MAE fully exploits the spatial and spectral representations of the input data.
To complement local features in the training stage, we add two lightweight CNNs for feature extraction.
arXiv Detail & Related papers (2023-11-08T03:54:44Z)
- Dual Aggregation Transformer for Image Super-Resolution [92.41781921611646]
We propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR.
DAT aggregates features across spatial and channel dimensions in an inter-block and intra-block dual manner.
Our experiments show that DAT surpasses current methods.
arXiv Detail & Related papers (2023-08-07T07:39:39Z)
- OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation [48.828453331724965]
We propose an Omni-Aperture Fusion model (OAFuser) to extract angular information from sub-aperture images to generate semantically consistent results.
The proposed OAFuser achieves state-of-the-art performance on four UrbanLF datasets in terms of all evaluation metrics.
arXiv Detail & Related papers (2023-07-28T14:43:27Z)
- Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution [36.69391399634076]
Exploiting spatial-angular correlation is crucial to light field (LF) image super-resolution (SR).
We propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR.
Our method can fully incorporate the information from all angular views while achieving a global receptive field along the epipolar line (a toy sketch of this epipolar structure follows after the list).
arXiv Detail & Related papers (2023-02-16T03:40:40Z)
- Stereo Superpixel Segmentation Via Decoupled Dynamic Spatial-Embedding Fusion Network [17.05076034398913]
In this work, we propose a stereo superpixel segmentation method with a decoupling mechanism for spatial information.
To decouple stereo disparity information and spatial information, the spatial information is temporarily removed before fusing the features of stereo image pairs.
Our method achieves state-of-the-art performance on the KITTI2015 and Cityscapes datasets, and its efficiency is also verified when applied to salient object detection on the NJU2K dataset.
arXiv Detail & Related papers (2022-08-17T08:22:50Z)
- Efficient Light Field Reconstruction via Spatio-Angular Dense Network [14.568586050271357]
We propose an end-to-end Spatio-Angular Dense Network (SADenseNet) for light field reconstruction.
We show that the proposed SADenseNet achieves state-of-the-art performance at significantly reduced memory and computation costs.
Results show that the reconstructed light field images are sharp with correct details and can serve as pre-processing to improve the accuracy of measurement-related applications.
arXiv Detail & Related papers (2021-08-08T13:50:51Z)
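As a complement to the "Learning Non-Local Spatial-Angular Correlation" entry above, the following hypothetical sketch (not from that paper) shows how a 4D light field can be sliced into epipolar-plane images (EPIs): each EPI lines up one angular and one spatial dimension, so a token sequence along an EPI already covers every angular view, matching the "global receptive field along the epipolar line" in the summary. Shapes and names are assumptions.

```python
import torch

# Toy light field: angular grid U x V, spatial grid H x W, channels C.
U, V, H, W, C = 5, 5, 32, 32, 16
lf = torch.randn(U, V, H, W, C)

# Horizontal EPIs: fix the angular row u and spatial row h; each remaining
# (V, W) slice mixes the horizontal angular and spatial dimensions.
epis_h = lf.permute(0, 2, 1, 3, 4).reshape(U * H, V, W, C)

# A token sequence along one EPI spans all V angular views at once.
tokens = epis_h.reshape(U * H, V * W, C)  # (num_EPIs, seq_len, channels)
print(epis_h.shape)  # torch.Size([160, 5, 32, 16])
print(tokens.shape)  # torch.Size([160, 160, 16])
```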