Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
- URL: http://arxiv.org/abs/2406.10869v2
- Date: Thu, 16 Jan 2025 08:06:16 GMT
- Title: Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
- Authors: Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guoping Qiu
- Abstract summary: We introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR)
Specifically, a distortion modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion.
We show that GDGT-OSR outperforms existing methods in the literature.
- Score: 26.819508321968108
- Abstract: As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike plain 2D images that are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs, therefore, requires performing equirectangular projection (ERP) to map the ODIs onto a plane. ODI super-resolution thus needs to take into account the geometric distortion introduced by ERP. However, without considering the geometric distortion of ERP images, previous deep-learning-based methods only utilize a limited range of pixels and may easily miss self-similar textures for reconstruction. In this paper, we introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR). Specifically, a distortion modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion and thus involve more self-similar textures. Distortion modulation is achieved through a newly devised distortion guidance generator that produces guidance by exploiting the variability of distortion across latitudes. Furthermore, we propose a dynamic feature aggregation scheme to adaptively fuse the features from different self-attention modules. We present extensive experimental results on public datasets and show that GDGT-OSR outperforms existing methods in the literature.
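The "variability of distortion across latitudes" that the guidance generator exploits comes from the equirectangular projection itself: a row at latitude θ is stretched horizontally by a factor of 1/cos θ, so distortion grows from the equator toward the poles. A minimal sketch of such a per-row distortion map is below; the function name and layout are illustrative and not the paper's actual implementation.

```python
import numpy as np

def erp_distortion_map(height: int, width: int) -> np.ndarray:
    """Per-pixel horizontal stretch factor of an equirectangular projection.

    Row i maps to a latitude theta in (-pi/2, pi/2); pixels in that row are
    stretched horizontally by 1/cos(theta) relative to the sphere, which is
    the latitude-dependent distortion a guidance generator can exploit.
    """
    # Latitude at each row center, from +pi/2 (top) down to -pi/2 (bottom).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    stretch = 1.0 / np.cos(lat)  # ~1 at the equator, growing toward the poles
    return np.tile(stretch[:, None], (1, width))

m = erp_distortion_map(8, 16)
# Rows near the equator (indices 3 and 4 here) have stretch close to 1;
# rows near the poles are stretched far more, and the map is symmetric
# about the equator.
```

Such a map is constant per row, which is why latitude alone suffices as the conditioning signal for modulating attention windows row by row.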
Related papers
- Fast Omni-Directional Image Super-Resolution: Adapting the Implicit Image Function with Pixel and Semantic-Wise Spherical Geometric Priors [38.580815475638595]
This paper proposes a new ODI-SR model, FAOR, characterized by its capacity to perform fast and arbitrary-scale ODI super-resolution.
The proposed FAOR outperforms state-of-the-art ODI-SR models with a much faster inference speed.
arXiv Detail & Related papers (2025-02-09T13:37:50Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images in only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image [84.08705684778666]
We propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M.
Specifically, we use a global Transformer to control the holistic shape and a local Transformer to refine the local geometry details.
Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.
arXiv Detail & Related papers (2024-03-20T15:14:22Z) - Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis [8.009077765403287]
Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition.
Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images.
We propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images.
arXiv Detail & Related papers (2023-10-04T08:01:36Z) - Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective [42.10777921339209]
We propose a novel training strategy for image restoration from the causality perspective.
Our method, termed Distortion Invariant representation Learning (DIL), treats each distortion type and degree as one specific confounder.
arXiv Detail & Related papers (2023-03-13T05:04:18Z) - OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer [28.53390467642499]
Previous methods attempt to solve this issue by image super-resolution (SR) on equirectangular projection (ERP) images.
We propose Fisheye downsampling, which mimics the real-world imaging process and synthesizes more realistic low-resolution samples.
We also propose a convenient data augmentation strategy, which synthesizes pseudo ERP images from plain images.
arXiv Detail & Related papers (2023-02-07T13:19:59Z) - RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, i.e., the Rectangling Rectification Network (RecRecNet).
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on the insight that the rectified results of distorted images of the same scene, captured with different lenses, should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z) - Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.