Spherical Transformer
- URL: http://arxiv.org/abs/2202.04942v2
- Date: Fri, 11 Feb 2022 07:29:03 GMT
- Title: Spherical Transformer
- Authors: Sungmin Cho, Raehyuk Jung, Junseok Kwon
- Abstract summary: Convolutional neural networks for 360° images can yield sub-optimal performance due to distortions introduced by a planar projection.
We leverage the transformer architecture to solve image classification problems for 360° images.
Our method avoids the error-prone planar projection process by sampling pixels directly from the sphere surface.
- Score: 17.403133838762447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using convolutional neural networks for 360° images can induce
sub-optimal performance due to distortions entailed by a planar projection.
The distortion worsens when a rotation is applied to the 360° image. Thus,
many studies based on convolutions attempt to reduce the distortions to learn
accurate representations. In contrast, we leverage the transformer
architecture to solve image classification problems for 360° images. Using
the proposed transformer for 360° images has two advantages. First, our
method does not require the error-prone planar projection process, because it
samples pixels directly from the sphere surface. Second, our sampling method
based on regular polyhedrons yields low rotation equivariance errors, because
specific rotations can be reduced to permutations of faces. In experiments,
we validate our network on two aspects, as follows. First, we show that using
a transformer with highly uniform sampling methods can help reduce the
distortion. Second, we demonstrate that the transformer architecture can
achieve rotation equivariance for specific rotations. We compare our method
to other state-of-the-art algorithms on the SPH-MNIST, SPH-CIFAR, and SUN360
datasets and show that our method is competitive with other methods.
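The face-permutation argument above can be made concrete with a small sketch. It uses a cube (one of the regular polyhedrons; the paper's actual polyhedron choice and sampling density are not specified here) and shows that a 90° rotation about an axis through two face centers maps the set of sampled points, here the face centers, onto itself as a permutation:

```python
import numpy as np

# Face centers of a cube inscribed in the unit sphere: a toy "sampling"
# of the sphere surface based on a regular polyhedron.
faces = np.array([[1, 0, 0], [-1, 0, 0],
                  [0, 1, 0], [0, -1, 0],
                  [0, 0, 1], [0, 0, -1]], dtype=float)

def rot_z(deg):
    """Rotation matrix about the z-axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# A 90° rotation about z sends every face center onto another face center,
# so this specific rotation acts as a permutation of the sampled points.
rotated = faces @ rot_z(90).T
perm = [int(np.argmin(np.linalg.norm(faces - r, axis=1))) for r in rotated]
print(perm)  # a permutation of 0..5; the two z-axis face centers stay fixed
```

For such rotations no resampling or interpolation is needed, which is why the sampling incurs low rotation equivariance error; a generic rotation would not map the sample set onto itself.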
Related papers
- Distortion-aware Transformer in 360° Salient Object Detection [44.74647420381127]
We propose a Transformer-based model called DATFormer to address the distortion problem.
To exploit the unique characteristics of 360° data, we present a learnable relation matrix.
Our model outperforms existing 2D SOD (salient object detection) and 360° SOD methods.
arXiv Detail & Related papers (2023-08-07T07:28:24Z)
- DFR: Depth from Rotation by Uncalibrated Image Rectification with Latitudinal Motion Assumption [6.369764116066747]
We propose Depth-from-Rotation (DfR), a novel image rectification solution for uncalibrated rotating cameras.
Specifically, we model the motion of a rotating camera as the camera rotates on a sphere with fixed latitude.
We derive a 2-point analytical solver from directly computing the rectified transformations on the two images.
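The fixed-latitude motion model can be sketched as follows. The parameterization below (radius, latitude, longitude) is an illustrative assumption, not taken from the paper, and the 2-point solver itself is not reproduced:

```python
import numpy as np

def camera_center(radius, lat, lon):
    """Camera position on a sphere of given radius at fixed latitude `lat`,
    swept through longitude `lon` as the camera rotates (illustrative model)."""
    return radius * np.array([np.cos(lat) * np.cos(lon),
                              np.cos(lat) * np.sin(lon),
                              np.sin(lat)])

# Sweep a full rotation at 30° latitude: the height (z) and the distance from
# the sphere center stay constant -- the latitudinal-motion assumption.
lat = np.deg2rad(30.0)
centers = [camera_center(1.0, lat, np.deg2rad(l)) for l in range(0, 360, 45)]
heights = [c[2] for c in centers]
```

Under this one-parameter motion, only the longitude varies between two captures, which is what reduces the rectification problem to a solver needing just two point correspondences.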
arXiv Detail & Related papers (2023-07-11T09:11:22Z)
- Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging [7.601695814245209]
We consider the problem of video snapshot compressive imaging (SCI), where sequential high-speed frames are modulated by different masks and captured by a single measurement.
By combining optimization algorithms and neural networks, deep unfolding networks (DUNs) have achieved remarkable results in solving inverse problems.
arXiv Detail & Related papers (2023-06-20T06:25:48Z)
- Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing transformer features between the recovered image and the target, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between representations extracted from the recovered and target images in Euclidean space.
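That Euclidean feature discrepancy can be sketched generically. The helper below is a hypothetical stand-in operating on arbitrary feature arrays, not the paper's actual pretrained-ViT features:

```python
import numpy as np

def feature_distance(feat_rec, feat_tgt):
    """Euclidean (L2) discrepancy between two feature maps,
    each flattened into a single vector."""
    return float(np.linalg.norm(feat_rec.ravel() - feat_tgt.ravel()))

recovered = np.ones((4, 4))   # stand-in feature map of a recovered image
target = np.zeros((4, 4))     # stand-in feature map of the target image
print(feature_distance(recovered, target))  # 4.0 = sqrt of 16 unit differences
```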
arXiv Detail & Related papers (2023-03-24T14:14:25Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- Orthonormal Convolutions for the Rotation Based Iterative Gaussianization [64.44661342486434]
This paper elaborates an extension of rotation-based iterative Gaussianization (RBIG) that makes image Gaussianization possible.
Its application to images has been restricted to small patches or isolated pixels, because the rotation in RBIG is based on principal or independent component analysis.
We present Convolutional RBIG: an extension that alleviates this issue by imposing that the rotation in RBIG is a convolution.
arXiv Detail & Related papers (2022-06-08T12:56:34Z)
- Rectifying homographies for stereo vision: analytical solution for minimal distortion [0.0]
Rectification is used to simplify the subsequent stereo correspondence problem.
This work proposes a closed-form solution for the rectifying homographies that minimise perspective distortion.
arXiv Detail & Related papers (2022-02-28T22:35:47Z)
- Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression [42.15877732557837]
We make one of the first attempts to learn deep neural networks for omnidirectional image compression.
Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution.
Experimental results show that our method consistently achieves better rate-distortion performance than competing methods.
arXiv Detail & Related papers (2021-12-25T12:18:32Z)
- Differentiable Rendering with Perturbed Optimizers [85.66675707599782]
Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision.
Our work highlights the link between some well-known differentiable formulations and randomly smoothed renderings.
We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction.
arXiv Detail & Related papers (2021-10-18T08:56:23Z)
- Extreme Rotation Estimation using Dense Correlation Volumes [73.35119461422153]
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting.
We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship.
We propose a network design that can automatically learn such implicit cues by comparing all pairs of points between the two input images.
arXiv Detail & Related papers (2021-04-28T02:00:04Z)
- Robust 360-8PA: Redesigning The Normalized 8-point Algorithm for 360-FoV Images [53.11097060367591]
We present a novel strategy for estimating an essential matrix from 360-FoV images in spherical projection.
We show that our normalization can increase camera pose accuracy by about 20% without significant time overhead.
arXiv Detail & Related papers (2021-04-22T07:23:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.