Equirectangular image construction method for standard CNNs for Semantic
Segmentation
- URL: http://arxiv.org/abs/2310.09122v1
- Date: Fri, 13 Oct 2023 14:11:33 GMT
- Title: Equirectangular image construction method for standard CNNs for Semantic
Segmentation
- Authors: Haoqian Chen, Jian Liu, Minghe Li, Kaiwen Jiang, Ziheng Xu, Rencheng
Sun and Yi Sui
- Abstract summary: We propose a methodology for converting a perspective image into an equirectangular image.
The inverse transformations of the spherical center projection and the equidistant cylindrical projection are employed.
Experiments demonstrate that the optimal value of φ for effective semantic segmentation of equirectangular images with standard CNNs is 6π/16.
- Score: 5.5856758231015915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 360° spherical images have the advantage of a wide field of view and are
typically projected onto a plane for processing; the result is known as an
equirectangular image. Object shapes in equirectangular images can be distorted
and lack translation invariance. In addition, there are few publicly available
labeled datasets of equirectangular images, which makes it challenging for
standard CNN models to process equirectangular images effectively. To tackle
this problem, we propose a methodology for converting a perspective image into
an equirectangular image. The inverse transformations of the spherical center
projection and the equidistant cylindrical projection are employed. This
enables standard CNNs to learn the distortion features at different
positions in the equirectangular image and thereby gain the ability to
semantically segment the equirectangular image. The parameter φ, which determines
the projection position of the perspective image, has been analyzed using
various datasets and models, such as UNet, UNet++, SegNet, PSPNet, and DeepLab
v3+. The experiments demonstrate that the optimal value of φ for effective
semantic segmentation of equirectangular images with standard CNNs is 6π/16.
Compared with three other types of methods (supervised learning,
unsupervised learning, and data augmentation), the method proposed in this paper
achieves the best average IoU value of 43.76%. This value is 23.85%, 10.7%, and
17.23% higher than those of the other three methods, respectively.
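The conversion the abstract describes can be sketched as follows: each pixel of the equirectangular canvas is mapped to a ray on the unit sphere (inverse equidistant cylindrical projection), the ray is tilted by the placement latitude φ, and then projected back onto the perspective image plane (inverse spherical-center, i.e. gnomonic, projection). This is a minimal illustration assuming a pinhole camera; the function name, FOV parameter, and nearest-neighbor sampling are assumptions, not the authors' implementation.

```python
import numpy as np

def perspective_to_equirect(persp, eq_h, eq_w, fov_deg=90.0, phi=6 * np.pi / 16):
    """Paste a perspective image onto an equirectangular canvas at latitude phi.

    Sketch of the paper's construction (not the authors' code): inverse
    equidistant cylindrical projection to get a ray per output pixel, a tilt
    by `phi`, then inverse spherical-center (gnomonic) projection to sample
    the perspective image.
    """
    h_p, w_p = persp.shape[:2]
    f = (w_p / 2) / np.tan(np.radians(fov_deg) / 2)   # pinhole focal length

    # Equirectangular pixel grid -> longitude/latitude (equidistant cylindrical).
    v, u = np.mgrid[0:eq_h, 0:eq_w]
    lon = (u / eq_w - 0.5) * 2 * np.pi
    lat = (0.5 - v / eq_h) * np.pi

    # Unit ray on the sphere, then rotate about the x-axis so that the ray
    # at latitude `phi` becomes the camera's forward direction (0, 0, 1).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    y, z = y * np.cos(phi) - z * np.sin(phi), y * np.sin(phi) + z * np.cos(phi)

    # Inverse gnomonic projection onto the image plane; only rays pointing
    # toward the camera's front hemisphere can hit the perspective image.
    front = z > 1e-6
    z_safe = np.where(front, z, 1.0)
    up = f * x / z_safe + w_p / 2
    vp = -f * y / z_safe + h_p / 2

    canvas = np.zeros((eq_h, eq_w) + persp.shape[2:], dtype=persp.dtype)
    ok = front & (up >= 0) & (up < w_p) & (vp >= 0) & (vp < h_p)
    canvas[ok] = persp[vp[ok].astype(int), up[ok].astype(int)]  # nearest neighbor
    return canvas
```

With φ = 6π/16 the pasted image is centered well above the equator, so the network sees the strong high-latitude distortion the paper argues standard CNNs must learn.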
Related papers
- Combining Image- and Geometric-based Deep Learning for Shape Regression:
A Comparison to Pixel-level Methods for Segmentation in Chest X-Ray [0.07143413923310668]
We propose a novel hybrid method that combines a lightweight CNN backbone with a geometric neural network (Point Transformer) for shape regression.
We include the nnU-Net as an upper baseline, which has 3.7× more trainable parameters than our proposed method.
arXiv Detail & Related papers (2024-01-15T09:03:50Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
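The matching score this summary describes reduces to a cosine similarity between per-view feature vectors sampled at the two 2D projections of the same 3D point. A minimal sketch (function name and nearest-pixel sampling are assumptions; a real implementation would interpolate bilinearly):

```python
import numpy as np

def matching_cost(feat_a, feat_b, uv_a, uv_b):
    """Cosine similarity between features of two views sampled at the 2D
    projections (uv_a, uv_b) of one 3D point. feat_* are (H, W, C) maps;
    uv_* are (x, y) pixel coordinates. Nearest-pixel sampling for brevity."""
    fa = feat_a[int(uv_a[1]), int(uv_a[0])]   # (C,) feature at projection a
    fb = feat_b[int(uv_b[1]), int(uv_b[0])]
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8))
```

A high score indicates the two views agree that the 3D point lies on a surface, which is why the paper finds it correlates with volume density.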
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
- VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis [62.47221232706105]
We propose VoGE, which utilizes the Gaussian reconstruction kernels as volumetric primitives.
To render efficiently via VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy.
VoGE outperforms SoTA when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and reasoning.
arXiv Detail & Related papers (2022-05-30T19:52:11Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- TransformNet: Self-supervised representation learning through predicting geometric transformations [0.8098097078441623]
We describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data.
The basic idea is that a model unaware of the objects in an image would not be able to quantitatively predict the geometric transformation that was applied to it.
arXiv Detail & Related papers (2022-02-08T22:41:01Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
- Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both, vehicle keypoints and wireframe edges.
We show that our method achieves state-of-the-art results, evaluating our method on the challenging KITTI benchmark.
arXiv Detail & Related papers (2021-02-21T20:45:44Z)
- Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
- Learning Equivariant Representations [10.745691354609738]
Convolutional neural networks (CNNs) are successful examples of this principle.
We propose equivariant models for different transformations defined by groups of symmetries.
These models leverage symmetries in the data to reduce sample and model complexity and improve generalization performance.
arXiv Detail & Related papers (2020-12-04T18:46:17Z)
- What Does CNN Shift Invariance Look Like? A Visualization Study [87.79405274610681]
Feature extraction with convolutional neural networks (CNNs) is a popular method to represent images for machine learning tasks.
We focus on measuring and visualizing the shift invariance of extracted features from popular off-the-shelf CNN models.
We conclude that features extracted from popular networks are not globally invariant, and that biases and artifacts exist within this variance.
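The measurement this summary refers to can be illustrated with a toy probe (not the paper's protocol): extract features from an image and from a shifted copy, undo the shift in feature space, and compare with cosine similarity. A single convolution is exactly shift-equivariant, so the toy scores 1.0; the striding, pooling, and padding in real networks are what make the score drop and vary with position.

```python
import numpy as np

def features(img, kernel):
    """Toy 'CNN layer': valid 2D cross-correlation followed by ReLU."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0)

def shift_consistency(img, kernel, dx):
    """Cosine similarity between features of the image and of a copy
    shifted by dx pixels, after undoing the shift in feature space.
    A value of 1.0 means perfectly shift-equivariant features."""
    f0 = features(img, kernel)[:, :-dx]
    f1 = features(np.roll(img, dx, axis=1), kernel)[:, dx:]
    a, b = f0.ravel(), f1.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Running the same probe on an off-the-shelf network instead of the toy layer is, in spirit, what the visualization study measures.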
arXiv Detail & Related papers (2020-11-09T01:16:30Z)
- How semantic and geometric information mutually reinforce each other in ToF object localization [19.47618043504105]
We propose a novel approach to localize a 3D object from the intensity and depth information images provided by a Time-of-Flight (ToF) sensor.
Our proposed two-step approach improves segmentation and localization accuracy by a significant margin compared to a conventional CNN architecture.
arXiv Detail & Related papers (2020-08-27T09:13:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences of its use.