OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision
- URL: http://arxiv.org/abs/2401.17061v1
- Date: Tue, 30 Jan 2024 14:40:19 GMT
- Title: OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision
- Authors: Bruno Berenguel-Baeta and Jesus Bermudez-Cameo and Jose J. Guerrero
- Abstract summary: We present a tool for generating datasets of omnidirectional images with semantic and depth information.
These images are synthesized from a set of captures that are acquired in a realistic virtual environment for Unreal Engine 4.
We include in our tool photorealistic non-central projection systems such as non-central panoramas and non-central catadioptric systems.
- Score: 5.2178708158547025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Omnidirectional and 360° images are becoming widespread in industry and
in consumer society, causing omnidirectional computer vision to gain attention.
Their wide field of view allows a great amount of information about the
environment to be gathered from a single image. However, the distortion of these
images requires the development of specific algorithms for their treatment and
interpretation. Moreover, a large number of images is essential for the correct
training of learning-based computer vision algorithms. In this paper, we
present a tool for generating datasets of omnidirectional images with semantic
and depth information. These images are synthesized from a set of captures that
are acquired in a realistic virtual environment for Unreal Engine 4 through an
interface plugin. We gather a variety of well-known projection models such as
equirectangular and cylindrical panoramas, different fish-eye lenses,
catadioptric systems, and empirical models. Furthermore, we include in our tool
photorealistic non-central projection systems such as non-central panoramas and
non-central catadioptric systems. As far as we know, this is the first reported
tool for generating photorealistic non-central images in the literature.
Moreover, since the omnidirectional images are made virtually, we provide
pixel-wise information about semantics and depth as well as perfect knowledge
of the calibration parameters of the cameras. This allows the creation of
ground-truth information with pixel precision for training learning algorithms
and testing 3D vision approaches. To validate the proposed tool, different
computer vision algorithms are tested, such as line extraction from dioptric and
catadioptric central images, 3D layout recovery and SLAM using equirectangular
panoramas, and 3D reconstruction from non-central panoramas.
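The pixel-precise depth and the known calibration that the tool exports are exactly what such 3D vision pipelines consume. As a rough illustration (not part of the OmniSCV interface; the function names, the coordinate convention, and the assumption that depth is stored as Euclidean distance along each ray are ours), the sketch below back-projects an equirectangular depth map into a per-pixel 3D point cloud using the equirectangular projection model:

```python
# Hedged sketch: back-project an equirectangular depth map into a 3D point
# cloud. Names and conventions are illustrative, not the OmniSCV API.
import numpy as np

def equirect_rays(width: int, height: int) -> np.ndarray:
    """Unit ray directions for every pixel of an equirectangular image.
    Assumed convention: columns map to longitude in [-pi, pi),
    rows map to latitude in [pi/2, -pi/2]."""
    u = (np.arange(width) + 0.5) / width      # normalized column coordinate
    v = (np.arange(height) + 0.5) / height    # normalized row coordinate
    lon = (u - 0.5) * 2.0 * np.pi             # longitude, -pi .. pi
    lat = (0.5 - v) * np.pi                   # latitude, pi/2 .. -pi/2
    lon, lat = np.meshgrid(lon, lat)          # (H, W) grids
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)       # (H, W, 3) unit vectors

def backproject(depth: np.ndarray) -> np.ndarray:
    """Scale each unit ray by the ground-truth depth (assumed to be the
    Euclidean ray length) to get 3D points in the camera frame."""
    rays = equirect_rays(depth.shape[1], depth.shape[0])
    return rays * depth[..., None]            # (H, W, 3) point cloud

# Example: a constant 512x1024 depth map yields one 3D point per pixel,
# matching the pixel-wise ground truth the generator provides.
points = backproject(np.full((512, 1024), 3.0, dtype=np.float32))
```

The same pattern applies to any central projection model the tool exposes: once the pixel-to-ray mapping is known from the calibration, scaling each ray by the rendered depth yields pixel-precise 3D ground truth.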
Related papers
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field [1.3162012586770577]
We introduce MSI-NeRF, which combines deep-learning-based omnidirectional depth estimation with novel view synthesis.
We construct a multi-sphere image as a cost volume through feature extraction and warping of the input images.
Our network has the generalization ability to reconstruct unknown scenes efficiently using only four images.
arXiv Detail & Related papers (2024-03-16T07:26:50Z)
- Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z)
- altiro3D: Scene representation from single image and novel view synthesis [0.0]
altiro3D is a library developed to represent reality starting from a given RGB image or flat video.
It generates a light-field (or Native) image or video to provide a realistic 3D experience.
arXiv Detail & Related papers (2023-04-02T16:03:44Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Neural Reflectance Fields for Appearance Acquisition [61.542001266380375]
We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene.
We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light.
arXiv Detail & Related papers (2020-08-09T22:04:36Z)
- SAILenv: Learning in Virtual Visual Environments Made Simple [16.979621213790015]
We present a novel platform that allows researchers to experiment with visual recognition in virtual 3D scenes.
A few lines of code are needed to interface every algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself.
Our framework yields pixel-level semantic and instance labeling, depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine.
arXiv Detail & Related papers (2020-07-16T09:50:23Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)