360 in the Wild: Dataset for Depth Prediction and View Synthesis
- URL: http://arxiv.org/abs/2406.18898v2
- Date: Fri, 5 Jul 2024 02:56:10 GMT
- Title: 360 in the Wild: Dataset for Depth Prediction and View Synthesis
- Authors: Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon
- Abstract summary: We introduce a large-scale 360$^{\circ}$ video dataset captured in the wild.
This dataset was carefully scraped from the Internet and spans various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
- Score: 66.58513725342125
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets that include essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large-scale 360$^{\circ}$ video dataset captured in the wild. This dataset was carefully scraped from the Internet and spans various locations worldwide. Hence, it exhibits very diversified environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely single image depth estimation and view synthesis.
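For illustration, a minimal sketch of how one annotated sample (image, pose, depth) might be loaded; the directory layout, file names, and formats below are assumptions, since the abstract does not specify them.

```python
# Hypothetical loader sketch; the directory layout and file formats are
# assumptions for illustration, not the dataset's documented structure.
from pathlib import Path

import numpy as np
from PIL import Image

def load_sample(root: Path, frame_id: str):
    """Return one equirectangular frame with its camera pose and depth map."""
    image = np.asarray(Image.open(root / "images" / f"{frame_id}.png"))
    # 4x4 camera-to-world matrix, one text file per frame (assumed format).
    pose = np.loadtxt(root / "poses" / f"{frame_id}.txt").reshape(4, 4)
    # Per-pixel depth at the same spatial resolution as the image (assumed).
    depth = np.load(root / "depth" / f"{frame_id}.npy")
    return image, pose, depth

image, pose, depth = load_sample(Path("360_in_the_wild"), "000001")
```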
Related papers
- PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis [120.4361056355332]
This thesis introduces Paired Image and Video data from three CAMeraS, namely PIV3CAMS.
The PIV3CAMS dataset consists of 8385 pairs of images and 82 pairs of videos taken from three different cameras.
In addition to reproducing a current state-of-the-art algorithm, we investigate several proposed alternative models that integrate depth information geometrically.
arXiv Detail & Related papers (2024-07-26T12:18:29Z)
- MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z)
- 360+x: A Panoptic Multi-modal Scene Understanding Dataset [13.823967656097146]
To the best of our knowledge, 360+x is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world.
arXiv Detail & Related papers (2024-04-01T08:34:42Z)
- MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering [91.76893697171117]
We propose a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse views, or even a single view, of the human.
Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos.
We collect a new dataset, WildDynaCap, which contains subjects captured both in a dense camera dome and with in-the-wild sparse camera rigs.
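As a rough illustration of what meta-learning the weights could look like, here is a Reptile-style sketch; it is a stand-in for the general idea, not MetaCap's actual optimization, and `render_loss` and the subject iterator are hypothetical.

```python
# Reptile-style sketch of meta-learning a radiance-field weight
# initialization across subjects; a stand-in for the general idea, not
# MetaCap's actual algorithm. `render_loss(model, subject)` is hypothetical.
import copy

import torch

def meta_learn(field, subjects, render_loss, inner_steps=8,
               inner_lr=1e-3, meta_lr=0.1):
    for subject in subjects:  # e.g., dense-dome multi-view captures
        adapted = copy.deepcopy(field)
        opt = torch.optim.Adam(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):  # briefly fit one subject's views
            loss = render_loss(adapted, subject)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():  # nudge the shared init toward the adapted weights
            for p, q in zip(field.parameters(), adapted.parameters()):
                p.add_(meta_lr * (q - p))
    return field  # prior to be fine-tuned on sparse in-the-wild views
```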
arXiv Detail & Related papers (2024-03-27T17:59:54Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- ImageSubject: A Large-scale Dataset for Subject Detection [9.430492045581534]
Images and videos usually contain main subjects: the objects that the photographer wants to highlight.
Detecting the main subjects is an important technique to help machines understand the content of images and videos.
We present a new dataset with the goal of training models to understand the layout of objects and then to find the main subjects among them.
arXiv Detail & Related papers (2022-01-09T22:49:59Z)
- Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes [85.56602190773684]
We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view.
By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained completely unsupervised.
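A minimal sketch of that photometric objective in its standard formulation: back-project target pixels with the predicted depth, transform them by the predicted relative pose, and re-sample the source frame. The tensor shapes and the intrinsics K are assumptions, and this is the generic recipe rather than the paper's exact pipeline.

```python
# Generic photometric self-supervision sketch (SfMLearner-style), not the
# paper's exact implementation; shapes and intrinsics K are assumed.
import torch
import torch.nn.functional as F

def photometric_loss(tgt, src, depth, T_tgt_to_src, K):
    """tgt, src: (B,3,H,W) frames; depth: (B,1,H,W); T: (B,4,4); K: (B,3,3)."""
    B, _, H, W = tgt.shape
    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3D with predicted depth, then move into the source frame.
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)
    cam = T_tgt_to_src[:, :3, :3] @ cam + T_tgt_to_src[:, :3, 3:]
    # Project into the source image; normalize to [-1, 1] for grid_sample.
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(src, grid, align_corners=True)
    return (warped - tgt).abs().mean()  # minimized to train depth and pose nets
```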
arXiv Detail & Related papers (2021-05-05T17:08:10Z)
- EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes [21.695100437184507]
This dataset features more than 300K images captured from more than 100 garden models.
Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow.
Experimental results with state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show the positive impact of pre-training deep networks on our dataset for unstructured natural scenes.
arXiv Detail & Related papers (2020-11-09T12:44:29Z)
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding [8.720130442653575]
Hypersim is a synthetic dataset for holistic indoor scene understanding.
We generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
arXiv Detail & Related papers (2020-11-04T20:12:07Z)
- SIDOD: A Synthetic Image Dataset for 3D Object Pose Recognition with Distractors [10.546457120988494]
This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects.
We describe our approach for domain randomization and provide insight into the decisions that produced the dataset.
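To give a flavor of what such domain randomization involves, here is a toy scene-parameter sampler; the parameter names and ranges are invented for illustration and are not SIDOD's actual configuration.

```python
# Toy domain-randomization sampler; parameter names and ranges are
# hypothetical, not SIDOD's actual configuration.
import random

def randomize_scene():
    return {
        "environment": random.choice(["kitchen", "warehouse", "office"]),  # 3 virtual sets
        "camera_viewpoint": random.randrange(18),   # one of 18 camera viewpoints
        "num_objects": random.randint(1, 10),       # up to 10 foreground objects
        "num_distractors": random.randint(0, 5),    # distractor clutter
        "light_intensity": random.uniform(0.2, 1.5),
        "light_azimuth_deg": random.uniform(0.0, 360.0),
    }

scene = randomize_scene()  # fed to the renderer to emit a stereo pair + labels
```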
arXiv Detail & Related papers (2020-08-12T00:14:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.