Pose2RGBD. Generating Depth and RGB images from absolute positions
- URL: http://arxiv.org/abs/2007.07013v1
- Date: Tue, 14 Jul 2020 13:07:06 GMT
- Title: Pose2RGBD. Generating Depth and RGB images from absolute positions
- Authors: Mihai Cristian Pîrvu
- Abstract summary: We propose a method to automatically generate RGBD images based on previously seen and synchronized video, depth and pose signals.
The process can be thought of as neural rendering, where we obtain a function f : Pose -> RGBD, which we can use to navigate through the generated scene.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We propose a method at the intersection of Computer Vision and Computer
Graphics fields, which automatically generates RGBD images using neural
networks, based on previously seen and synchronized video, depth and pose
signals. Since the models must be able to reconstruct both texture (RGB) and
structure (Depth), it creates an implicit representation of the scene, as
opposed to explicit ones, such as meshes or point clouds. The process can be
thought of as neural rendering, where we obtain a function f : Pose -> RGBD,
which we can use to navigate through the generated scene, similarly to graphics
simulations. We introduce two new datasets: one based on synthetic data with
full ground truth information, and the other recorded from a drone flight over
a university campus using only video and GPS signals. Finally, we propose a
fully unsupervised method of generating datasets from videos alone, in order
to train the Pose2RGBD networks. Code and datasets are available at:
https://gitlab.com/mihaicristianpirvu/pose2rgbd.
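The abstract only specifies the mapping f : Pose -> RGBD, so the following is a minimal sketch of what such a network could look like: a pose vector lifted by an MLP and decoded convolutionally into a 4-channel RGBD image. The architecture, layer sizes, and output resolution are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class Pose2RGBD(nn.Module):
    """Toy f : Pose -> RGBD network: a 6-DoF pose vector is lifted by an MLP
    to a low-resolution feature map and upsampled into 3 RGB + 1 depth channel.
    All sizes are illustrative assumptions, not the paper's architecture."""

    def __init__(self, pose_dim=6, feat=256, out_hw=(64, 96)):
        super().__init__()
        self.feat, self.out_hw = feat, out_hw
        h, w = out_hw
        self.pose_mlp = nn.Sequential(
            nn.Linear(pose_dim, feat), nn.ReLU(),
            nn.Linear(feat, feat * (h // 8) * (w // 8)), nn.ReLU(),
        )
        # Three stride-2 transposed convolutions: (H/8, W/8) -> (H, W).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 4, 4, stride=2, padding=1),
        )

    def forward(self, pose):                       # pose: (B, pose_dim)
        h, w = self.out_hw
        x = self.pose_mlp(pose).view(-1, self.feat, h // 8, w // 8)
        out = self.decoder(x)                      # (B, 4, H, W)
        rgb = torch.sigmoid(out[:, :3])            # RGB in [0, 1]
        depth = torch.relu(out[:, 3:])             # non-negative depth
        return rgb, depth

# Training would presumably regress rgb/depth against the synchronized video
# and depth frames recorded at each pose (e.g. an L1 loss on both outputs).
model = Pose2RGBD()
rgb, depth = model(torch.randn(2, 6))
print(rgb.shape, depth.shape)  # torch.Size([2, 3, 64, 96]) torch.Size([2, 1, 64, 96])
```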
Related papers
- DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation [66.7989548848166]
Existing approaches encode depth maps along with RGB images and perform feature fusion between them to enable more robust predictions.
We present DFormerv2, a strong RGBD encoder that explicitly uses depth maps as geometry priors rather than encoding depth information with neural networks.
Our goal is to extract geometry cues from the depth and the spatial distances among all image patch tokens, which are then used as geometry priors to allocate attention weights in self-attention (a toy sketch of this idea appears after the list below).
arXiv Detail & Related papers (2025-04-07T03:06:07Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation [76.81628995237058]
DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
arXiv Detail & Related papers (2023-09-18T11:09:11Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Robust Double-Encoder Network for RGB-D Panoptic Segmentation [31.807572107839576]
Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs.
We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders.
We show that our approach achieves superior results compared to other common approaches for panoptic segmentation.
arXiv Detail & Related papers (2022-10-06T11:46:37Z)
- Talking Head from Speech Audio using a Pre-trained Image Generator [5.659018934205065]
We propose a novel method for generating high-resolution videos of talking-heads from speech audio and a single 'identity' image.
We model each frame as a point in the latent space of StyleGAN so that a video corresponds to a trajectory through the latent space.
We train a recurrent neural network to map from speech utterances to displacements in the latent space of the image generator (see the sketch after this list).
arXiv Detail & Related papers (2022-09-09T11:20:37Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature renderer.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
- Scale Invariant Semantic Segmentation with RGB-D Fusion [12.650574326251023]
We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information into the RGB data for pixel-wise semantic segmentation, addressing objects of different scales in outdoor scenes.
Our model is compact and can be easily applied to other RGB models.
arXiv Detail & Related papers (2022-04-10T12:54:27Z)
- RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [41.499325832227626]
We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos.
We exploit knowledge about the image formation process to significantly sparsify the attention weight matrix.
Compared to previous methods, our architecture is single-stage and end-to-end trainable.
arXiv Detail & Related papers (2022-03-24T18:49:12Z)
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation [54.666329929930455]
We present FFB6D, a full flow bidirectional fusion network designed for 6D pose estimation from a single RGBD image.
We learn to combine appearance and geometry information for representation learning as well as output representation selection.
Our method outperforms the state-of-the-art by large margins on several benchmarks.
arXiv Detail & Related papers (2021-03-03T08:07:29Z)
- CycleISP: Real Image Restoration via Improved Data Synthesis [166.17296369600774]
We present a framework that models the camera imaging pipeline in both forward and reverse directions.
By training a new image denoising network on realistic synthetic data, we achieve state-of-the-art performance on real camera benchmark datasets.
arXiv Detail & Related papers (2020-03-17T15:20:25Z)
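Returning to the DFormerv2 entry above: its summary describes using spatial distances derived from depth as geometry priors that allocate self-attention weights. Below is a toy, single-head sketch of that general idea, under the assumption that each patch gets a pseudo-3D position from its image-plane coordinates and mean depth; the function and tensor shapes are illustrative and not DFormerv2's actual formulation.

```python
import torch
import torch.nn.functional as F

def geometry_prior_attention(tokens, patch_depth, patch_xy, scale=1.0):
    """Single-head self-attention whose logits are biased by pairwise 3D
    distances between patch centers (image-plane coordinates + mean depth),
    so that geometrically close patches attend to each other more strongly.

    tokens:      (B, N, C) patch embeddings from the RGB stream
    patch_depth: (B, N)    mean depth per patch (hypothetical input)
    patch_xy:    (B, N, 2) normalized patch-center coordinates (hypothetical)
    """
    B, N, C = tokens.shape
    # Pseudo-3D position per patch and pairwise Euclidean distances.
    pos = torch.cat([patch_xy, patch_depth.unsqueeze(-1)], dim=-1)  # (B, N, 3)
    geometry_bias = -scale * torch.cdist(pos, pos)                  # (B, N, N)

    q = k = v = tokens  # projection weights omitted for brevity
    logits = q @ k.transpose(1, 2) / C ** 0.5 + geometry_bias
    return F.softmax(logits, dim=-1) @ v                            # (B, N, C)

out = geometry_prior_attention(
    torch.randn(2, 16, 32), torch.rand(2, 16), torch.rand(2, 16, 2))
print(out.shape)  # torch.Size([2, 16, 32])
```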
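Similarly, the Talking Head entry models each video frame as a point in a generator's latent space and trains a recurrent network to map speech to latent-space displacements. The sketch below illustrates only that mapping step; the audio feature dimension, the GRU, and the latent size are assumptions, and the pre-trained image generator that would decode each latent into a frame is omitted.

```python
import torch
import torch.nn as nn

class Speech2LatentTrajectory(nn.Module):
    """Maps a sequence of per-frame audio features to displacements in a
    StyleGAN-like latent space, added to the latent code of the identity
    image. Dimensions and the GRU are assumptions, not the paper's setup."""

    def __init__(self, audio_dim=80, hidden=256, latent_dim=512):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        self.to_delta = nn.Linear(hidden, latent_dim)

    def forward(self, audio_feats, identity_latent):
        # audio_feats: (B, T, audio_dim); identity_latent: (B, latent_dim)
        h, _ = self.rnn(audio_feats)                  # (B, T, hidden)
        deltas = self.to_delta(h)                     # (B, T, latent_dim)
        # One latent per video frame = identity code + predicted displacement.
        return identity_latent.unsqueeze(1) + deltas  # (B, T, latent_dim)

model = Speech2LatentTrajectory()
latents = model(torch.randn(1, 25, 80), torch.randn(1, 512))
print(latents.shape)  # torch.Size([1, 25, 512]); each latent would then be
# decoded by the frozen pre-trained image generator into one video frame.
```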
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.