3D Scene Compression through Entropy Penalized Neural Representation
Functions
- URL: http://arxiv.org/abs/2104.12456v1
- Date: Mon, 26 Apr 2021 10:36:47 GMT
- Title: 3D Scene Compression through Entropy Penalized Neural Representation
Functions
- Authors: Thomas Bird, Johannes Ballé, Saurabh Singh, Philip A. Chou
- Abstract summary: Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints by interpolating between a discrete set of original views.
Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce.
Existing approaches for compressing 3D scenes are based on a separation of compression and rendering.
We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints.
Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving simultaneously higher-quality reconstructions and lower bitrates.
- Score: 19.277502420759653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some forms of novel visual media enable the viewer to explore a 3D scene from
arbitrary viewpoints, by interpolating between a discrete set of original
views. Compared to 2D imagery, these types of applications require much larger
amounts of storage space, which we seek to reduce. Existing approaches for
compressing 3D scenes are based on a separation of compression and rendering:
each of the original views is compressed using traditional 2D image formats;
the receiver decompresses the views and then performs the rendering. We unify
these steps by directly compressing an implicit representation of the scene, a
function that maps spatial coordinates to a radiance vector field, which can
then be queried to render arbitrary viewpoints. The function is implemented as
a neural network and jointly trained for reconstruction as well as
compressibility, in an end-to-end manner, with the use of an entropy penalty on
the parameters. Our method significantly outperforms a state-of-the-art
conventional approach for scene compression, achieving simultaneously higher
quality reconstructions and lower bitrates. Furthermore, we show that the
performance at lower bitrates can be improved by jointly representing multiple
scenes using a soft form of parameter sharing.
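The central idea of the abstract, jointly optimizing the scene network for reconstruction quality and for the bit cost of its own parameters, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration and not the authors' code: the network width, the zero-mean Gaussian entropy model, the quantization step `delta`, the trade-off weight `lambda_rate`, and the direct regression target (standing in for a volume-rendered pixel loss) are all assumptions made for brevity.

```python
import math

import torch
import torch.nn as nn


class RadianceMLP(nn.Module):
    """Toy coordinate network: maps a 3D point plus view direction to (r, g, b, sigma)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),
        )

    def forward(self, x):
        return self.net(x)


def parameter_rate_bits(model, delta=0.01, sigma=0.1):
    """Differentiable proxy for the bit cost of the (quantized) parameters.

    Quantization to step `delta` is relaxed to additive uniform noise, and the
    noisy weights are scored under a zero-mean Gaussian entropy model with
    standard deviation `sigma`; the negative log-likelihood, converted to bits,
    approximates the code length. `delta` and `sigma` are illustrative choices.
    """
    gauss = torch.distributions.Normal(0.0, sigma)
    bits = 0.0
    for p in model.parameters():
        noisy = p + delta * (torch.rand_like(p) - 0.5)      # stand-in for quantization
        log_prob = gauss.log_prob(noisy) + math.log(delta)  # P(bin) ~ p(w) * delta
        bits = bits + (-log_prob / math.log(2.0)).sum()
    return bits


model = RadianceMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_rate = 1e-6  # rate-distortion trade-off weight (hypothetical value)


def train_step(coords, target):
    """One joint rate-distortion step: reconstruction loss plus entropy penalty."""
    opt.zero_grad()
    distortion = ((model(coords) - target) ** 2).mean()  # in the paper, distortion is on rendered views
    rate = parameter_rate_bits(model)
    loss = distortion + lambda_rate * rate
    loss.backward()
    opt.step()
    return distortion.item(), rate.item()
```

After training, the parameters would be quantized with the same step size and entropy-coded under the fitted model; the uniform-noise relaxation is what keeps the rate term differentiable during training.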
Related papers
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering
Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations [48.05445941939446]
A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates.
We propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area.
We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets.
arXiv Detail & Related papers (2021-11-25T16:18:56Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- Efficient Scene Compression for Visual-based Localization [5.575448433529451]
Estimating the pose of a camera with respect to a 3D reconstruction or scene representation is a crucial step for many mixed reality and robotics applications.
This work introduces a novel approach that compresses a scene representation by means of a constrained quadratic program (QP).
Our experiments on publicly available datasets show that our approach compresses a scene representation quickly while delivering accurate pose estimates.
arXiv Detail & Related papers (2020-11-27T18:36:06Z)
- Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [33.71628590745982]
We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets.
arXiv Detail & Related papers (2020-04-01T10:37:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.