3D Scene Compression through Entropy Penalized Neural Representation
Functions
- URL: http://arxiv.org/abs/2104.12456v1
- Date: Mon, 26 Apr 2021 10:36:47 GMT
- Title: 3D Scene Compression through Entropy Penalized Neural Representation
Functions
- Authors: Thomas Bird, Johannes Ballé, Saurabh Singh, Philip A. Chou
- Abstract summary: Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints by interpolating between a discrete set of original views.
Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce.
Existing approaches for compressing 3D scenes are based on a separation of compression and rendering.
We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints.
Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving simultaneously higher-quality reconstructions and lower bitrates.
- Score: 19.277502420759653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some forms of novel visual media enable the viewer to explore a 3D scene from
arbitrary viewpoints, by interpolating between a discrete set of original
views. Compared to 2D imagery, these types of applications require much larger
amounts of storage space, which we seek to reduce. Existing approaches for
compressing 3D scenes are based on a separation of compression and rendering:
each of the original views is compressed using traditional 2D image formats;
the receiver decompresses the views and then performs the rendering. We unify
these steps by directly compressing an implicit representation of the scene, a
function that maps spatial coordinates to a radiance vector field, which can
then be queried to render arbitrary viewpoints. The function is implemented as
a neural network and jointly trained for reconstruction as well as
compressibility, in an end-to-end manner, with the use of an entropy penalty on
the parameters. Our method significantly outperforms a state-of-the-art
conventional approach for scene compression, achieving simultaneously higher
quality reconstructions and lower bitrates. Furthermore, we show that the
performance at lower bitrates can be improved by jointly representing multiple
scenes using a soft form of parameter sharing.
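The central idea of the abstract, jointly optimizing the scene network for reconstruction quality and for the bit cost of its own parameters, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration and not the authors' code: the network width, the zero-mean Gaussian entropy model, the quantization step `delta`, the trade-off weight `lambda_rate`, and the direct regression target (standing in for a volume-rendered pixel loss) are all assumptions made for brevity.

```python
import math

import torch
import torch.nn as nn


class RadianceMLP(nn.Module):
    """Toy coordinate network: maps a 3D point plus view direction to (r, g, b, sigma)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),
        )

    def forward(self, x):
        return self.net(x)


def parameter_rate_bits(model, delta=0.01, sigma=0.1):
    """Differentiable proxy for the bit cost of the (quantized) parameters.

    Quantization to step `delta` is relaxed to additive uniform noise, and the
    noisy weights are scored under a zero-mean Gaussian entropy model with
    standard deviation `sigma`; the negative log-likelihood, converted to bits,
    approximates the code length. `delta` and `sigma` are illustrative choices.
    """
    gauss = torch.distributions.Normal(0.0, sigma)
    bits = 0.0
    for p in model.parameters():
        noisy = p + delta * (torch.rand_like(p) - 0.5)      # stand-in for quantization
        log_prob = gauss.log_prob(noisy) + math.log(delta)  # P(bin) ~ p(w) * delta
        bits = bits + (-log_prob / math.log(2.0)).sum()
    return bits


model = RadianceMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_rate = 1e-6  # rate-distortion trade-off weight (hypothetical value)


def train_step(coords, target):
    """One joint rate-distortion step: reconstruction loss plus entropy penalty."""
    opt.zero_grad()
    distortion = ((model(coords) - target) ** 2).mean()  # in the paper, distortion is on rendered views
    rate = parameter_rate_bits(model)
    loss = distortion + lambda_rate * rate
    loss.backward()
    opt.step()
    return distortion.item(), rate.item()
```

After training, the parameters would be quantized with the same step size and entropy-coded under the fitted model; the uniform-noise relaxation is what keeps the rate term differentiable during training.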
Related papers
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering
Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations [48.05445941939446]
A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates.
We propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area.
We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets.
arXiv Detail & Related papers (2021-11-25T16:18:56Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- Efficient Scene Compression for Visual-based Localization [5.575448433529451]
Estimating the pose of a camera with respect to a 3D reconstruction or scene representation is a crucial step for many mixed reality and robotics applications.
This work introduces a novel approach that compresses a scene representation by means of a constrained quadratic program (QP).
Our experiments on publicly available datasets show that our approach compresses a scene representation quickly while delivering accurate pose estimates.
arXiv Detail & Related papers (2020-11-27T18:36:06Z)
- Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [33.71628590745982]
We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets.
arXiv Detail & Related papers (2020-04-01T10:37:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.