Efficient Scene Compression for Visual-based Localization
- URL: http://arxiv.org/abs/2011.13894v1
- Date: Fri, 27 Nov 2020 18:36:06 GMT
- Title: Efficient Scene Compression for Visual-based Localization
- Authors: Marcela Mera-Trujillo, Benjamin Smith, Victor Fragoso
- Abstract summary: Estimating the pose of a camera with respect to a 3D reconstruction or scene representation is a crucial step for many mixed reality and robotics applications.
This work introduces a novel approach that compresses a scene representation by means of a constrained quadratic program (QP).
Our experiments on publicly available datasets show that our approach compresses a scene representation quickly while delivering accurate pose estimates.
- Score: 5.575448433529451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the pose of a camera with respect to a 3D reconstruction or scene
representation is a crucial step for many mixed reality and robotics
applications. Given the vast amount of available data nowadays, many
applications constrain storage and/or bandwidth to work efficiently. To satisfy
these constraints, many applications compress a scene representation by
reducing its number of 3D points. While state-of-the-art methods use
$K$-cover-based algorithms to compress a scene, they are slow and hard to tune.
To enhance speed and facilitate parameter tuning, this work introduces a novel
approach that compresses a scene representation by means of a constrained
quadratic program (QP). Because this QP resembles a one-class support vector
machine, we derive a variant of the sequential minimal optimization to solve
it. Our approach uses the points corresponding to the support vectors as the
subset of points to represent a scene. We also present an efficient
initialization method that allows our method to converge quickly. Our
experiments on publicly available datasets show that our approach compresses a
scene representation quickly while delivering accurate pose estimates.
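For intuition, the selection mechanism can be mimicked with an off-the-shelf one-class SVM: fit it to the scene's 3D points and keep only the support vectors. The sketch below is a loose illustration of that idea using scikit-learn's generic solver; it is not the paper's constrained QP, its SMO variant, or its initialization scheme, and treating the `nu` knob as a compression-rate proxy is an assumption.
```python
# A loose illustration of support-vector-based scene compression
# using scikit-learn's generic OneClassSVM solver. This is NOT the
# paper's constrained QP, its SMO variant, or its initialization;
# treating `nu` as a compression-rate knob is an assumption.
import numpy as np
from sklearn.svm import OneClassSVM

def compress_scene(points: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Keep only the points selected as support vectors.

    points:        (N, 3) array of reconstructed 3D points.
    keep_fraction: `nu` lower-bounds the fraction of support
                   vectors, so it acts as a rough lower bound on
                   the fraction of points retained.
    """
    ocsvm = OneClassSVM(kernel="rbf", nu=keep_fraction, gamma="scale")
    ocsvm.fit(points)
    # `support_` holds the indices of the support vectors; those
    # points become the compressed scene representation.
    return points[ocsvm.support_]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=(5000, 3))  # stand-in for an SfM point cloud
    compressed = compress_scene(scene, keep_fraction=0.05)
    print(f"{len(scene)} points -> {len(compressed)} points")
```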
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields [13.729716867839509]
We propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance.
In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field.
Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
arXiv Detail & Related papers (2024-08-07T14:56:34Z)
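A "learnable mask" over Gaussians is commonly implemented as a per-primitive gate trained with a straight-through estimator plus a sparsity penalty. The class below is a generic sketch of that pattern; the names, thresholding scheme, and loss are assumptions, not the paper's actual code.
```python
# Generic sketch of a learnable pruning mask for Gaussian splats,
# trained with a straight-through estimator and a sparsity penalty.
# Names, the thresholding scheme, and the loss are assumptions; the
# paper's actual masking strategy may differ.
import torch

class MaskedGaussians(torch.nn.Module):
    def __init__(self, num_gaussians: int):
        super().__init__()
        # One logit per Gaussian; sigmoid(logit) is its keep probability.
        self.mask_logits = torch.nn.Parameter(torch.zeros(num_gaussians))

    def keep_mask(self, threshold: float = 0.5) -> torch.Tensor:
        soft = torch.sigmoid(self.mask_logits)
        hard = (soft > threshold).float()
        # Straight-through estimator: binary mask in the forward
        # pass, gradients flow through the soft sigmoid.
        return hard + soft - soft.detach()

    def sparsity_loss(self) -> torch.Tensor:
        # Penalizes the expected fraction of Gaussians kept; added
        # to the rendering loss to drive the count down.
        return torch.sigmoid(self.mask_logits).mean()
```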
- SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene.
SAGS achieves state-of-the-art rendering performance with reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
- Quadric Representations for LiDAR Odometry, Mapping and Localization [93.24140840537912]
Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes.
We propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects.
Our method maintains low latency and memory usage while achieving competitive, and even superior, accuracy.
arXiv Detail & Related papers (2023-04-27T13:52:01Z)
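For intuition on why quadrics are compact: a quadric surface is fully determined by a symmetric 4x4 matrix, i.e. ten coefficients. The standard implicit form is sketched below (textbook notation, not necessarily the parameterization used in the paper).
```latex
% A quadric surface in homogeneous coordinates x = (x, y, z, 1)^T.
% Q is a symmetric 4x4 matrix with only 10 free coefficients, so a
% large surface patch collapses to 10 numbers instead of the many
% raw LiDAR points that would otherwise describe it.
\[
  \mathbf{x}^{\top} Q \, \mathbf{x} = 0,
  \qquad Q = Q^{\top} \in \mathbb{R}^{4 \times 4}.
\]
```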
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Prior works have achieved notable success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization [60.73541222862195]
NeuMap is an end-to-end neural mapping method for camera localization.
It encodes a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels.
arXiv Detail & Related papers (2022-11-21T04:46:22Z)
- Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features.
We propose a fast point cloud simplification method by learning to sample salient points.
The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined, number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
arXiv Detail & Related papers (2021-09-30T10:23:55Z)
- 3D Scene Compression through Entropy Penalized Neural Representation Functions [19.277502420759653]
Novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints by interpolating between a discrete set of original views.
These types of applications require much larger amounts of storage space, which we seek to reduce.
Existing approaches for compressing 3D scenes are based on a separation of compression and rendering.
We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints.
Our method significantly outperforms a state-of-the-art conventional approach for scene compression, simultaneously achieving higher-quality reconstruction.
arXiv Detail & Related papers (2021-04-26T10:36:47Z)
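The phrase "entropy penalized" points to a rate-distortion objective: reconstruction error on the original views plus a penalty approximating the bit cost of the representation. The generic form below uses assumed notation (theta for the network weights, I_i for the views at viewpoints v_i, lambda for the trade-off weight) and is not the paper's exact loss.
```latex
% Generic rate-distortion objective for compressing an implicit
% scene representation: the first term measures reconstruction
% error over the original views, and R(theta) is an entropy model
% approximating the bit cost of the weights; lambda trades
% reconstruction quality against storage.
\[
  \mathcal{L}(\theta)
    = \sum_{i} \bigl\| \hat{I}_{\theta}(v_i) - I_i \bigr\|^{2}
    + \lambda \, R(\theta).
\]
```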
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z)
- Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [33.71628590745982]
We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets.
arXiv Detail & Related papers (2020-04-01T10:37:39Z)