Related papers: SMORE: Simulataneous Map and Object REconstruction


URL: http://arxiv.org/abs/2406.13896v2
Date: Mon, 06 Jan 2025 21:25:07 GMT
Title: SMORE: Simulataneous Map and Object REconstruction
Authors: Nathaniel Chodosh, Anish Madan, Simon Lucey, Deva Ramanan,
Abstract summary: We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR. We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background.
Score: 66.66729715211642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR. Depth-based reconstructions tend to focus on small-scale objects or large-scale SLAM reconstructions that treat moving objects as outliers. We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background. To achieve this, we take inspiration from recent novel view synthesis methods and frame the reconstruction problem as a global optimization over neural surfaces, ego poses, and object poses, which minimizes the error between composed spacetime surfaces and input LiDAR scans. In contrast to view synthesis methods, which typically minimize 2D errors with gradient descent, we minimize a 3D point-to-surface error by coordinate descent, which we decompose into registration and surface reconstruction steps. Each step can be handled well by off-the-shelf methods without any re-training. We analyze the surface reconstruction step for rolling-shutter LiDARs, and show that deskewing operations common in continuous time SLAM can be applied to dynamic objects as well, improving results over prior art by an order of magnitude. Beyond pursuing dynamic reconstruction as a goal in and of itself, we propose that such a system can be used to auto-label partially annotated sequences and produce ground truth annotation for hard-to-label problems such as depth completion and scene flow.
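The abstract does not spell out how deskewing works. Here is a minimal sketch, assuming per-point timestamps and a constant-velocity motion model with yaw-only rotation. It shows the basic operation: each rolling-shutter LiDAR point, measured in the sensor frame at its own capture time, is re-expressed in a single reference frame. The names `deskew` and `yaw_rotation` and the motion model are hypothetical illustrations, not taken from the SMORE code.

```python
import numpy as np


def yaw_rotation(theta):
    """3x3 rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def deskew(points, times, lin_vel, yaw_rate, t_ref=0.0):
    """Re-express rolling-shutter LiDAR points in the frame at time t_ref.

    Assumed (hypothetical) motion model: relative to the frame at t = 0,
    the frame at time t is rotated by yaw_rate * t about z and translated
    by lin_vel * t. Each row of `points` is measured in the frame at the
    matching entry of `times` (seconds).
    """
    points = np.asarray(points, dtype=float)
    times = np.asarray(times, dtype=float)
    lin_vel = np.asarray(lin_vel, dtype=float)
    out = np.empty_like(points)
    R_ref = yaw_rotation(yaw_rate * t_ref)
    for i, (p, t) in enumerate(zip(points, times)):
        # Map the point from the frame at time t into the t = 0 frame.
        x0 = yaw_rotation(yaw_rate * t) @ p + lin_vel * t
        # Then map it from the t = 0 frame into the frame at t_ref.
        out[i] = R_ref.T @ (x0 - lin_vel * t_ref)
    return out
```

The same function can deskew points on a rigidly moving object if the motion arguments describe the sensor's motion relative to that object. For example, an object translating at +v past a static sensor corresponds to `lin_vel = -v`. This mirrors, in simplified form, the abstract's observation that ego-motion deskewing carries over to dynamic objects.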

Related papers

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications. Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas. We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv (2025-03-19T02:11:31Z)
Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion [16.379647695019308]
3D scene reconstruction is a foundational problem in computer vision. We introduce the Gaussian Object Carver (GOC), a novel, efficient, and scalable framework for object-compositional 3D scene reconstruction. GOC leverages 3D Gaussian Splatting (GS), enriched with monocular geometry priors and multi-view geometry regularization, to achieve high-quality and flexible reconstruction.
arXiv (2024-12-03T01:34:39Z)
Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction [3.9363268745580426]
AT-GS is a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization. We reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames. Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis.
arXiv (2024-11-10T21:30:16Z)
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv (2024-10-04T18:00:07Z)
Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes [30.32214593068206]
We present a space-time 2D Gaussian Splatting approach to tackle the dynamic contents and the occlusions in complex scenes. Specifically, to improve geometric quality in dynamic scenes, we learn canonical 2D Gaussian splats and deform these 2D Gaussian splats. We also introduce a compositional opacity strategy, which further reduces the surface recovery of those occluded areas. Experiments on real-world sparse-view video datasets and monocular dynamic datasets demonstrate that our reconstructions outperform state-of-the-art methods.
arXiv (2024-09-27T15:50:36Z)
OmniRe: Omni Urban Scene Reconstruction [78.99262488964423]
We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. We propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log.
arXiv (2024-08-29T17:56:33Z)
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning. Its object-centric voxelization infers per-object occupancy probabilities at individual spatial locations. Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv (2024-07-30T15:33:58Z)
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction. Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition. We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv (2024-03-28T11:12:33Z)
SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv (2023-08-16T09:50:35Z)
Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes. The inputs are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv (2023-06-06T15:07:39Z)
Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation [14.302118093865849]
We present a tightly-coupled visual-inertial object-level multi-instance dynamic SLAM system. It can robustly optimise for the camera pose, velocity, and IMU biases, and build a dense, object-level 3D reconstruction map of the environment.
arXiv (2022-08-08T17:13:24Z)
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction [88.02850205432763]
We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision. We observe that the conventional volume rendering method causes inherent geometric errors for surface reconstruction. We propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision.
arXiv (2021-06-20T12:59:42Z)
Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances. We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv (2021-03-30T17:57:01Z)
Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective. We reconstruct an interactive scene using RGB-D data stream. This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv (2021-03-30T05:56:58Z)
Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications [78.63280020581662]
A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms.
arXiv (2021-03-24T08:16:32Z)
Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
arXiv (2020-07-28T04:45:13Z)
Reconstruct, Rasterize and Backprop: Dense shape and pose estimation from a single image [14.9851111159799]
This paper presents a new system to obtain dense object reconstructions along with 6-DoF poses from a single image. We leverage recent advances in differentiable rendering (in particular, rasterization) to close the loop with 3D reconstruction in the camera frame.
arXiv (2020-04-25T20:53:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.