DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction
- URL: http://arxiv.org/abs/2409.02104v1
- Date: Tue, 3 Sep 2024 17:58:03 GMT
- Title: DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction
- Authors: Jenny Seidenschwarz, Qunjie Zhou, Bardienus Duisterhof, Deva Ramanan, Laura Leal-Taixé,
- Abstract summary: We target the challenge of online 2D and 3D point tracking from unposed monocular camera input.
We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion.
We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
- Score: 65.46359561104867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allow for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, above approaches either require offline processing or multi-view camera setups both unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
Related papers
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z) - SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z) - Visual Geometry Grounded Deep Structure From Motion [20.203320509695306]
We propose a new deep pipeline VGGSfM, where each component is fully differentiable and can be trained in an end-to-end manner.
First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches.
We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.
arXiv Detail & Related papers (2023-12-07T18:59:52Z) - R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z) - Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z) - MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion task from controlled environments to in-the-wild scenarios.
It is capable of body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z) - Online Adaptation for Implicit Object Tracking and Shape Reconstruction
in the Wild [22.19769576901151]
We introduce a novel and unified framework which utilizes a DeepSDF model to simultaneously track and reconstruct 3D objects in the wild.
We show significant improvements over state-of-the-art methods for both tracking and shape reconstruction.
arXiv Detail & Related papers (2021-11-24T19:00:05Z) - Reconstruction of 3D flight trajectories from ad-hoc camera networks [19.96488566402593]
We present a method to reconstruct the 3D trajectory of an airborne robotic system only from videos recorded with cameras that are unsynchronized.
Our approach enables robust and accurate outside-in tracking of dynamically flying targets, with cheap and easy-to-deploy equipment.
arXiv Detail & Related papers (2020-03-10T14:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.