Towards Better Robustness: Pose-Free 3D Gaussian Splatting for Arbitrarily Long Videos
- URL: http://arxiv.org/abs/2501.15096v2
- Date: Sun, 25 May 2025 01:57:23 GMT
- Title: Towards Better Robustness: Pose-Free 3D Gaussian Splatting for Arbitrarily Long Videos
- Authors: Zhen-Hui Dong, Sheng Ye, Yu-Hui Wen, Nannan Li, Yong-Jin Liu
- Abstract summary: 3D Gaussian Splatting (3DGS) has emerged as a powerful representation due to its efficiency and high-fidelity rendering. We propose Rob-GS, a robust framework to progressively estimate camera poses and optimize 3DGS for arbitrarily long video inputs.
- Score: 24.959777640700178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful representation due to its efficiency and high-fidelity rendering. 3DGS training requires a known camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Pioneering works have attempted to relax this restriction but still face difficulties when handling long sequences with complex camera trajectories. In this paper, we propose Rob-GS, a robust framework to progressively estimate camera poses and optimize 3DGS for arbitrarily long video inputs. In particular, by leveraging the inherent continuity of videos, we design an adjacent pose tracking method to ensure stable pose estimation between consecutive frames. To handle arbitrarily long inputs, we propose a Gaussian visibility retention check strategy to adaptively split the video sequence into several segments and optimize them separately. Extensive experiments on Tanks and Temples, ScanNet, and a self-captured dataset show that Rob-GS outperforms state-of-the-art methods.
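The abstract names the two key mechanisms without spelling them out. Below is a minimal sketch of how a visibility retention check could drive segment splitting, assuming Gaussian centers are available as Nx3 arrays and per-frame poses come from the adjacent tracking step; the threshold `tau` and the greedy splitting rule are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def visible_fraction(points, w2c, K, width, height):
    """Fraction of Gaussian centers that project inside the image."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    cam = (w2c @ homog.T).T[:, :3]                # world -> camera coords
    in_front = cam[:, 2] > 1e-6
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < width) & \
             (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return np.mean(in_front & in_img)

def split_segments(frame_points, poses, K, width, height, tau=0.5):
    """Greedy split: close a segment once fewer than `tau` of its anchor
    Gaussians remain visible from the newest tracked pose.
    frame_points[i] holds Gaussian centers initialized from frame i."""
    segments, current = [], [0]
    anchor = frame_points[0]              # Gaussians seeding this segment
    for i in range(1, len(poses)):
        if visible_fraction(anchor, poses[i], K, width, height) < tau:
            segments.append(current)      # visibility dropped: new segment
            current, anchor = [i], frame_points[i]
        else:
            current.append(i)
    segments.append(current)
    return segments
```

Closing a segment once visibility drops keeps each optimization local, which is consistent with the abstract's claim of scaling to arbitrarily long videos.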
Related papers
- LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos [24.61106294159454]
LongSplat addresses challenges in novel view synthesis (NVS) from casually captured long videos characterized by irregular camera motion, unknown camera poses, and expansive scenes. LongSplat is a robust unposed 3D Gaussian Splatting framework featuring: (1) Incremental Joint Optimization that concurrently optimizes camera poses and 3D Gaussians to avoid local minima and ensure global consistency; (2) a robust Pose Estimation Module leveraging learned 3D priors; and (3) an efficient Octree Anchor Formation mechanism that converts dense point clouds into anchors based on spatial density.
arXiv Detail & Related papers (2025-08-19T17:59:56Z)
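The LongSplat summary above mentions converting dense point clouds into anchors based on spatial density. Here is a hedged sketch of one density-adaptive octree scheme; the parameter values and the centroid-per-leaf rule are assumptions for illustration, not details from the paper:

```python
import numpy as np

def octree_anchors(points, max_points=64, min_extent=0.05):
    """Recursively subdivide space; emit one anchor (centroid) per leaf.
    Dense regions are subdivided further, so anchor density follows
    point density."""
    anchors = []

    def recurse(pts, lo, hi):
        if len(pts) == 0:
            return
        if len(pts) <= max_points or np.max(hi - lo) <= min_extent:
            anchors.append(pts.mean(axis=0))   # one anchor per leaf
            return
        mid = (lo + hi) / 2.0
        for code in range(8):                  # the 8 child octants
            mask = np.ones(len(pts), dtype=bool)
            child_lo, child_hi = lo.copy(), hi.copy()
            for axis in range(3):
                if (code >> axis) & 1:
                    mask &= pts[:, axis] >= mid[axis]
                    child_lo[axis] = mid[axis]
                else:
                    mask &= pts[:, axis] < mid[axis]
                    child_hi[axis] = mid[axis]
            recurse(pts[mask], child_lo, child_hi)

    recurse(points, points.min(axis=0), points.max(axis=0))
    return np.array(anchors)
```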
- PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations [102.0476991174456]
COLMAP-free 3DGS has attracted increasing attention due to its remarkable performance in reconstructing high-quality 3D scenes from unposed images or videos. We propose PCR-GS, an innovative COLMAP-free 3DGS technique that achieves superior 3D scene modeling and camera pose estimation via camera pose co-regularization.
arXiv Detail & Related papers (2025-07-18T13:09:33Z)
- ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes [4.089605790376984]
We propose to incorporate Iterative Closest Point (ICP) with optimization-based refinement to achieve accurate camera pose estimation under large camera movements. We also introduce a voxel-based scene densification approach to guide the reconstruction in large-scale scenes. Experiments demonstrate that our approach ICP-3DGS outperforms existing methods in both camera pose estimation and novel view synthesis.
arXiv Detail & Related papers (2025-06-24T21:10:06Z)
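The ICP-3DGS entry above builds on Iterative Closest Point for pose estimation under large camera movements. For reference, a self-contained version of the textbook point-to-point ICP it starts from; the paper's optimization-based refinement stage is omitted here:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=20):
    """Align source cloud to destination cloud; return a 4x4 rigid
    transform mapping src into dst's frame."""
    T = np.eye(4)
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)              # nearest-neighbor matches
        matched = dst[idx]
        # Closed-form rigid alignment (Kabsch/SVD) of matched pairs.
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        cur = cur @ R.T + t                   # apply the new increment
        T = step @ T                          # accumulate total transform
    return T
```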
- On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images [48.8544345503807]
We present an on-the-fly method to produce camera poses and a trained 3DGS immediately after capture. Our method can handle dense and wide-baseline captures of ordered photo sequences and large-scale scenes.
arXiv Detail & Related papers (2025-06-05T20:10:18Z)
- FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting [57.97160965244424]
3D Gaussian splatting (3DGS) has enabled various applications in 3D scene representation and novel view synthesis. Previous approaches have focused on pruning less important Gaussians, effectively compressing 3DGS. We present an elastic inference method for 3DGS, achieving substantial rendering performance without additional fine-tuning.
arXiv Detail & Related papers (2025-06-04T17:17:57Z)
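The FlexGS entry above contrasts itself with approaches that prune less important Gaussians. A minimal sketch of budgeted selection using a common opacity-times-volume importance heuristic; FlexGS's learned elastic selection is more involved, so treat this only as the baseline idea it improves on:

```python
import numpy as np

def select_gaussians(opacity, scales, ratio=0.5):
    """Pick a budgeted subset of Gaussians by a simple importance score.
    opacity: (N,) per-Gaussian opacities; scales: (N, 3) axis scales."""
    volume = np.prod(scales, axis=1)       # product of the 3 axis scales
    score = opacity * volume               # heuristic importance
    k = max(1, int(ratio * len(score)))
    return np.argsort(score)[-k:]          # indices of the top-k Gaussians
```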
- 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS [36.48425755917156]
3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its efficiency and quality.
However, it heavily depends on accurate camera poses from Structure-from-Motion (SfM) systems.
We present 3R-GS, a 3D Gaussian Splatting framework that bridges this gap by optimizing camera poses jointly with the scene representation.
arXiv Detail & Related papers (2025-04-05T22:31:08Z)
- TrackGS: Optimizing COLMAP-Free 3D Gaussian Splatting with Global Track Constraints [40.9371798496134]
We introduce TrackGS, which incorporates feature tracks to globally constrain multi-view geometry.
We also propose minimizing both reprojection and backprojection errors for better geometric consistency.
By deriving gradients with respect to the camera intrinsics, we unify camera parameter estimation with 3DGS training in a joint optimization framework.
arXiv Detail & Related papers (2025-02-27T06:16:04Z)
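The TrackGS entry above minimizes both reprojection and backprojection errors over feature tracks. A schematic version of the two residuals, assuming known intrinsics K, a 4x4 world-to-camera pose, tracked 3D points X, and observed pixels with depths; the paper's weighting and parameterization are not reproduced:

```python
import numpy as np

def project(K, w2c, X):
    """Project world points X (Nx3) to pixel coordinates."""
    cam = (w2c[:3, :3] @ X.T).T + w2c[:3, 3]
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def track_losses(K, w2c, X, uv_obs, depth_obs):
    """Schematic reprojection and backprojection residuals for one view.
    X: tracked 3D points (Nx3); uv_obs: observed pixels (Nx2);
    depth_obs: observed depths (N,)."""
    # Reprojection: project 3D track points, compare against observed pixels.
    reproj = np.mean(np.linalg.norm(project(K, w2c, X) - uv_obs, axis=1))
    # Backprojection: lift observed pixels at observed depth into world
    # space and compare against the 3D track points.
    pix = np.hstack([uv_obs, np.ones((len(uv_obs), 1))])
    cam = (np.linalg.inv(K) @ pix.T).T * depth_obs[:, None]
    world = (w2c[:3, :3].T @ (cam - w2c[:3, 3]).T).T   # invert rigid pose
    backproj = np.mean(np.linalg.norm(world - X, axis=1))
    return reproj, backproj
```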
- KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences [14.792295042683254]
We present an efficient framework that operates without any depth or matching model. We propose a coarse-to-fine frequency-aware densification to reconstruct different levels of detail.
arXiv Detail & Related papers (2024-12-30T07:32:35Z)
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants, for object-centric and scene-level reconstruction, trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video [64.38566659338751]
We propose Deblur4DGS, the first 4D Gaussian Splatting framework to reconstruct a high-quality 4D model from blurry monocular video.
We introduce exposure regularization to avoid trivial solutions, as well as multi-frame and multi-resolution consistency terms to alleviate artifacts. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video in several ways, including deblurring, frame synthesis, and video stabilization.
arXiv Detail & Related papers (2024-12-09T12:02:11Z)
- SfM-Free 3D Gaussian Splatting via Hierarchical Training [42.85362760049813]
We propose a novel SfM-Free 3DGS (SFGS) method for video input, eliminating the need for known camera poses and SfM preprocessing. Our approach introduces a hierarchical training strategy that trains and merges multiple 3D Gaussian representations into a single, unified 3DGS model. Experimental results reveal that our approach significantly surpasses state-of-the-art SfM-free novel view synthesis methods.
arXiv Detail & Related papers (2024-12-02T14:39:06Z)
- EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting [76.02450110026747]
Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution.
We propose Event-Aided Free-Trajectory 3DGS, which seamlessly integrates the advantages of event cameras into 3DGS.
We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS.
arXiv Detail & Related papers (2024-10-20T13:44:24Z)
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting [94.84688557937123]
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors.
Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos.
It enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
arXiv Detail & Related papers (2024-06-04T17:57:37Z)
- Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion [25.54868552979793]
We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data.
Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods.
arXiv Detail & Related papers (2024-03-20T06:19:41Z)
- SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues and can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
- COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing.
We process the input frames sequentially, progressively growing the set of 3D Gaussians by taking in one input frame at a time.
Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
arXiv Detail & Related papers (2023-12-12T18:39:52Z)
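The COLMAP-Free 3DGS entry above processes frames sequentially and grows the Gaussian set one frame at a time. The control flow might look like the following skeleton, where the four callables are placeholders for a real 3DGS backend rather than the authors' implementation:

```python
import numpy as np

def progressive_reconstruct(frames, init_fn, track_fn, grow_fn, optimize_fn):
    """Sequential pose-free reconstruction loop: take one frame at a
    time, estimate its pose against the current model, grow the
    Gaussian set, then optimize. Only the loop structure is shown;
    the callables stand in for a real 3DGS implementation."""
    gaussians = init_fn(frames[0])        # seed model from the first frame
    poses = [np.eye(4)]                   # first camera fixed at identity
    for frame in frames[1:]:
        pose = track_fn(gaussians, poses[-1], frame)  # local pose estimate
        poses.append(pose)
        gaussians = grow_fn(gaussians, frame, pose)   # add new Gaussians
        gaussians = optimize_fn(gaussians, frames[:len(poses)], poses)
    return gaussians, poses
```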
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.