RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
- URL: http://arxiv.org/abs/2509.15123v2
- Date: Fri, 19 Sep 2025 05:25:06 GMT
- Title: RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
- Authors: Fang Li, Hao Zhang, Narendra Ahuja
- Abstract summary: We propose a novel method for camera parameter optimization in dynamic scenes solely supervised by a single RGB video, dubbed ROS-Cam. Our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
- Score: 15.207366531969898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although COLMAP has long remained the predominant method for camera parameter optimization in static scenes, it is constrained by its lengthy runtime and reliance on ground truth (GT) motion masks for application to dynamic scenes. Many efforts attempted to improve it by incorporating more priors as supervision such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth, which, however, are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method for more accurate and efficient camera parameter optimization in dynamic scenes solely supervised by a single RGB video, dubbed ROS-Cam. Our method consists of three key components: (1) Patch-wise Tracking Filters, to establish robust and maximally sparse hinge-like relations across the RGB video. (2) Outlier-aware Joint Optimization, for efficient camera parameter optimization by adaptive down-weighting of moving outliers, without reliance on motion priors. (3) A Two-stage Optimization Strategy, to enhance stability and optimization speed by a trade-off between the Softplus limits and convex minima in losses. We visually and numerically evaluate our camera estimates. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes, and rendered 2D RGB and depth maps. We perform experiments on 4 real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and 1 synthetic dataset (MPI-Sintel), demonstrating that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
Related papers
- JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction [18.636227266388218]
We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations. Experiments on NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality.
arXiv Detail & Related papers (2026-02-04T08:33:51Z)
- Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z)
- UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring [31.35713139629235]
Reconstructing 3D scenes from monocular video often fails due to severe motion blur caused by camera and object motion. We introduce a unified optimization framework by incorporating camera poses as learnable parameters. Our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
arXiv Detail & Related papers (2025-08-31T13:01:03Z)
- SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images [125.66499135980344]
We propose SparseGrasp, a novel open-vocabulary robotic grasping system. SparseGrasp operates efficiently with sparse-view RGB images and handles scene updates quickly. We show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability.
arXiv Detail & Related papers (2024-12-03T03:56:01Z)
- Diversity-Driven View Subset Selection for Indoor Novel View Synthesis [54.468355408388675]
We propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We show that our framework consistently outperforms baseline strategies while using only 5-20% of the data.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting [14.759265492381509]
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure. Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
arXiv Detail & Related papers (2024-06-03T06:52:35Z)
- InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
- Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes [8.061773364318313]
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video.
We provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences.
This represents a strong new performance point for crowded scenes, an important setting for computer vision.
arXiv Detail & Related papers (2023-09-15T17:44:07Z)
- CamP: Camera Preconditioning for Neural Radiance Fields [56.46526219931002]
NeRFs can be optimized to obtain high-fidelity 3D scene reconstructions of objects and large-scale scenes.
Extrinsic and intrinsic camera parameters are usually estimated using Structure-from-Motion (SfM) methods as a pre-processing step to NeRF.
We propose using a proxy problem to compute a whitening transform that eliminates the correlation between camera parameters and normalizes their effects.
arXiv Detail & Related papers (2023-08-21T17:59:54Z)
- Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface [5.341864681049579]
Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements.
We introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces.
Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images.
arXiv Detail & Related papers (2023-05-15T19:03:45Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild [49.672487902268706]
We present a framework that jointly estimates camera temporal alignment and 3D point triangulation.
We reconstruct 3D motion trajectories of human bodies in events captured by multiple unsynchronized video cameras.
arXiv Detail & Related papers (2020-07-24T23:50:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.