RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
- URL: http://arxiv.org/abs/2504.08212v1
- Date: Fri, 11 Apr 2025 02:35:19 GMT
- Title: RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
- Authors: Guangcong Zheng, Teng Li, Xianpan Zhou, Xi Li
- Abstract summary: RealCam-Vid is the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations.
- Score: 9.714839452308581
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in camera-controllable video generation have been constrained by the reliance on static-scene datasets with relative-scale camera annotations, such as RealEstate10K. While these datasets enable basic viewpoint control, they fail to capture dynamic scene interactions and lack the metric-scale geometric consistency that is critical for synthesizing realistic object motions and precise camera trajectories in complex environments. To bridge this gap, we introduce the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations, released at https://github.com/ZGCTroy/RealCam-Vid.
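To illustrate what metric-scale annotations enable, here is a minimal Python sketch (the per-frame world-to-camera R/t layout and field names are assumptions, not the dataset's actual schema): with metric poses, quantities such as total camera travel come out in meters, whereas relative-scale annotations like RealEstate10K's yield unitless numbers.

```python
import numpy as np

# Hypothetical per-frame annotations: world-to-camera rotation R (3x3) and
# translation t (3,) in meters. The layout is illustrative, not the
# dataset's actual schema.
def camera_centers(Rs, ts):
    """Camera centers in world coordinates: C = -R^T t."""
    return np.stack([-R.T @ t for R, t in zip(Rs, ts)])

def trajectory_length_m(Rs, ts):
    """Total camera travel. Only with metric-scale poses is this in meters;
    with relative-scale poses the number carries no physical unit."""
    C = camera_centers(Rs, ts)
    return float(np.linalg.norm(np.diff(C, axis=0), axis=1).sum())

# Toy trajectory: camera moving 0.1 m per frame along the x-axis.
Rs = [np.eye(3)] * 5
ts = [np.array([-0.1 * i, 0.0, 0.0]) for i in range(5)]
print(trajectory_length_m(Rs, ts))  # 0.4
```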
Related papers
- Dynamic Camera Poses and Where to Find Them [36.249380390918816]
We introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses.
For pose estimation, we combine recent techniques in point tracking, dynamic masking, and structure-from-motion.
Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes.
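A minimal sketch of the dynamic-masking step such a pipeline needs before structure-from-motion (a generic illustration under assumed array layouts, not DynPose-100K's actual code): drop every point track that ever lands on a pixel the segmentation model marked as dynamic, and pass only the survivors to SfM.

```python
import numpy as np

def filter_static_tracks(tracks, dyn_masks):
    """Keep only point tracks that stay on static regions in every frame.

    tracks:    (N, T, 2) array of (x, y) point-track positions.
    dyn_masks: (T, H, W) boolean array, True where a pixel is dynamic
               (e.g., from a video segmentation model).
    Returns the (M, T, 2) subset of tracks suitable for SfM.
    """
    T, H, W = dyn_masks.shape
    xy = np.round(tracks).astype(int)
    x = np.clip(xy[..., 0], 0, W - 1)
    y = np.clip(xy[..., 1], 0, H - 1)
    t = np.arange(T)
    hits_dynamic = dyn_masks[t, y, x]      # (N, T): track touches dynamic pixel
    keep = ~hits_dynamic.any(axis=1)
    return tracks[keep]

# Toy check: one track inside a dynamic box, one outside.
masks = np.zeros((4, 64, 64), bool); masks[:, 20:40, 20:40] = True
tracks = np.array([[[30, 30]] * 4, [[5, 5]] * 4], float)
print(filter_static_tracks(tracks, masks).shape)  # (1, 4, 2)
```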
arXiv Detail & Related papers (2025-04-24T17:59:56Z)
- Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction [78.27956235915622]
Traditional SLAM systems struggle with highly dynamic scenes commonly found in casual videos.
This work leverages a 3D point tracker to separate the camera-induced motion from the observed motion of dynamic objects.
Our framework combines the core of traditional SLAM -- bundle adjustment -- with a robust learning-based 3D tracker front-end.
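To make the bundle-adjustment core concrete, a toy sketch that refines a single camera pose against already-triangulated static points with SciPy least squares. Real bundle adjustment jointly optimizes many cameras and the 3D points, and the paper pairs it with a learned 3D tracker front end; the simplified version below is an assumption for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(params, pts3d, K):
    """Project 3D points with pose params = (rotation vector, translation)."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    cam = pts3d @ R.T + params[3:]     # world -> camera
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def residuals(params, pts3d, pts2d, K):
    """Reprojection error over static points only."""
    return (project(params, pts3d, K) - pts2d).ravel()

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts3d = np.random.default_rng(0).uniform(-1., 1., (50, 3)) + [0., 0., 5.]
gt = np.array([0.02, -0.01, 0.03, 0.1, -0.05, 0.2])  # ground-truth pose
pts2d = project(gt, pts3d, K)                         # synthetic observations
fit = least_squares(residuals, np.zeros(6), args=(pts3d, pts2d, K))
print(np.round(fit.x - gt, 6))                        # ~zero pose error
```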
arXiv Detail & Related papers (2025-04-20T07:29:42Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control [10.939379611590333]
RealCam-I2V is a novel diffusion-based video generation framework.
It integrates monocular metric depth estimation to establish a 3D scene reconstruction in a preprocessing step.
During training, the reconstructed 3D scene enables scaling camera parameters from relative to absolute values.
RealCam-I2V achieves significant improvements in controllability and video quality on RealEstate10K and on out-of-domain images.
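The relative-to-absolute rescaling step can be sketched in a few lines: an SfM-style reconstruction is correct only up to one global scale, so the ratio between monocular metric depths and the reconstruction's depths at the same pixels recovers that scale, and since rotations are scale-free, only camera translations need rescaling. The median-ratio estimator below is a common robust choice assumed here, not necessarily the paper's exact procedure.

```python
import numpy as np

def metric_scale(sfm_depths, metric_depths):
    """Global scale factor aligning up-to-scale SfM depths to metric depths."""
    valid = (sfm_depths > 0) & (metric_depths > 0)
    return float(np.median(metric_depths[valid] / sfm_depths[valid]))

def rescale_translations(translations, scale):
    """Rotations are scale-free; only translations need rescaling."""
    return [scale * t for t in translations]

sfm_d = np.array([1.0, 2.0, 4.0, 0.0])   # arbitrary SfM units (0 = invalid)
met_d = np.array([2.5, 5.0, 10.0, 3.0])  # meters, e.g., from a depth model
s = metric_scale(sfm_d, met_d)           # 2.5
print(rescale_translations([np.array([0., 0., 1.])], s))  # [[0., 0., 2.5]]
```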
arXiv Detail & Related papers (2025-02-14T10:21:49Z)
- RoMo: Robust Motion Segmentation Improves Structure from Motion [46.77236343300953]
We propose a novel approach to video-based motion segmentation to identify the components of a scene that are moving w.r.t. a fixed world frame.
Our simple but effective iterative method, RoMo, combines optical flow and epipolar cues with a pre-trained video segmentation model.
More importantly, the combination of an off-the-shelf SfM pipeline with our segmentation masks establishes a new state-of-the-art on camera calibration for scenes with dynamic content, outperforming existing methods by a substantial margin.
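The epipolar cue can be isolated in a short sketch: a static-scene correspondence (a pixel plus its optical flow) must satisfy x2^T F x1 = 0 for the fundamental matrix F of the frame pair, so a large Sampson distance flags a candidate moving pixel. This is one ingredient only and a simplification; RoMo iterates such cues with a pre-trained video segmentation model.

```python
import numpy as np

def sampson_distance(F, x1, x2):
    """Sampson distance of (N, 2) correspondences under fundamental matrix F."""
    h1 = np.c_[x1, np.ones(len(x1))]
    h2 = np.c_[x2, np.ones(len(x2))]
    Fx1 = h1 @ F.T    # epipolar lines in image 2
    Ftx2 = h2 @ F     # epipolar lines in image 1
    num = np.einsum('ij,ij->i', h2, Fx1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def motion_mask(F, pix, flow, thresh=1.0):
    """Flag pixels whose flow violates the epipolar constraint."""
    return sampson_distance(F, pix, pix + flow) > thresh

# Toy check: pure x-translation => F = skew([1,0,0]); horizontal flow is
# epipolar-consistent (static), vertical flow is not (dynamic).
F = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
pix = np.array([[10., 10.], [10., 10.]])
flow = np.array([[5., 0.], [0., 5.]])
print(motion_mask(F, pix, flow))  # [False  True]
```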
arXiv Detail & Related papers (2024-11-27T01:09:56Z)
- DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild [85.03973683867797]
This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild.
We show that the proposed method achieves state-of-the-art camera pose estimation even in complex and challenging dynamic scenes.
arXiv Detail & Related papers (2024-11-20T13:01:16Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
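A much-simplified sketch of the aggregation idea in image-based rendering: pick the source views nearest the target camera and blend their features with distance-based weights. DynIBaR learns this aggregation and additionally models scene motion, so the inverse-distance weighting and one-feature-per-view layout below are hypothetical stand-ins.

```python
import numpy as np

def nearest_views(target_center, source_centers, k=4):
    """Indices and distances of the k source cameras nearest the target."""
    d = np.linalg.norm(source_centers - target_center, axis=1)
    idx = np.argsort(d)[:k]
    return idx, d[idx]

def aggregate(view_features, idx, dists, eps=1e-6):
    """Blend one feature vector per selected view by inverse distance."""
    w = 1.0 / (dists + eps)
    w /= w.sum()
    return (w[:, None] * view_features[idx]).sum(axis=0)

centers = np.array([[0., 0., 0.], [1., 0., 0.], [5., 0., 0.]])
features = np.eye(3)                           # toy per-view features
idx, d = nearest_views(np.array([0.4, 0., 0.]), centers, k=2)
print(aggregate(features, idx, d))             # [0.6, 0.4, 0.]
```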
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
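Turning pairwise optical flow into dense point trajectories can be sketched by chaining: seed a point per pixel of frame 0, then repeatedly advect each point by the flow sampled at its current position. Real pipelines use sub-pixel (bilinear) sampling and forward-backward consistency checks; the nearest-neighbor version below is a simplifying assumption.

```python
import numpy as np

def chain_flows(flows):
    """flows: list of (H, W, 2) forward flow fields (frame t -> t+1).
    Returns trajectories (N, T+1, 2) for every pixel of frame 0."""
    H, W, _ = flows[0].shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs, ys], -1).reshape(-1, 2).astype(float)  # (N, 2) as (x, y)
    traj = [pts]
    for flow in flows:
        xi = np.clip(np.round(pts[:, 0]).astype(int), 0, W - 1)
        yi = np.clip(np.round(pts[:, 1]).astype(int), 0, H - 1)
        pts = pts + flow[yi, xi]   # advect by the flow at the current position
        traj.append(pts)
    return np.stack(traj, axis=1)

# Toy check: constant flow of (1, 0) moves every point one pixel right per frame.
flows = [np.full((4, 4, 2), [1.0, 0.0])] * 3
print(chain_flows(flows)[0])  # (0,0) -> (1,0) -> (2,0) -> (3,0)
```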
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
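For contrast, the classic background-subtraction baseline the summary alludes to fits in a few lines. It presumes a fixed camera, which is exactly the assumption egocentric video breaks: under ego-motion every pixel deviates from the temporal median, so the mask degenerates.

```python
import numpy as np

def median_background_subtraction(video, thresh=25.0):
    """video: (T, H, W) grayscale frames from a *static* camera.
    Background = per-pixel temporal median; foreground = large deviation."""
    background = np.median(video, axis=0)
    return np.abs(video - background) > thresh

# Toy check: a brief 2x2 blob in frame 2 is flagged as foreground.
video = np.zeros((5, 8, 8)); video[2, 3:5, 3:5] = 255.0
print(median_background_subtraction(video)[2].sum())  # 4
```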
arXiv Detail & Related papers (2021-10-19T12:51:35Z)