ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
- URL: http://arxiv.org/abs/2512.03621v1
- Date: Wed, 03 Dec 2025 09:55:25 GMT
- Title: ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
- Authors: Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang Tan,
- Abstract summary: ReCamDriving is a vision-based, camera-controlled novel-trajectory video generation framework. We present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns.
- Score: 38.23100905961028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based approaches rely on sparse and incomplete cues, ReCamDriving leverages dense and scene-complete 3DGS renderings for explicit geometric guidance, achieving precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioned on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct the ParaDrive dataset, containing over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.
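The abstract does not say how the parallel trajectories behind ParaDrive are parameterized. The sketch below is one plausible reading, not the paper's method. It assumes that "parallel" means shifting every camera center by a constant lateral distance along the camera's own right axis while keeping its orientation, and that poses are camera-to-world matrices in an OpenCV-style frame (x right, y down, z forward). Rendering the shifted poses from a 3DGS scene fitted to the monocular video would then yield the novel-trajectory view paired with each original frame. Both function names are hypothetical.

```python
from typing import List

Matrix4 = List[List[float]]  # row-major 4x4 camera-to-world pose


def shift_pose_laterally(c2w: Matrix4, offset_m: float) -> Matrix4:
    """Hypothetical helper: translate a pose along its own right (x) axis.

    Assumes an OpenCV-style camera frame, so column 0 of the rotation block is
    the camera's right direction expressed in world coordinates. The rotation
    block is copied unchanged, so the shifted camera looks the same way.
    """
    shifted = [row[:] for row in c2w]  # copy so the input pose is not mutated
    for i in range(3):
        right_i = c2w[i][0]  # i-th world component of the camera's right axis
        shifted[i][3] = c2w[i][3] + offset_m * right_i
    return shifted


def make_parallel_trajectory(poses: List[Matrix4], offset_m: float) -> List[Matrix4]:
    """Hypothetical helper: apply the same lateral offset to every frame."""
    return [shift_pose_laterally(p, offset_m) for p in poses]
```

A constant per-frame offset keeps the two trajectories parallel and lets the offset be varied (for example, one lane to the left or right) to produce several supervision pairs from one recorded drive; whether ReCamDriving samples offsets this way is an assumption.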
Related papers
- FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning [45.013802909442184]
We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
(2026-03-05T18:59:58Z)
- WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories [36.79437857022868]
WorldStereo is a novel framework that bridges camera-guided video generation and 3D reconstruction. We show that WorldStereo acts as a powerful world model, tackling diverse scene generation tasks with high-fidelity 3D results.
(2026-03-02T16:36:56Z)
- Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
(2026-01-15T09:26:45Z)
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
(2025-12-18T20:03:05Z)
- PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations [102.0476991174456]
COLMAP-free 3DGS has attracted increasing attention due to its remarkable performance in reconstructing high-quality 3D scenes from unposed images or videos. We propose PCR-GS, an innovative COLMAP-free 3DGS technique that achieves superior 3D scene modeling and camera pose estimation via camera pose co-regularization.
(2025-07-18T13:09:33Z)
- SpatialTrackerV2: 3D Point Tracking Made Easy [73.0350898700048]
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
(2025-07-16T17:59:03Z)
- Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [73.73984727616198]
We present Uni3C, a unified framework for precise control of both camera and human motion in video generation. First, we propose PCDController, a plug-and-play control module trained with a frozen video generative backbone. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters.
(2025-04-21T07:10:41Z)
- CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving [25.156989992025625]
We introduce a novel spatial adaptive generation framework, CoGen, to achieve controllable multi-view videos with high 3D consistency. By replacing coarse 2D conditions with fine-grained 3D representations, our approach significantly enhances the spatial consistency of the generated videos. Results demonstrate that this method excels in preserving geometric fidelity and visual realism, offering a reliable video generation solution for autonomous driving.
(2025-03-28T08:27:05Z)
- LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors [107.83398512719981]
Single-image 3D reconstruction remains a fundamental challenge in computer vision. Recent advances in Latent Video Diffusion Models (LVDMs) offer promising 3D priors learned from large-scale video data. We propose LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency.
(2024-12-12T18:58:42Z)
- T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions. We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
(2024-11-29T07:45:24Z)
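The InfCam entry above mentions infinite homography warping. As background from standard multi-view geometry, not InfCam's implementation, a pure camera rotation R between two views with the same intrinsics K maps pixels through the infinite homography H_inf = K R K^-1, independently of scene depth. The sketch below assumes zero-skew pinhole intrinsics and that R rotates source-camera coordinates into target-camera coordinates; it works in pixel space, whereas how InfCam applies the warp inside a diffusion model's latent space is not described here. All function names are illustrative.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def intrinsics_inverse(fx: float, fy: float, cx: float, cy: float) -> Mat3:
    """Closed-form inverse of K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def yaw_rotation(theta: float) -> Mat3:
    """Rotation by theta radians about the camera's y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def infinite_homography(K: Mat3, K_inv: Mat3, R: Mat3) -> Mat3:
    """H_inf = K R K^-1: back-project a pixel to a ray, rotate it, re-project it."""
    return matmul3(matmul3(K, R), K_inv)


def warp_pixel(H: Mat3, u: float, v: float) -> Tuple[float, float]:
    """Apply a homography to pixel (u, v) and dehomogenize.

    Raises ZeroDivisionError if the warped ray is parallel to the image plane.
    """
    x = [H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3)]
    return x[0] / x[2], x[1] / x[2]
```

For example, with fx = 500 and principal point (320, 240), a yaw of atan(0.1) moves the principal point to approximately (370, 240), since the rotated optical ray reprojects fx * tan(theta) = 50 pixels to the right. Because the mapping ignores depth, it captures only the rotational part of a camera change; translation still requires depth or other 3D cues.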
This list is automatically generated from the titles and abstracts of the papers in this site.