Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
- URL: http://arxiv.org/abs/2510.14179v1
- Date: Thu, 16 Oct 2025 00:20:57 GMT
- Title: Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
- Authors: Yuancheng Xu, Wenqi Xian, Li Ma, Julien Philip, Ahmet Levent Taşel, Yiwei Zhao, Ryan Burgert, Mingming He, Oliver Hermann, Oliver Pilarski, Rahul Garg, Paul Debevec, Ning Yu
- Abstract summary: We introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models. We fine-tune state-of-the-art open-source video diffusion models on this data to provide strong multi-view identity preservation. Our framework also supports core capabilities for virtual production, including multi-subject generation.
- Score: 18.241178853941623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models through a novel customization data pipeline. We train the character consistency component on recorded volumetric capture performances re-rendered with diverse camera trajectories via 4D Gaussian Splatting (4DGS), with lighting variability obtained from a video relighting model. We fine-tune state-of-the-art open-source video diffusion models on this data to provide strong multi-view identity preservation, precise camera control, and lighting adaptability. Our framework also supports core capabilities for virtual production, including multi-subject generation via two approaches: joint training and noise blending, the latter enabling efficient composition of independently customized models at inference time (see the sketch below). It also achieves scene and real-life video customization, as well as control over motion and spatial layout during customization. Extensive experiments show improved video quality, higher personalization accuracy, and enhanced camera control and lighting adaptability, advancing the integration of video generation into virtual production. Our project page is available at: https://eyeline-labs.github.io/Virtually-Being.
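The abstract names noise blending as a way to compose independently customized models at inference but does not detail the procedure. A minimal, hypothetical sketch of the general idea: run each fine-tuned denoiser on the shared latent at every sampling step and blend their noise predictions, here with a per-subject spatial mask (the `mask_a` interface is an assumption, not the paper's API):

```python
import torch

@torch.no_grad()
def blended_denoise_step(latent, t, model_a, model_b, mask_a):
    """Compose two independently customized diffusion models by
    blending their noise predictions at one sampling step.

    latent: [B, C, F, H, W] video latent shared by both models
    mask_a: [1, 1, 1, H, W] soft mask for subject A's region
            (hypothetical; the paper's exact scheme may differ)
    """
    eps_a = model_a(latent, t)  # noise prediction from subject A's model
    eps_b = model_b(latent, t)  # noise prediction from subject B's model
    # Each model drives denoising inside its own subject's region.
    return mask_a * eps_a + (1.0 - mask_a) * eps_b
```

The blended prediction then feeds the usual sampler update (DDIM, Euler, etc.), so the two customized models never need joint retraining.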
Related papers
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components, including infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model (a generic sketch follows this entry).
arXiv Detail & Related papers (2025-12-18T20:03:05Z)
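The infinite homography is a standard construct from multi-view geometry: under a pure camera rotation R with intrinsics K, pixels map between views by H∞ = K R K⁻¹. A minimal sketch of applying such a warp to a latent feature grid (illustrative only; InfCam's actual formulation may differ):

```python
import torch
import torch.nn.functional as F

def infinite_homography_warp(latent, K, R):
    """Warp a latent grid with the infinite homography H = K @ R @ K^{-1}.

    latent: [B, C, H, W] feature map; K: [3, 3] intrinsics scaled to the
    latent resolution; R: [3, 3] relative camera rotation.
    Illustrative sketch, not InfCam's exact implementation.
    """
    B, C, H, W = latent.shape
    Hmat = K @ R @ torch.linalg.inv(K)  # 3x3 homography, source -> target
    # Target-view pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    # Inverse warp: back-map target pixels into the source view.
    src = pix @ torch.linalg.inv(Hmat).T
    src = src[:, :2] / src[:, 2:3]
    # Normalize to [-1, 1] sampling coordinates for grid_sample.
    gx = 2.0 * src[:, 0] / (W - 1) - 1.0
    gy = 2.0 * src[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(1, H, W, 2).expand(B, -1, -1, -1)
    return F.grid_sample(latent, grid, align_corners=True)
```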
- VDOT: Efficient Unified Video Creation via Optimal Transport Distillation [70.02065520468726]
We propose an efficient unified video creation model, named VDOT. We employ a novel computational optimal transport (OT) technique to optimize the discrepancy between the real and fake score distributions. To support the training of unified video creation models, we propose a fully automated pipeline for video data annotation and filtering.
arXiv Detail & Related papers (2025-12-07T11:31:00Z)
- BulletTime: Decoupled Control of Time and Camera Pose for Video Generation [48.835425748367875]
We introduce a 4D-controllable video diffusion framework that explicitly decouples scene dynamics from camera pose. We show that our model achieves robust real-world 4D control across diverse timing patterns and camera trajectories.
arXiv Detail & Related papers (2025-12-04T18:40:52Z)
- MultiCOIN: Multi-Modal COntrollable Video INbetweening [46.37499813275259]
We introduce MultiCOIN, a video inbetweening framework that allows multi-modal controls. To ensure compatibility between DiT and our multi-modal controls, we map all motion controls into a common sparse representation. We also propose a stage-wise training strategy to ensure that our model learns the multi-modal controls smoothly.
arXiv Detail & Related papers (2025-10-09T17:59:27Z)
- EchoShot: Multi-Shot Portrait Video Generation [37.77879735014084]
EchoShot is a native multi-shot framework for portrait customization built upon a foundation video diffusion model. To facilitate model training in the multi-shot scenario, we construct PortraitGala, a large-scale and high-fidelity human-centric video dataset. To further enhance applicability, we extend EchoShot to perform reference image-based personalized multi-shot generation and long video synthesis with infinite shot counts.
arXiv Detail & Related papers (2025-06-16T11:00:16Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models [33.219657261649324]
TrajectoryCrafter is a novel approach to redirect camera trajectories for monocular videos. By disentangling deterministic view transformations from content generation, our method achieves precise control over user-specified camera trajectories.
arXiv Detail & Related papers (2025-03-07T17:57:53Z)
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation. We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints (a generic sketch follows this entry). Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
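The summary does not specify the synchronization module's architecture; a common way to realize cross-view consistency is attention over the view axis. A generic, assumed sketch (SynCamMaster's actual module may differ):

```python
import torch
import torch.nn as nn

class CrossViewSync(nn.Module):
    """Generic multi-view synchronization block: self-attention across
    the view axis at each spatial token, letting features from all
    viewpoints exchange appearance and geometry information.
    Illustrative only; not SynCamMaster's exact module."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: [B, V, N, C] -- batch, views, tokens per view, channels
        B, V, N, C = x.shape
        h = x.permute(0, 2, 1, 3).reshape(B * N, V, C)  # attend over views
        h = self.norm(h)
        out, _ = self.attn(h, h, h)
        out = out.reshape(B, N, V, C).permute(0, 2, 1, 3)
        return x + out  # residual connection
```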
- Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training [51.851390459940646]
We introduce Latent-Reframe, which enables camera control in a pre-trained video diffusion model without fine-tuning. Latent-Reframe operates during the sampling stage, maintaining efficiency while preserving the original model distribution. Our approach reframes the latent code of video frames to align with the input camera trajectory through time-aware point clouds.
arXiv Detail & Related papers (2024-12-08T18:59:54Z)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind to allow the user to specify distinct camera motions while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- Diversity-Driven View Subset Selection for Indoor Novel View Synthesis [54.468355408388675]
We propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We show that our framework consistently outperforms baseline strategies while using only 5-20% of the data.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control over camera pose, which serves as a cinematic language to express deeper narrative nuances. We introduce CameraCtrl, enabling accurate camera pose control for video diffusion models (a sketch of a common camera-pose conditioning follows this list).
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
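The summary above does not say how camera pose is fed to the model; the CameraCtrl paper conditions on per-pixel Plücker embeddings of camera rays, a representation now common in camera-controlled diffusion. A minimal sketch of computing such an embedding (parameter conventions are assumptions):

```python
import torch

def plucker_embedding(K_inv, R, t, H, W):
    """Per-pixel Plücker ray embedding (o x d, d), a camera-pose
    conditioning of the kind used by CameraCtrl.

    K_inv: [3,3] inverse intrinsics; R: [3,3] world-from-camera rotation;
    t: [3] camera center in world coordinates. Returns [6, H, W].
    Illustrative sketch; conventions may differ from the paper's code.
    """
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32) + 0.5,
        torch.arange(W, dtype=torch.float32) + 0.5, indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    dirs = R @ (K_inv @ pix)                    # ray directions in world
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    o = t.reshape(3, 1).expand_as(dirs)         # ray origins (camera center)
    moment = torch.cross(o, dirs, dim=0)        # Plücker moment o x d
    return torch.cat([moment, dirs], dim=0).reshape(6, H, W)
```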
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.