FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning
- URL: http://arxiv.org/abs/2603.05506v1
- Date: Thu, 05 Mar 2026 18:59:58 GMT
- Title: FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning
- Authors: Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
- Abstract summary: We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
- Score: 45.013802909442184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Recent camera control approaches based on large video-generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on portrait videos due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored scale-aware representation for camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, to exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
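The abstract does not spell out FaceCam's face-tailored representation, but the usual starting point for camera-trajectory conditioning is a sequence of per-frame extrinsics expressed relative to the first frame; the scale ambiguity the authors mention arises because, for monocular input, the translation component is only defined up to an unknown metric scale. A minimal generic sketch (all names are illustrative, not from the paper):

```python
import numpy as np

def extrinsic(R, t):
    """Build a 4x4 world-to-camera transform from rotation R (3x3) and translation t (3,)."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

def relative_poses(extrinsics):
    """Express each frame's pose relative to the first frame:
    T_rel[i] = E_i @ inv(E_0), so T_rel[0] is the identity."""
    E0_inv = np.linalg.inv(extrinsics[0])
    return [E @ E0_inv for E in extrinsics]

# Toy dolly trajectory: the camera translates 0.1 units along z per frame.
traj = [extrinsic(np.eye(3), np.array([0.0, 0.0, 0.1 * i])) for i in range(4)]
rels = relative_poses(traj)
```

Conditioning on relative rather than absolute poses makes the first frame a canonical reference, which is convenient when the input video's world frame is arbitrary.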
Related papers
- Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
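DepthDirector's exact pipeline is not given in this summary, but depth-guided camera control generally rests on depth-based reprojection: lift each pixel to 3D using its depth, apply the relative camera transform, and project back into the novel view. A generic sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth to a 3D point in camera space."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def reproject(X, K):
    """Project a camera-space 3D point back to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

def warp_pixel(u, v, depth, K, R, t):
    """Warp a pixel into a novel camera given the relative pose (R, t)."""
    X = backproject(u, v, depth, K)
    return reproject(R @ X + t, K)

# Example intrinsics: focal length 500, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
```

Note that this warp requires metric depth per pixel; errors in the estimated depth translate directly into geometric distortions, which is exactly the failure mode the FaceCam abstract attributes to 3D-reconstruction-based conditioning.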
arXiv Detail & Related papers (2026-01-15T09:26:45Z)
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
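The infinite homography mentioned here is a standard projective-geometry construction: for a pure camera rotation R with intrinsics K, pixels map as H = K R K^(-1), with no dependence on scene depth. How InfCam applies it in latent space is not described in this summary; the following is only a sketch of the underlying formula:

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the camera's vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def infinite_homography(K, R):
    """H = K @ R @ inv(K): maps pixels between two views related by a pure rotation."""
    return K @ R @ np.linalg.inv(K)

def warp_point(H, uv):
    """Apply a homography to a pixel via homogeneous coordinates."""
    x = H @ np.array([uv[0], uv[1], 1.0])
    return x[:2] / x[2]

# Example: 5-degree pan with focal length 500 and principal point (320, 240).
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
H = infinite_homography(K, rotation_y(np.deg2rad(5)))
```

Because the warp is depth-independent, it gives a deterministic conditioning signal for rotations, sidestepping the reconstruction errors that plague depth-based warping; translations, by contrast, cannot be expressed this way.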
arXiv Detail & Related papers (2025-12-18T20:03:05Z)
- VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos [58.09854638265381]
VividCam is a training paradigm that enables diffusion models to learn complex camera motions from synthetic videos. We demonstrate that our design synthesizes a wide range of precisely controlled and complex camera motions using surprisingly simple synthetic data.
arXiv Detail & Related papers (2025-10-28T19:12:22Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first-principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We show how to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
- Training-free Camera Control for Video Generation [15.79168688275606]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation. It can be plug-and-play with most pretrained video diffusion models and generate camera-controllable videos with a single image or text prompt as input.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
- CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers [18.67069364925506]
We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement.
Results demonstrate that we (1) successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) validate the accuracy of the generated 3D camera paths using traditional computer vision methods.
arXiv Detail & Related papers (2024-05-21T20:54:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.