Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
- URL: http://arxiv.org/abs/2504.14899v2
- Date: Sat, 20 Sep 2025 05:38:05 GMT
- Title: Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
- Authors: Chenjie Cao, Jingkai Zhou, Shikai Li, Jingyun Liang, Chaohui Yu, Fan Wang, Xiangyang Xue, Yanwei Fu
- Abstract summary: We present Uni3C, a unified framework for precise control of both camera and human motion in video generation. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters.
- Score: 73.73984727616198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation. Uni3C includes two key contributions. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController, which utilizes unprojected point clouds from monocular depth to achieve accurate camera control. By leveraging the strong 3D priors of point clouds and the powerful capacities of video foundational models, PCDController shows impressive generalization, performing well regardless of whether the inference backbone is frozen or fine-tuned. This flexibility enables different modules of Uni3C to be trained in specific domains, i.e., either camera control or human motion control, reducing the dependency on jointly annotated data. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters to unify the control signals for camera and human motion, respectively. Extensive experiments confirm that PCDController enjoys strong robustness in driving camera motion for fine-tuned backbones of video generation. Uni3C substantially outperforms competitors in both camera controllability and human motion quality. Additionally, we collect tailored validation sets featuring challenging camera movements and human actions to validate the effectiveness of our method.
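The key 3D prior behind PCDController is standard pinhole back-projection: a monocular depth map is lifted into a camera-frame point cloud, which can then be reprojected under a novel camera pose to serve as a control signal. The sketch below (NumPy; the intrinsics `K` and constant-depth toy input are illustrative assumptions, not the paper's actual code) shows this unproject-then-reproject cycle:

```python
import numpy as np

def unproject_depth(depth, K):
    """Lift a depth map (H, W) into a camera-frame point cloud
    (H*W, 3) using pinhole intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))          # pixel grid
    pixels = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3)
    rays = pixels @ np.linalg.inv(K).T                      # back-project rays
    return rays * depth.reshape(-1, 1)                      # scale by depth

def reproject(points_cam, w2c, K):
    """Project camera-frame points into a target view given its
    world-to-camera extrinsics (4x4); returns (N, 2) pixel coords."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    cam = (homo @ w2c.T)[:, :3]                             # rigid transform
    proj = cam @ K.T
    return proj[:, :2] / proj[:, 2:3]                       # perspective divide

# Toy example: a 4x4 depth map at constant depth 2.0.
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
pts = unproject_depth(np.full((4, 4), 2.0), K)
uv = reproject(pts, np.eye(4), K)   # identity pose recovers the pixel grid
```

With an identity pose the reprojected coordinates reproduce the original pixel grid exactly; substituting a novel `w2c` extrinsic yields the warped geometry that a controller of this kind can condition on.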
Related papers
- 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation [29.389246008057473]
2D poses rigidly bind motion to the driving viewpoint, precluding novel-view synthesis. 3DiMo trains a motion encoder with a pretrained video generator to distill driving frames into compact, view-agnostic motion tokens. Experiments confirm that 3DiMo faithfully reproduces driving motions with flexible, text-driven camera control.
arXiv Detail & Related papers (2026-02-03T17:59:09Z) - Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from an explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
arXiv Detail & Related papers (2026-01-15T09:26:45Z) - Unified Camera Positional Encoding for Controlled Video Generation [48.5789182990001]
Transformers have emerged as a universal backbone across 3D perception, video generation, and world models for autonomous driving and embodied AI. We introduce Relative Ray, a geometry-consistent representation that unifies complete camera information, including 6-DoF poses, intrinsics, and lens distortions. To facilitate systematic training and evaluation, we construct a large video dataset covering a wide range of camera motions and lens types.
arXiv Detail & Related papers (2025-12-08T07:34:01Z) - EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance [69.40274699401473]
We introduce EPiC, an efficient and precise camera control learning framework. It constructs high-quality anchor videos without expensive camera trajectory annotations. EPiC achieves SOTA performance on RealEstate10K and MiraData for the I2V camera control task.
arXiv Detail & Related papers (2025-05-28T01:45:26Z) - Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation [78.65431951506152]
We introduce a Synthetic dataset for Free-Form Motion Control (SynFMC). The proposed SynFMC dataset includes diverse object and environment categories. It covers various motion patterns according to specific rules, simulating common and complex real-world scenarios. The complete 6D pose information facilitates models learning to disentangle the motion effects of objects and the camera in a video.
arXiv Detail & Related papers (2025-01-02T18:59:45Z) - 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation [83.98251722144195]
Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions.
We introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space.
We show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions.
arXiv Detail & Related papers (2024-12-10T18:55:13Z) - AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first-principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z) - I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength [11.778832811404259]
I2VControl-Camera is a novel camera control method that significantly enhances controllability while providing adjustable control over the strength of subject motion.
To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion.
arXiv Detail & Related papers (2024-11-10T16:59:39Z) - VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We show how to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z) - Training-free Camera Control for Video Generation [15.79168688275606]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.
Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
It can be plug-and-play with most pretrained video diffusion models and generate camera-controllable videos with a single image or text prompt as input.
arXiv Detail & Related papers (2024-06-14T15:33:00Z) - CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z) - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
arXiv Detail & Related papers (2023-12-06T17:49:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.