Learning Camera Movement Control from Real-World Drone Videos
- URL: http://arxiv.org/abs/2412.09620v1
- Date: Thu, 12 Dec 2024 18:59:54 GMT
- Title: Learning Camera Movement Control from Real-World Drone Videos
- Authors: Yunzhong Hou, Liang Zheng, Philip Torr
- Abstract summary: Existing AI videography methods struggle with limited appearance diversity in simulation training.
We propose a scalable method that involves collecting real-world training data.
We show that our system effectively learns to perform challenging camera movements.
- Score: 25.10006841389459
- Abstract: This study seeks to automate camera movement control for filming existing subjects into attractive videos, contrasting with the creation of non-existent content by directly generating the pixels. We select drone videos as our test case due to their rich and challenging motion patterns, distinctive viewing angles, and precise controls. Existing AI videography methods struggle with limited appearance diversity in simulation training, high costs of recording expert operations, and difficulties in designing heuristic-based goals to cover all scenarios. To avoid these issues, we propose a scalable method that involves collecting real-world training data to improve diversity, extracting camera trajectories automatically to minimize annotation costs, and training an effective architecture that does not rely on heuristics. Specifically, we collect 99k high-quality trajectories by running 3D reconstruction on online videos, connecting camera poses from consecutive frames to formulate 3D camera paths, and using a Kalman filter to identify and remove low-quality data. Moreover, we introduce DVGFormer, an auto-regressive transformer that leverages the camera path and images from all past frames to predict camera movement in the next frame. We evaluate our system across 38 synthetic natural scenes and 7 real city 3D scans. We show that our system effectively learns to perform challenging camera movements such as navigating through obstacles, maintaining low altitude to increase perceived speed, and orbiting towers and buildings, which are very useful for recording high-quality videos. Data and code are available at dvgformer.github.io.
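The data-cleaning step in the abstract (a Kalman filter flagging low-quality reconstructed trajectories) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes a constant-velocity motion model over 3D camera positions, and the function name, noise variances, and the idea of scoring trajectories by their mean innovation residual are all hypothetical choices for this sketch.

```python
import numpy as np

def kalman_innovation_score(positions, dt=1.0, process_var=1e-2, meas_var=1e-3):
    """Run a constant-velocity Kalman filter over a sequence of 3D camera
    positions and return the mean innovation (prediction residual) norm.
    Jittery or broken reconstructions yield larger scores than smooth paths."""
    # State: [x, y, z, vx, vy, vz]; measurement: [x, y, z].
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                     # constant-velocity transition
    H = np.hstack([np.eye(3), np.zeros((3, 3))])   # observe position only
    Q = process_var * np.eye(6)                    # process noise covariance
    R = meas_var * np.eye(3)                       # measurement noise covariance

    x = np.zeros(6)
    x[:3] = positions[0]                           # initialize at first pose
    P = np.eye(6)
    residuals = []
    for z in positions[1:]:
        # Predict the next state under the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation: how far the measured pose is from the prediction.
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        # Standard Kalman update.
        x = x + K @ y
        P = (np.eye(6) - K @ H) @ P
        residuals.append(np.linalg.norm(y))
    return float(np.mean(residuals))

# A smooth path scores low; the same path with a sudden jump scores high,
# so thresholding this score could filter out low-quality trajectories.
smooth = np.stack([np.array([0.1 * t, 0.0, 0.0]) for t in range(50)])
jumpy = smooth.copy()
jumpy[25] += np.array([5.0, 5.0, 5.0])
print(kalman_innovation_score(smooth), kalman_innovation_score(jumpy))
```

A real pipeline would operate on full 6-DoF poses and pick the rejection threshold empirically; the constant-velocity model here is only the simplest plausible choice.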
Related papers
- CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation [76.72787726497343]
We present CineMaster, a framework for 3D-aware and controllable text-to-video generation.
Our goal is to empower users with comparable controllability as professional film directors.
arXiv Detail & Related papers (2025-02-12T18:55:36Z)
- Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos [76.07894127235058]
We present a system for mining high-quality 4D reconstructions from internet stereoscopic, wide-angle videos.
We use this method to generate large-scale data in the form of world-consistent, pseudo-metric 3D point clouds.
We demonstrate the utility of this data by training a variant of DUSt3R to predict structure and 3D motion from real-world image pairs.
arXiv Detail & Related papers (2024-12-12T18:59:54Z)
- MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [104.1338295060383]
We present a system that allows for accurate, fast, and robust estimation of camera parameters and depth maps from casual monocular videos of dynamic scenes.
Our system is significantly more accurate and robust at camera pose and depth estimation when compared with prior and concurrent work.
arXiv Detail & Related papers (2024-12-05T18:59:42Z)
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation.
We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
- ChatCam: Empowering Camera Control through Conversational AI [67.31920821192323]
ChatCam is a system that navigates camera movements through conversations with users.
To achieve this, we propose CineGPT, a GPT-based autoregressive model for text-conditioned camera trajectory generation.
We also develop an Anchor Determinator to ensure precise camera trajectory placement.
arXiv Detail & Related papers (2024-09-25T20:13:41Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- 3D Human Reconstruction in the Wild with Collaborative Aerial Cameras [3.3674370488883434]
We present a real-time aerial system for multi-camera control that can reconstruct human motions in natural environments without the use of special-purpose markers.
We develop a multi-robot coordination scheme that maintains the optimal flight formation for target reconstruction quality amongst obstacles.
arXiv Detail & Related papers (2021-08-09T11:03:38Z)
- Reconstruction of 3D flight trajectories from ad-hoc camera networks [19.96488566402593]
We present a method to reconstruct the 3D trajectory of an airborne robotic system only from videos recorded with cameras that are unsynchronized.
Our approach enables robust and accurate outside-in tracking of dynamically flying targets, with cheap and easy-to-deploy equipment.
arXiv Detail & Related papers (2020-03-10T14:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.