Dynamic Camera Poses and Where to Find Them
- URL: http://arxiv.org/abs/2504.17788v1
- Date: Thu, 24 Apr 2025 17:59:56 GMT
- Title: Dynamic Camera Poses and Where to Find Them
- Authors: Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin
- Abstract summary: We introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes.
- Score: 36.249380390918816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating camera poses on dynamic Internet videos at scale is critical for advancing fields like realistic video generation and simulation. However, collecting such a dataset is difficult, as most Internet videos are unsuitable for pose estimation. Furthermore, annotating dynamic Internet videos presents significant challenges even for state-of-the-art methods. In this paper, we introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses. Our collection pipeline addresses filtering using a carefully combined set of task-specific and generalist models. For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion to achieve improvements over state-of-the-art approaches. Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes, opening up avenues for advancements in various downstream applications.
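The abstract describes combining point tracking and dynamic masking before structure-from-motion. A minimal sketch of that filtering idea, assuming per-track dynamic probabilities from a motion-segmentation mask (function names and the threshold are illustrative, not the paper's actual implementation):

```python
import numpy as np

def filter_static_tracks(tracks, dynamic_prob, threshold=0.5):
    """Keep point tracks judged static before handing them to SfM.

    tracks:       (N, T, 2) array of N point tracks over T frames.
    dynamic_prob: (N, T) per-point, per-frame probability of being dynamic
                  (e.g. sampled from a dynamic-object mask).
    Returns the subset of tracks considered static.
    """
    mean_prob = dynamic_prob.mean(axis=1)  # average dynamic score per track
    static = mean_prob < threshold         # boolean mask over tracks
    return tracks[static]

# Toy example: 3 tracks over 4 frames; the second lies on a moving object.
rng = np.random.default_rng(0)
tracks = rng.standard_normal((3, 4, 2))
dynamic_prob = np.array([
    [0.1, 0.2, 0.1, 0.0],    # mostly static -> kept
    [0.9, 0.8, 0.95, 0.9],   # dynamic -> discarded
    [0.3, 0.2, 0.4, 0.1],    # static enough -> kept
])
static_tracks = filter_static_tracks(tracks, dynamic_prob)
print(static_tracks.shape)  # (2, 4, 2)
```

Discarding dynamic tracks this way keeps moving objects from violating the static-scene assumption of standard structure-from-motion.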
Related papers
- RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements [9.714839452308581]
RealCam-Vid is the first fully open-source, high-resolution dynamic-scene dataset with metric-scale camera annotations.
arXiv Detail & Related papers (2025-04-11T02:35:19Z) - AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [52.726585508669686]
We propose AnyCam, a fast transformer model that directly estimates camera poses and intrinsics from a dynamic video sequence. We test AnyCam on established datasets, where it delivers accurate camera poses and intrinsics both qualitatively and quantitatively. By combining camera information, uncertainty, and depth, our model can produce high-quality 4D pointclouds.
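The pointcloud step AnyCam's summary mentions rests on a standard operation: unprojecting a depth map through the camera intrinsics. A hedged sketch of that operation (the intrinsics and depth values below are synthetic, not from the paper):

```python
import numpy as np

def unproject_depth(depth, K):
    """Lift a depth map into a camera-frame point cloud.

    depth: (H, W) metric depth map.
    K:     (3, 3) camera intrinsics matrix.
    Returns an (H*W, 3) array of 3D points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                       # back-projected rays
    points = rays * depth.reshape(-1, 1)                  # scale rays by depth
    return points

# Synthetic example: a 48x64 depth map at a constant 2 m.
K = np.array([[100., 0., 32.], [0., 100., 24.], [0., 0., 1.]])
depth = np.full((48, 64), 2.0)
cloud = unproject_depth(depth, K)
print(cloud.shape)  # (3072, 3)
```

Per-frame clouds like this, placed by the estimated camera poses and weighted by uncertainty, give the kind of 4D pointcloud the summary describes.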
arXiv Detail & Related papers (2025-03-30T02:22:11Z) - FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications. Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings. We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z) - DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios [20.835782699441797]
This paper presents DynOPETs, a novel dataset for object pose estimation and tracking in unconstrained environments. Our efficient annotation method innovatively integrates pose estimation and pose tracking techniques to generate pseudo-labels. The resulting dataset offers accurate pose annotations for dynamic objects observed from moving cameras.
arXiv Detail & Related papers (2025-03-25T13:13:44Z) - CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z) - PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z) - ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
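ParticleSfM's summary mentions a network for irregular point trajectory data. A common way to batch such variable-length tracks is padding plus a validity mask; a minimal sketch under that assumption (the function and shapes are illustrative, not ParticleSfM's actual architecture):

```python
import numpy as np

def pad_trajectories(trajs, max_len=None):
    """Pack variable-length 2D point trajectories into a padded tensor
    plus a boolean validity mask, so irregular tracks can be batched.

    trajs: list of (L_i, 2) arrays of 2D point positions.
    Returns padded (N, max_len, 2) positions and (N, max_len) mask.
    """
    if max_len is None:
        max_len = max(len(t) for t in trajs)
    N = len(trajs)
    padded = np.zeros((N, max_len, 2), dtype=np.float32)
    mask = np.zeros((N, max_len), dtype=bool)
    for i, t in enumerate(trajs):
        L = len(t)
        padded[i, :L] = t      # copy the real observations
        mask[i, :L] = True     # mark which entries are valid
    return padded, mask

# Two trajectories of different lengths (5 and 3 frames).
trajs = [np.random.rand(5, 2), np.random.rand(3, 2)]
padded, mask = pad_trajectories(trajs)
print(padded.shape, int(mask.sum()))  # (2, 5, 2) 8
```

Downstream layers can then ignore padded entries by applying the mask, which is the usual trick for feeding irregular trajectories to a fixed-shape network.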
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.