DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild
- URL: http://arxiv.org/abs/2411.13291v1
- Date: Wed, 20 Nov 2024 13:01:16 GMT
- Title: DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild
- Authors: Weicai Ye, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He, Guofeng Zhang,
- Abstract summary: This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild.
We show that the proposed method achieves state-of-the-art performance in terms of camera pose estimation even in complex dynamic challenge scenes.
- Score: 85.03973683867797
- License:
- Abstract: This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild. Traditional frameworks, such as ParticleSfM~\cite{zhao2022particlesfm}, address this problem by sequentially computing the optical flow between adjacent frames to obtain point trajectories. They then remove dynamic trajectories through motion segmentation and perform global bundle adjustment. However, the process of estimating optical flow between two adjacent frames and chaining the matches can introduce cumulative errors. Additionally, motion segmentation combined with single-view depth estimation often faces challenges related to scale ambiguity. To tackle these challenges, we propose a dynamic-aware tracking any point (DATAP) method that leverages consistent video depth and point tracking. Specifically, our DATAP addresses these issues by estimating dense point tracking across the video sequence and predicting the visibility and dynamics of each point. By incorporating the consistent video depth prior, the performance of motion segmentation is enhanced. With the integration of DATAP, it becomes possible to estimate and optimize all camera poses simultaneously by performing global bundle adjustments for point tracking classified as static and visible, rather than relying on incremental camera registration. Extensive experiments on dynamic sequences, e.g., Sintel and TUM RGBD dynamic sequences, and on the wild video, e.g., DAVIS, demonstrate that the proposed method achieves state-of-the-art performance in terms of camera pose estimation even in complex dynamic challenge scenes.
Related papers
- MONA: Moving Object Detection from Videos Shot by Dynamic Camera [20.190677328673836]
We introduce MONA, a framework for robust moving object detection and segmentation from videos shot by dynamic cameras.
MonA comprises two key modules: Dynamic Points Extraction, which leverages optical flow and tracking any point to identify dynamic points, and Moving Object, which employs adaptive bounding box filtering.
We validate MONA by integrating with the camera trajectory estimation method LEAP-VO, and it achieves state-of-the-art results on the MPI Sintel dataset.
arXiv Detail & Related papers (2025-01-22T19:30:28Z) - Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency [58.719310295870024]
This paper presents an event-based framework for tracking any point.
It tackles the challenges posed by spatial sparsity and motion sensitivity in events.
It achieves 150% faster processing with competitive model parameters.
arXiv Detail & Related papers (2024-12-02T09:13:29Z) - ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras [33.81592783496106]
Event-based visual odometry aims at solving tracking and mapping subproblems (typically in parallel)
We build an event-based stereo visual-inertial odometry system on top of a direct pipeline.
The resulting system scales well with modern high-resolution event cameras.
arXiv Detail & Related papers (2024-10-12T05:35:27Z) - Motion Segmentation for Neuromorphic Aerial Surveillance [42.04157319642197]
Event cameras offer superior temporal resolution, superior dynamic range, and minimal power requirements.
Unlike traditional frame-based sensors that capture redundant information at fixed intervals, event cameras asynchronously record pixel-level brightness changes.
We introduce a novel motion segmentation method that leverages self-supervised vision transformers on both event data and optical flow information.
arXiv Detail & Related papers (2024-05-24T04:36:13Z) - Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023 [50.910598799408326]
The Tracking Any Point (TAP) task tracks any physical surface through a video.
Several existing approaches have explored the TAP by considering the temporal relationships to obtain smooth point motion trajectories.
We propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera.
arXiv Detail & Related papers (2024-03-26T13:50:39Z) - DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN)
DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.
We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving
Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.