GIFT: Generated Indoor video frames for Texture-less point tracking
- URL: http://arxiv.org/abs/2503.12944v1
- Date: Mon, 17 Mar 2025 08:58:33 GMT
- Title: GIFT: Generated Indoor video frames for Texture-less point tracking
- Authors: Jianzheng Huang, Xianyu Mo, Ziling Liu, Jinyu Yang, Feng Zheng
- Abstract summary: We introduce metrics for evaluating the texture intensity of a 3D object. GIFT is a challenging synthetic benchmark comprising 1800 indoor video sequences with rich annotations. Unlike existing datasets that assign ground truth points arbitrarily, GIFT precisely anchors ground truth on classified target objects.
- Score: 37.78982740178578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point tracking is becoming a powerful solver for motion estimation and video editing. Compared to classical feature matching, point tracking methods have the key advantage of robustly tracking points under complex camera motion trajectories and over extended periods. However, despite methodological improvements, current point tracking methods still struggle to track arbitrary positions in video frames, especially in areas that are texture-less or weakly textured. In this work, we first introduce metrics for evaluating the texture intensity of a 3D object. Using these metrics, we classify the 3D models in ShapeNet into three levels of texture intensity and create GIFT, a challenging synthetic benchmark comprising 1800 indoor video sequences with rich annotations. Unlike existing datasets that assign ground truth points arbitrarily, GIFT precisely anchors ground truth on classified target objects, ensuring that each video corresponds to a specific texture intensity level. Furthermore, we comprehensively evaluate current methods on GIFT to assess their performance across different texture intensity levels and analyze the impact of texture on point tracking.
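The abstract does not spell out how texture intensity is measured, so the following is only an illustrative sketch under assumed choices: it scores a model's texture map by its mean local gradient magnitude and bins it into three levels. The function names `texture_intensity` and `classify_texture` and the thresholds are hypothetical, not GIFT's published metric.

```python
# Illustrative sketch only: GIFT's actual texture-intensity metric is not
# specified in the abstract. Here we use a common proxy (mean gradient
# magnitude of the object's texture map) and hypothetical thresholds to
# bin ShapeNet models into three texture-intensity levels.
import numpy as np
from PIL import Image

def texture_intensity(texture_path: str) -> float:
    """Proxy metric: average gradient magnitude of the texture image."""
    img = np.asarray(Image.open(texture_path).convert("L"), dtype=np.float32) / 255.0
    gy, gx = np.gradient(img)                 # finite-difference image gradients
    return float(np.mean(np.hypot(gx, gy)))   # higher value = richer texture

def classify_texture(score: float,
                     low_thr: float = 0.01,   # hypothetical threshold
                     high_thr: float = 0.05   # hypothetical threshold
                     ) -> str:
    """Bin a model into one of three texture-intensity levels."""
    if score < low_thr:
        return "texture-less"
    if score < high_thr:
        return "weakly textured"
    return "richly textured"

# Example usage (path is illustrative):
# level = classify_texture(texture_intensity("shapenet_model/texture.png"))
```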
Related papers
- St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World [106.91539872943864]
St4RTrack is a framework that simultaneously reconstructs and tracks dynamic video content in a world coordinate frame from RGB inputs.
We predict both pointmaps at the same moment, in the same world, capturing both static and dynamic scene geometry.
We establish a new extensive benchmark for world-frame reconstruction and tracking, demonstrating the effectiveness and efficiency of our unified, data-driven framework.
arXiv Detail & Related papers (2025-04-17T17:55:58Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that such purely 2D representations are limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- TAP-Vid: A Benchmark for Tracking Any Point in a Video [84.94877216665793]
We formalize the problem of tracking arbitrary physical points on surfaces over longer video clips, naming it tracking any point (TAP).
We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks.
We propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
arXiv Detail & Related papers (2022-11-07T17:57:02Z)
- Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects [25.448657318818764]
ICG is a novel probabilistic tracker that fuses region and depth information and only requires the object geometry.
Our method deploys correspondence lines and points to iteratively refine the pose.
Experiments on the YCB-Video, OPT, and Choi datasets demonstrate that, even for textured objects, our approach outperforms the current state of the art.
arXiv Detail & Related papers (2022-03-10T12:30:50Z)
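The GIFT abstract above evaluates current trackers across texture-intensity levels, and benchmarks such as TAP-Vid compare methods on point-track accuracy. A common way such benchmarks score a tracker is the fraction of ground-truth points whose predictions fall within a pixel threshold of the annotation, averaged over several thresholds. The sketch below illustrates that idea; array shapes, thresholds, and function names are illustrative assumptions, not GIFT's or TAP-Vid's exact protocol.

```python
# Minimal sketch of a threshold-based position-accuracy metric for point
# tracking. Shapes and thresholds are illustrative assumptions.
import numpy as np

def average_position_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """pred, gt: (num_points, num_frames, 2) pixel coordinates.
    visible: (num_points, num_frames) boolean ground-truth visibility."""
    dist = np.linalg.norm(pred - gt, axis=-1)      # per-point, per-frame error
    accs = []
    for thr in thresholds:
        correct = (dist < thr) & visible           # score only visible points
        accs.append(correct.sum() / max(visible.sum(), 1))
    return float(np.mean(accs))                    # average over all thresholds

# Example usage with random data:
# pred = np.random.rand(10, 24, 2) * 256
# gt = pred + np.random.randn(10, 24, 2)
# vis = np.ones((10, 24), dtype=bool)
# print(average_position_accuracy(pred, gt, vis))
```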