SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
- URL: http://arxiv.org/abs/2602.04441v1
- Date: Wed, 04 Feb 2026 11:14:21 GMT
- Title: SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
- Authors: Weiguang Zhao, Haoran Xu, Xingyu Miao, Qin Zhao, Rui Zhang, Kaizhu Huang, Ning Gao, Peizhou Cao, Mingze Sun, Mulin Yu, Tao Lu, Linning Xu, Junting Dong, Jiangmiao Pang,
- Abstract summary: We introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse substantially expands dataset diversity by covering a broader range of object categories. We establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods.
- Score: 61.01458607791313
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general point tracking remains constrained by limited high-quality data, as existing datasets often provide insufficient diversity and imperfect trajectory annotations. To this end, we introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse includes several new domains and object types missing from existing synthetic datasets, such as animated-film-style content, embodied manipulation, scene navigation, and articulated objects. SynthVerse substantially expands dataset diversity by covering a broader range of object categories and providing high-quality dynamic motions and interactions, enabling more robust training and evaluation for general point tracking. In addition, we establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods under broader domain shifts. Extensive experiments and analyses demonstrate that training with SynthVerse yields consistent improvements in generalization and reveal limitations of existing trackers under diverse settings.
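Point-tracking benchmarks of this kind are typically scored with threshold-based position accuracy, e.g. the δ_avg metric popularized by TAP-Vid: the fraction of visible points whose predicted position falls within a set of pixel thresholds, averaged over thresholds. A minimal sketch follows; the metric choice, thresholds, and array shapes are standard-practice assumptions, not details taken from this paper:

```python
import numpy as np

def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average position accuracy over pixel thresholds (TAP-Vid style).

    pred, gt: (T, N, 2) predicted / ground-truth point positions in pixels.
    visible:  (T, N) boolean mask; only visible points are scored.
    """
    # Per-point Euclidean error in pixels, shape (T, N).
    err = np.linalg.norm(pred - gt, axis=-1)
    accs = []
    for t in thresholds:
        # A point counts as correct if visible and within t pixels.
        correct = (err < t) & visible
        accs.append(correct.sum() / max(visible.sum(), 1))
    return float(np.mean(accs))

# Tiny example: two points over two frames, one prediction 3 px off.
gt = np.zeros((2, 2, 2))
pred = gt.copy()
pred[1, 1] = [3.0, 0.0]          # 3-pixel error on one visible point
vis = np.ones((2, 2), dtype=bool)
print(round(delta_avg(pred, gt, vis), 3))
```

Occlusion-aware variants additionally score visibility prediction (e.g. average Jaccard), but the positional term above is the core of most reported numbers.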
Related papers
- Is This Tracker On? A Benchmark Protocol for Dynamic Tracking [6.23176842962524]
ITTO is a new benchmark suite for evaluating and diagnosing the capabilities and limitations of point tracking methods. We conduct a rigorous analysis of state-of-the-art tracking methods on ITTO, breaking down performance along key axes of motion complexity.
arXiv Detail & Related papers (2025-10-22T17:53:56Z)
- A Deep Dive into Generic Object Tracking: A Survey [3.7305040207339286]
Object tracking remains an important yet challenging task in computer vision due to complex spatio-temporal dynamics. Siamese-based trackers, discriminative trackers, and transformer-based approaches have been introduced to address these challenges.
arXiv Detail & Related papers (2025-07-31T05:19:26Z)
- Unified People Tracking with Graph Neural Networks [39.22185669123208]
We present a unified, fully differentiable model for multi-people tracking that learns to associate detections into trajectories. The model builds a dynamic graph that aggregates spatial, contextual, and temporal information. We also introduce a new large-scale dataset with 25 partially overlapping views, detailed scene reconstructions, and extensive occlusions.
arXiv Detail & Related papers (2025-07-11T11:17:25Z)
- MTGS: Multi-Traversal Gaussian Splatting [51.22657444433942]
Multi-traversal data provides multiple viewpoints for scene reconstruction within a road block. We propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines.
arXiv Detail & Related papers (2025-03-16T15:46:12Z)
- OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB [40.62577054196799]
We introduce a large-scale synthetic dataset OmniPose6D, crafted to mirror the diversity of real-world conditions. We present a benchmarking framework for a comprehensive comparison of pose tracking algorithms.
arXiv Detail & Related papers (2024-10-09T09:01:40Z)
- SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection [20.985372561774415]
We present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale.
It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories.
We will release SyntheWorld to facilitate remote sensing image processing research.
arXiv Detail & Related papers (2023-09-05T02:42:41Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulty and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.