Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment
- URL: http://arxiv.org/abs/2506.14271v1
- Date: Tue, 17 Jun 2025 07:37:08 GMT
- Title: Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment
- Authors: Weiming Zhang, Dingwen Xiao, Aobotao Dai, Yexin Liu, Tianbo Pan, Shiqi Wen, Lei Chen, Lin Wang
- Abstract summary: Leader360V is the first large-scale, labeled real-world 360 video dataset for instance segmentation and tracking. The dataset enjoys high scene diversity, ranging from indoor and urban settings to natural and dynamic outdoor scenes. Experiments confirm that Leader360V significantly enhances model performance for 360 video segmentation and tracking.
- Score: 19.70383859926191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 360 video captures the complete surrounding scene with an ultra-large field of view of 360°×180°. This makes 360 scene understanding tasks, e.g., segmentation and tracking, crucial for applications such as autonomous driving and robotics. With the recent emergence of foundation models, the community is, however, impeded by the lack of large-scale, labeled real-world datasets. This is caused by the inherent spherical properties, e.g., severe distortion in polar regions and content discontinuities, which render the annotation costly yet complex. This paper introduces Leader360V, the first large-scale, labeled real-world 360 video dataset for instance segmentation and tracking. Our dataset enjoys high scene diversity, ranging from indoor and urban settings to natural and dynamic outdoor scenes. To automate annotation, we design an automatic labeling pipeline, which subtly coordinates pre-trained 2D segmentors and large language models to facilitate the labeling. The pipeline operates in three novel stages. Specifically, in the Initial Annotation Phase, we introduce a Semantic- and Distortion-aware Refinement (SDR) module, which combines object mask proposals from multiple 2D segmentors with LLM-verified semantic labels. These are then converted into mask prompts to guide SAM2 in generating distortion-aware masks for subsequent frames. In the Auto-Refine Annotation Phase, missing or incomplete regions are corrected either by applying the SDR module again or by resolving the discontinuities near the horizontal borders. The Manual Revision Phase finally incorporates LLMs and human annotators to further refine and validate the annotations. Extensive user studies and evaluations demonstrate the effectiveness of our labeling pipeline. Meanwhile, experiments confirm that Leader360V significantly enhances model performance for 360 video segmentation and tracking, paving the way for more scalable 360 scene understanding.
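To make the three-phase pipeline concrete, here is a minimal sketch of its control flow in Python. Every helper below (propose_masks, verify_labels, sam2_propagate, and so on) is a hypothetical stub standing in for the components the abstract names (multiple 2D segmentors, LLM verification, SAM2 propagation); it illustrates the structure, not the authors' implementation.

```python
# Minimal sketch of the three-phase Leader360V labeling pipeline described
# in the abstract. All helpers are hypothetical stand-ins, not the authors'
# code: a real pipeline would call actual 2D segmentors, an LLM, and SAM2.
from dataclasses import dataclass

@dataclass
class Mask:
    label: str    # LLM-verified semantic label
    pixels: set   # placeholder for a binary mask

def propose_masks(frame):
    """Stage 1a: merge object mask proposals from multiple 2D segmentors."""
    return [Mask(label="person", pixels={(10, 20)})]  # dummy proposal

def verify_labels(masks):
    """Stage 1b (SDR): keep only proposals whose labels an LLM confirms."""
    return [m for m in masks if m.label]  # stand-in for LLM verification

def sam2_propagate(prompts, frames):
    """Stage 1c: use the verified masks as prompts so a video segmentor
    (SAM2 in the paper) produces distortion-aware masks for later frames."""
    return {i: prompts for i, _ in enumerate(frames)}

def auto_refine(annotations):
    """Stage 2: re-run SDR on missing/incomplete regions and stitch masks
    cut by the equirectangular left/right border."""
    return annotations

def manual_revision(annotations):
    """Stage 3: LLM-assisted checks plus human annotators validate masks."""
    return annotations

def label_video(frames):
    prompts = verify_labels(propose_masks(frames[0]))  # Initial Annotation Phase
    anns = sam2_propagate(prompts, frames)             # propagate to all frames
    anns = auto_refine(anns)                           # Auto-Refine Annotation Phase
    return manual_revision(anns)                       # Manual Revision Phase

print(label_video(frames=["f0", "f1", "f2"]))
```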
Related papers
- SpatialTrackerV2: 3D Point Tracking Made Easy [73.0350898700048]
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
arXiv Detail & Related papers (2025-07-16T17:59:03Z)
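The three-way decomposition SpatialTrackerV2 describes can be illustrated in a few lines of numpy: back-project a pixel with its depth (scene geometry), transform it by the camera pose (ego-motion), and add a per-pixel residual (object motion). The intrinsics and poses below are assumed values for the sketch, not the paper's parameterization.

```python
import numpy as np

# Illustrative composition of the three motion components the summary names.
# The exact parameterization in SpatialTrackerV2 may differ.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics

def world_point(u, v, depth, R, t, obj_motion):
    """Back-project pixel (u, v) with depth, move it by camera pose (R, t),
    then add the pixel-wise object-motion residual."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # geometry: camera ray
    cam_xyz = depth * ray                           # 3D point in camera frame
    world_xyz = R @ cam_xyz + t                     # ego-motion: camera -> world
    return world_xyz + obj_motion                   # dynamic object residual

R = np.eye(3)                  # identity rotation for the example
t = np.array([0.1, 0.0, 0.0])  # camera translated 10 cm along x
print(world_point(320, 240, depth=2.0, R=R, t=t,
                  obj_motion=np.array([0.0, 0.0, -0.05])))
```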
- AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting [15.177483700681377]
Three-dimensional scene inpainting is crucial for applications from virtual reality to architectural visualization. We present AuraFusion360, a novel reference-based method that enables high-quality object removal and hole filling in 3D scenes represented by Gaussian Splatting. We also introduce 360-USID, the first comprehensive dataset for 360° scene inpainting with ground truth.
arXiv Detail & Related papers (2025-02-07T18:59:55Z)
- T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system. It streamlines video generation by using text prompts to create reference videos. These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z)
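The final step of T-SVG, rendering a point cloud from two viewpoints with subtle parallax, amounts to projecting the same points through two cameras offset by a small baseline. Below is a minimal sketch under assumed intrinsics and baseline; the system's actual renderer is not described in the summary.

```python
import numpy as np

# Illustrative stereo rendering from a point cloud: project the same 3D
# points through two pinhole cameras separated by a small horizontal
# baseline. Intrinsics and baseline are assumptions for the sketch.
K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
BASELINE = 0.06  # roughly human interpupillary distance, in meters

def project(points, eye_offset):
    """Shift the camera by eye_offset along x, then pinhole-project."""
    shifted = points - np.array([eye_offset, 0.0, 0.0])
    uvw = (K @ shifted.T).T
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide

cloud = np.array([[0.0, 0.0, 2.0], [0.5, 0.2, 4.0]])
left = project(cloud, -BASELINE / 2)
right = project(cloud, +BASELINE / 2)
print(left - right)  # horizontal disparity shrinks with depth
```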
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation [83.841877607646]
We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation. The dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images. We benchmark leading stereo depth estimation models for both standard and omnidirectional images.
arXiv Detail & Related papers (2024-11-27T13:34:41Z)
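The label-generation step Helvipad describes, projecting 3D point clouds onto equirectangular images, reduces to a spherical projection. The sketch below uses one common equirectangular convention; the dataset's exact axis and sign conventions are assumptions here.

```python
import numpy as np

def project_equirectangular(points, width, height):
    """Project Nx3 camera-frame points onto an equirectangular image.
    Returns integer pixel coordinates and per-point depth (range).
    Axis convention (x right, y down, z forward) is an assumption;
    Helvipad's actual calibration may differ."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)      # range used as depth label
    lon = np.arctan2(x, z)                      # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(y / depth, -1, 1))  # latitude in [-pi/2, pi/2]
    u = (((lon / (2 * np.pi)) + 0.5) * width).astype(int) % width
    v = np.minimum((((lat / np.pi) + 0.5) * height).astype(int), height - 1)
    return u, v, depth

pts = np.array([[1.0, 0.0, 1.0], [0.0, -0.5, 2.0]])
print(project_equirectangular(pts, width=1024, height=512))
```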
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z)
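The "ultrametric" in the title above is what makes a hierarchy possible: in an ultrametric, d(a, c) <= max(d(a, b), d(b, c)), so thresholding distances at any level yields a valid partition, and the partitions at increasing thresholds nest. The toy example below demonstrates this property on a made-up distance matrix; it is not the paper's feature field.

```python
import numpy as np

# Toy ultrametric over 4 scene points: cutting it at any threshold yields a
# valid partition, and coarser thresholds merge clusters without splitting
# them. The matrix is a made-up example, not data from the paper.
D = np.array([
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
], dtype=float)

def partition(D, tau):
    """Connected components of the graph {d <= tau}; with an ultrametric,
    'within tau of each other' is transitive, so components are clusters."""
    labels = list(range(len(D)))
    for i in range(len(D)):
        for j in range(len(D)):
            if D[i, j] <= tau:
                old_i, old_j = labels[i], labels[j]
                root = min(old_i, old_j)
                labels = [root if l in (old_i, old_j) else l for l in labels]
    return labels

for tau in (0.5, 1.5, 2.5, 3.5):  # coarser, nested partitions as tau grows
    print(tau, partition(D, tau))
```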
- Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z)
- 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos [16.372814014632944]
We propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories.
We benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset.
arXiv Detail & Related papers (2024-04-22T07:54:53Z)
- PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes [53.60876822010642]
We present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate high-quality panoptic labels. Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over label transfer methods on the KITTI-360 dataset.
arXiv Detail & Related papers (2023-09-19T17:54:22Z)
- Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
- A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications.
However, salient object detection (SOD) has been seldom explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.