Related papers: SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding

URL: http://arxiv.org/abs/2504.10106v1
Date: Mon, 14 Apr 2025 11:15:13 GMT
Title: SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding
Authors: Marc Gutiérrez-Pérez, Antonio Agudo
Abstract summary: We introduce SoccerNet-v3D and ISSIA-3D, two datasets designed for 3D scene understanding in soccer broadcast analysis. These datasets extend SoccerNet-v3 and ISSIA by incorporating field-line-based camera calibration and multi-view synchronization. We propose a monocular 3D ball localization task built upon the triangulation of ground-truth 2D ball annotations.
Score: 16.278222277579655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sports video analysis is a key domain in computer vision, enabling detailed spatial understanding through multi-view correspondences. In this work, we introduce SoccerNet-v3D and ISSIA-3D, two enhanced and scalable datasets designed for 3D scene understanding in soccer broadcast analysis. These datasets extend SoccerNet-v3 and ISSIA by incorporating field-line-based camera calibration and multi-view synchronization, enabling 3D object localization through triangulation. We propose a monocular 3D ball localization task built upon the triangulation of ground-truth 2D ball annotations, along with several calibration and reprojection metrics to assess annotation quality on demand. Additionally, we present a single-image 3D ball localization method as a baseline, leveraging camera calibration and ball size priors to estimate the ball's position from a monocular viewpoint. To further refine 2D annotations, we introduce a bounding box optimization technique that ensures alignment with the 3D scene representation. Our proposed datasets establish new benchmarks for 3D soccer scene understanding, enhancing both spatial and temporal analysis in sports analytics. Finally, we provide code to facilitate access to our annotations and the generation pipelines for the datasets.
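The abstract describes obtaining 3D ball positions by triangulating 2D annotations across synchronized, calibrated views. The sketch below illustrates one simple form of two-view triangulation, the midpoint method. It is not necessarily the procedure used by the authors. It assumes each 2D annotation has already been back-projected into a viewing ray (camera center plus direction) using the camera calibration. The function name `triangulate_midpoint` and the example coordinates are hypothetical.

```python
def _dot(u, v):
    """Dot product of two 3-vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two viewing rays (midpoint method).

    c1, c2: camera centers; d1, d2: ray directions through the 2D
    annotations (assumed to come from a prior calibration step).
    Returns the midpoint of the shortest segment joining the two rays.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero when the rays are parallel
    if denom <= 1e-12 * a * c:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    # Ray parameters minimising the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


if __name__ == "__main__":
    # Hypothetical setup: two cameras looking at a ball above the pitch.
    ball = [1.0, 2.0, 0.5]
    cam_a, cam_b = [0.0, -30.0, 10.0], [40.0, 0.0, 12.0]
    ray_a = [p - q for p, q in zip(ball, cam_a)]
    ray_b = [p - q for p, q in zip(ball, cam_b)]
    print(triangulate_midpoint(cam_a, ray_a, cam_b, ray_b))
```

With noisy annotations the two rays generally do not intersect, so the distance between the two closest points can serve as a rough consistency check on the annotations.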

Related papers

Where Is The Ball: 3D Ball Trajectory Estimation From 2D Monocular Tracking [10.237629959021875]
We present a method for 3D ball trajectory estimation from a 2D tracking sequence. Our method achieves state-of-the-art performance despite training solely on simulated data. Our method can generalize to real-world scenarios with multiple trajectories, opening up a range of applications in sport analysis and virtual replay.
arXiv Detail & Related papers (2025-06-06T05:42:05Z)
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI. We introduce Articulate3D, an expertly curated 3D dataset featuring high-quality manual annotations on 280 indoor scenes. We also present USDNet, a novel unified framework capable of simultaneously predicting part segmentation along with a full specification of motion attributes for articulated objects.
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction [2.0375637582248136]
Cross-View Center Point-Fusion is a state-of-the-art model to perform 3D object detection. Our architecture utilizes aspects from previously established algorithms, Cross-View Transformers and CenterPoint.
arXiv Detail & Related papers (2024-10-15T02:55:07Z)
TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D. This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements [0.8009842832476994]
We introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which, to the best of our knowledge, represents the most extensive sports image dataset with 2D pose annotations. We also present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, aimed at a non-linear approach for embedding pose sequences. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond annotated data.
arXiv Detail & Related papers (2024-05-20T14:40:26Z)
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images. We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image. We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
Context-Aware 3D Object Localization from Single Calibrated Images: A Study of Basketballs [1.809206198141384]
We present a novel method for 3D basketball localization from a single calibrated image. Our approach predicts the object's height in pixels in image space by estimating its projection onto the ground plane within the image. The 3D coordinates of the ball are then reconstructed by exploiting the known projection matrix.
arXiv Detail & Related papers (2023-09-07T11:14:02Z)
AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. With an external 3D scene extractor, we obtain the 3D objects and scene features for input images. We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework. Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene. In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
Ball 3D localization from a single calibrated image [1.2891210250935146]
We propose to address the task on a single image by estimating the ball diameter in pixels and using the knowledge of the real ball diameter in meters. This approach is suitable for any game situation where the ball is (even partly) visible. Validations on 3 basketball datasets reveal that our model gives remarkable predictions on ball 3D localization.
arXiv Detail & Related papers (2022-03-30T19:38:14Z)
Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
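Several entries above localize a ball in 3D from a single calibrated image using its apparent size. The sketch below shows the basic pinhole relation behind that idea: depth is roughly the focal length times the real diameter divided by the diameter in pixels. It is an illustration only, not the method of any listed paper. It assumes an undistorted pinhole camera with focal length in pixels, and it returns coordinates in the camera frame (mapping to pitch coordinates would additionally need the camera extrinsics). The function name and all numeric values are hypothetical.

```python
def locate_ball_from_diameter(u, v, diameter_px, focal_px, cx, cy,
                              ball_diameter_m):
    """Approximate the ball center in camera coordinates (meters).

    u, v: ball center in pixels; diameter_px: apparent diameter in pixels;
    focal_px: focal length in pixels; cx, cy: principal point in pixels;
    ball_diameter_m: real ball diameter in meters (a size prior).
    """
    if diameter_px <= 0:
        raise ValueError("diameter_px must be positive")
    # Similar triangles: diameter_px / focal_px = ball_diameter_m / depth.
    z = focal_px * ball_diameter_m / diameter_px
    # Back-project the pixel through the pinhole model at that depth.
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical values: 1920x1080 image, ball assumed ~0.22 m wide.
    print(locate_ball_from_diameter(1060, 440, 22, 1000, 960, 540, 0.22))
```

The depth estimate is sensitive to the measured pixel diameter: for a distant ball only a few pixels wide, an error of one pixel changes the depth substantially, which is one reason learned diameter estimators are used in practice.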

This list is automatically generated from the titles and abstracts of the papers on this site.