Context-Aware 3D Object Localization from Single Calibrated Images: A
Study of Basketballs
- URL: http://arxiv.org/abs/2309.03640v1
- Date: Thu, 7 Sep 2023 11:14:02 GMT
- Title: Context-Aware 3D Object Localization from Single Calibrated Images: A
Study of Basketballs
- Authors: Marcello Davide Caio (1), Gabriel Van Zandycke (1 and 2) and
Christophe De Vleeschouwer (2) ((1) Sportradar AG, (2) UCLouvain)
- Abstract summary: We present a novel method for 3D basketball localization from a single calibrated image.
Our approach predicts the object's height in pixels in image space by estimating its projection onto the ground plane within the image.
The 3D coordinates of the ball are then reconstructed by exploiting the known projection matrix.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurately localizing objects in three dimensions (3D) is crucial for various
computer vision applications, such as robotics, autonomous driving, and
augmented reality. This task finds another important application in sports
analytics and, in this work, we present a novel method for 3D basketball
localization from a single calibrated image. Our approach predicts the object's
height in pixels in image space by estimating its projection onto the ground
plane within the image, leveraging the image itself and the object's location
as inputs. The 3D coordinates of the ball are then reconstructed by exploiting
the known projection matrix. Extensive experiments on the public DeepSport
dataset, which provides ground truth annotations for 3D ball location alongside
camera calibration information for each image, demonstrate the effectiveness of
our method, offering substantial accuracy improvements compared to recent work.
Our work opens up new possibilities for enhanced ball tracking and
understanding, advancing computer vision in diverse domains. The source code of
this work is made publicly available at
\url{https://github.com/gabriel-vanzandycke/deepsport}.
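The two-step reconstruction the abstract describes can be sketched in a few lines. This is a minimal illustration under assumptions of my own (a 3x4 projection matrix `P`, a court lying in the world plane z = 0, and a least-squares formulation), not the authors' released implementation:

```python
import numpy as np

def ground_point_from_pixel(P, u, v):
    """Map a pixel assumed to lie on the court plane z = 0 back to court
    coordinates. With z = 0, the 3x4 projection matrix P reduces to the
    3x3 homography H = [p1 p2 p4], which can simply be inverted."""
    H = P[:, [0, 1, 3]]
    xyw = np.linalg.solve(H, np.array([u, v, 1.0]))
    return xyw[0] / xyw[2], xyw[1] / xyw[2]

def ball_height(P, u_ball, v_ball, x, y):
    """Recover the ball's metric height z from its pixel (u_ball, v_ball)
    and the court coordinates (x, y) of its estimated ground projection.
    Solves P @ [x, y, z, 1]^T = w * [u, v, 1]^T for the two unknowns
    (z, w) in the least-squares sense (3 equations, 2 unknowns)."""
    p = P[:, 0] * x + P[:, 1] * y + P[:, 3]
    A = np.column_stack([P[:, 2], -np.array([u_ball, v_ball, 1.0])])
    zw = np.linalg.lstsq(A, -p, rcond=None)[0]
    return zw[0]
```

Given a detected ball pixel and a network estimate of where the ball projects onto the ground in the image, `ground_point_from_pixel` fixes (x, y) and `ball_height` recovers z, yielding the full 3D position.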
Related papers
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- Learning 3D Object Shape and Layout without 3D Supervision [26.575177430506667]
A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space.
We propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information.
Our approach outperforms supervised approaches trained on smaller and less diverse datasets.
arXiv Detail & Related papers (2022-06-14T17:49:44Z)
- Ball 3D localization from a single calibrated image [1.2891210250935146]
We propose to address the task on a single image by estimating the ball diameter in pixels and using the known real ball diameter in meters.
This approach is suitable for any game situation where the ball is (even partly) visible.
Validations on 3 basketball datasets reveal that our model gives remarkable predictions on 3D ball localization.
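The similar-triangles geometry this summary describes can be sketched as follows. This is a hedged illustration with a function name of my own and a nominal 0.24 m basketball diameter, assuming intrinsics `K`, a world-to-camera rotation `R`, and a camera center `C`:

```python
import numpy as np

def ball_from_diameter(K, R, C, u, v, diam_px, diam_m=0.24):
    """Estimate the ball's 3D position from its pixel center (u, v) and
    apparent diameter in pixels. By similar triangles, depth along the
    optical axis is roughly f * diam_m / diam_px, where f is the focal
    length in pixels; the ball then sits at that depth along the
    back-projected pixel ray."""
    depth = K[0, 0] * diam_m / diam_px
    x_cam = depth * np.linalg.solve(K, np.array([u, v, 1.0]))  # camera frame
    return R.T @ x_cam + C                                     # world frame
```

Unlike the ground-projection approach of the main paper, this estimate degrades when the apparent diameter is only a few pixels, since a one-pixel diameter error then translates into a large depth error.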
arXiv Detail & Related papers (2022-03-30T19:38:14Z)
- Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image which is aligned with 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
arXiv Detail & Related papers (2021-11-04T18:30:37Z)
- 3D-Aware Ellipse Prediction for Object-Based Camera Pose Estimation [3.103806775802078]
We propose a method for coarse camera pose computation which is robust to viewing conditions.
It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions.
arXiv Detail & Related papers (2021-05-24T18:40:18Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
We train on an image collection without any ground truth 3D shape, multi-view, camera viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
- MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time [15.245372936153277]
MoNet3D is a novel framework that can predict the 3D position of each object in a monocular image and draw a 3D bounding box for each object.
The method processes images in real time at 27.85 FPS, showing promising potential for embedded advanced driver-assistance system applications.
arXiv Detail & Related papers (2020-06-29T12:48:57Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.