Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
- URL: http://arxiv.org/abs/2511.20250v1
- Date: Tue, 25 Nov 2025 12:25:20 GMT
- Title: Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
- Authors: Daniel Kienzle, Katja Ludwig, Julian Lorenz, Shin'ichi Satoh, Rainer Lienhart,
- Abstract summary: We propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task.<n>By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application.
- Score: 32.884975524869084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically-correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.
Related papers
- 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection [62.57179069154312]
We introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD)<n>We lift the open-set 2D detection into 3D space through our designed 3D bounding box head.<n>We condition the object queries with geometry prior and overcome the generalization for 3D estimation across diverse scenes.
arXiv Detail & Related papers (2025-07-31T13:56:41Z) - Where Is The Ball: 3D Ball Trajectory Estimation From 2D Monocular Tracking [10.237629959021875]
We present a method for 3D ball trajectory estimation from a 2D tracking sequence.<n>Our method achieves state-of-the-art performance despite training solely on simulated data.<n>Our method can generalize to real-world scenarios with multiple trajectories, opening up a range of applications in sport analysis and virtual replay.
arXiv Detail & Related papers (2025-06-06T05:42:05Z) - TT3D: Table Tennis 3D Reconstruction [11.84899291358663]
We propose a novel approach for reconstructing precise 3D ball trajectories from online table tennis match recordings.<n>Our method leverages the underlying physics of the ball's motion to identify the bounce state that minimizes the reprojection error of the ball's flying trajectory.<n>A key advantage of our approach is its ability to infer ball spin without relying on human pose estimation or racket tracking.
arXiv Detail & Related papers (2025-04-14T09:37:47Z) - xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion [4.878192303432336]
DIOD-3D is the first baseline for multi-object discovery in 3D data using 2D motion.<n>xMOD is a cross-modal training framework that integrates 2D and 3D data while always using 2D motion cues.<n>Our approach yields a substantial performance improvement compared with the 2D object discovery state-of-the-art on all datasets.
arXiv Detail & Related papers (2025-03-19T09:20:35Z) - Mocap-2-to-3: Multi-view Lifting for Monocular Motion Recovery with 2D Pretraining [49.223455189395025]
Mocap-2-to-3 is a novel framework that performs multi-view lifting from monocular input.<n>To leverage abundant 2D data, we decompose complex 3D motion into multi-view syntheses.<n>Our method surpasses state-of-the-art approaches in both camera-space motion realism and world-grounded human positioning.
arXiv Detail & Related papers (2025-03-05T06:32:49Z) - Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation [30.744137117668643]
Lift3D is a framework that enhances 2D foundation models with implicit and explicit 3D robotic representations to construct a robust 3D manipulation policy.<n>In experiments, Lift3D consistently outperforms previous state-of-the-art methods across several simulation benchmarks and real-world scenarios.
arXiv Detail & Related papers (2024-11-27T18:59:52Z) - Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text [61.9973218744157]
We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories.
Experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.
arXiv Detail & Related papers (2024-06-25T14:42:51Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - 3D-LFM: Lifting Foundation Model [29.48835001900286]
deep learning has expanded our capability to reconstruct a wide range of object classes.
Our approach harnesses the inherent permutation equivariance transformers to manage varying number points per 3D data instance.
We demonstrate state the art performance across 2D-3D lifting task benchmarks.
arXiv Detail & Related papers (2023-12-19T06:38:18Z) - PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [111.16358607889609]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.<n>For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.