Geometry OR Tracker: Universal Geometric Operating Room Tracking
- URL: http://arxiv.org/abs/2603.00560v1
- Date: Sat, 28 Feb 2026 09:21:21 GMT
- Title: Geometry OR Tracker: Universal Geometric Operating Room Tracking
- Authors: Yihua Shao, Kang Chen, Feng Xue, Siyu Chen, Long Bai, Hongyuan Yu, Hao Tang, Jinlin Wu, Nassir Navab,
- Abstract summary: In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition.<n>Camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency.<n>We introduce Geometry OR Tracker, a two-stage pipeline that rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup.
- Score: 61.399734016038614
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition, where physically meaningful quantities such as distances and motion statistics must be measured in meters. However, real clinical deployments rarely satisfy the geometric prerequisites for stable multi-view fusion and tracking: camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency that produces "ghosting" during fusion and degrades 3D trajectories in a shared OR coordinate frame. To address this, we introduce Geometry OR Tracker, a two-stage pipeline that first rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup with a single global scale via a Multi-view Metric Geometry Rectification module, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame. On the MM-OR benchmark, improved geometric consistency translates into tracking gains: our rectification front-end reduces cross-view depth disagreement by more than 30$\times$ compared to raw calibration. Ablation studies further demonstrate the relationship between calibration quality and tracking accuracy, showing that improved geometric consistency yields stronger world-frame tracking.
Related papers
- Stereo-Inertial Poser: Towards Metric-Accurate Shape-Aware Motion Capture Using Sparse IMUs and a Single Stereo Camera [54.967647497048205]
We present Stereo-Inertial Poser, a real-time motion capture system that estimates metric-accurate and shape-aware 3D human motion.<n>We replace the monocular RGB with stereo vision, enabling direct 3D keypoint extraction and body shape parameter estimation.<n>Our method produces drift-free global translation under a long recording time and reduces foot-skating effects.
arXiv Detail & Related papers (2026-03-02T17:46:38Z) - Geometry-Aware Rotary Position Embedding for Consistent Video World Model [48.914346802616414]
ViewRope is a geometry-aware encoding that injects camera-ray directions directly into video transformer self-attention layers.<n>Geometry-Aware Frame-Sparse Attention exploits these geometric cues to selectively attend to relevant historical frames.<n>Our results demonstrate that ViewRope substantially improves long-term consistency while reducing computational costs.
arXiv Detail & Related papers (2026-02-08T08:01:16Z) - TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction [57.46712611558817]
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass.<n>Recent strategies align consecutive predictions by solving global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry.<n>We propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies.
arXiv Detail & Related papers (2025-12-02T02:22:20Z) - SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection [49.12928389918159]
Existing monocular 3D detectors typically tame the pronounced nonlinear regression of 3D bounding box through decoupled prediction paradigm.<n>We propose novel Spatial-Projection Alignment (SPAN) with two pivotal components.<n>SPAN enforces an explicit global spatial constraint between the predicted and ground-truth 3D bounding boxes, thereby rectifying spatial drift caused by decoupled attribute regression.<n>3D-2D Projection Alignment ensures that the projected 3D box is aligned tightly within its corresponding 2D detection bounding box on the image plane, mitigating projection misalignment overlooked in previous works.
arXiv Detail & Related papers (2025-11-10T04:48:48Z) - On Accurate and Robust Estimation of 3D and 2D Circular Center: Method and Application to Camera-Lidar Calibration [6.9583467338943485]
We propose a robust 3D circle center estimator based on conformal geometric algebra and RANSAC.<n>We also propose a chord-length variance minimization method to recover the true 2D projected center.<n>Our framework significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-11-10T01:43:42Z) - Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction.<n>By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z) - TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints [22.121665995381324]
TVG-SLAM is a robust RGB-only 3DGS SLAM system that leverages a novel tri-view geometry paradigm to ensure consistent tracking and high-quality mapping.<n>Our method improves tracking robustness, reducing the average Absolute Trajectory Error (ATE) by 69.0% while achieving state-of-the-art rendering quality.
arXiv Detail & Related papers (2025-06-29T12:31:05Z) - S.T.A.R.-Track: Latent Motion Models for End-to-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations [10.46571824050325]
Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D.
Inspired by this, we propose S.T.A.R.-Track, which uses a novel latent motion model (LMM) to adjust object queries to account for changes in viewing direction and lighting conditions directly in the latent space.
arXiv Detail & Related papers (2023-06-30T12:22:41Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.