Related papers: Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision

URL: http://arxiv.org/abs/2508.09681v1
Date: Wed, 13 Aug 2025 10:20:24 GMT
Title: Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision
Authors: Gerardo Loza, Junlei Hu, Dominic Jones, Sharib Ali, Pietro Valdastri
Abstract summary: InvNeRF architecture is used to perform both 2D and 3D tracking in surgical scenarios. In 2D point tracking, our approach surpasses the precision and accuracy of the TTO state-of-the-art methods by nearly 50% on average precision. In 3D point tracking, this is the first TTO approach, surpassing feed-forward methods while incorporating the benefits of a deformable NeRF-based reconstruction.
Score: 3.7908123458994107
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We propose a novel test-time optimisation (TTO) approach framed by a NeRF-based architecture for long-term 3D point tracking. Most current methods in point tracking struggle to obtain consistent motion or are limited to 2D motion. TTO approaches frame the solution for long-term tracking as optimising a function that aggregates correspondences from other specialised state-of-the-art methods. Unlike the state of the art in TTO, we propose parametrising such a function with our new invertible Neural Radiance Field (InvNeRF) architecture to perform both 2D and 3D tracking in surgical scenarios. Our approach allows us to exploit the advantages of a rendering-based approach by supervising the reprojection of pixel correspondences. It adapts strategies from recent rendering-based methods to obtain a bidirectional deformable-canonical mapping, to efficiently handle a defined workspace, and to guide the rays' density. It also presents our multi-scale HexPlanes for fast inference and a new algorithm for efficient pixel sampling and convergence criteria. We present results on the STIR and SCARE datasets, for evaluating point tracking and testing the integration of kinematic data in our pipeline, respectively. In 2D point tracking, our approach surpasses the precision and accuracy of the TTO state-of-the-art methods by nearly 50% on average precision, while competing with other approaches. In 3D point tracking, this is the first TTO approach, surpassing feed-forward methods while incorporating the benefits of a deformable NeRF-based reconstruction.
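Illustrative sketch (not from the paper): the abstract describes a bidirectional deformable-canonical mapping but gives no architectural detail here. The snippet below shows one generic way to make such a mapping exactly invertible, a time-conditioned additive coupling layer. The class name `AdditiveCoupling`, the helper `track`, the random untrained weights, and the choice of a coupling layer are all assumptions made for illustration. They are not the authors' InvNeRF design.

```python
import numpy as np


class AdditiveCoupling:
    """Hypothetical time-conditioned additive coupling layer.

    It shifts z by a function of (x, y, t). Because x and y pass through
    unchanged, the same shift can be recomputed and subtracted, so the
    mapping is invertible by construction.
    """

    def __init__(self, rng, hidden=16):
        # Random, untrained weights of a tiny MLP: (x, y, t) -> shift for z.
        self.w1 = rng.normal(scale=0.5, size=(3, hidden))
        self.w2 = rng.normal(scale=0.5, size=(hidden, 1))

    def _shift(self, xy, t):
        # Append the frame time t as a third input column.
        inp = np.concatenate([xy, np.full((xy.shape[0], 1), t)], axis=1)
        return np.tanh(inp @ self.w1) @ self.w2

    def to_canonical(self, pts, t):
        # Deformed space at time t -> canonical space.
        out = pts.copy()
        out[:, 2:3] += self._shift(pts[:, :2], t)
        return out

    def to_deformed(self, pts, t):
        # Canonical space -> deformed space at time t (exact inverse).
        out = pts.copy()
        out[:, 2:3] -= self._shift(pts[:, :2], t)
        return out


def track(layer, pts, t_src, t_dst):
    """Move 3D points from frame t_src to frame t_dst via the canonical space."""
    return layer.to_deformed(layer.to_canonical(pts, t_src), t_dst)


rng = np.random.default_rng(0)
layer = AdditiveCoupling(rng)
pts = rng.normal(size=(5, 3))          # five 3D points observed at t = 0.3
round_trip = layer.to_deformed(layer.to_canonical(pts, 0.3), 0.3)
moved = track(layer, pts, 0.3, 0.8)    # the same points expressed at t = 0.8
```

In this sketch, tracking a point between two frames amounts to mapping it into the canonical space at the source time and back out at the target time. A real model would stack several such layers, alternating which coordinates are shifted, and would fit the weights during test-time optimisation.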

Related papers

Robust Mesh Saliency GT Acquisition in VR via View Cone Sampling and Geometric Smoothing [59.12032628787018]
3D mesh saliency ground truth is essential for human-centric visual modeling in virtual reality (VR). Current VR eye-tracking pipelines rely on single ray sampling and Euclidean smoothing, triggering texture attention and signal leakage across gaps. This paper proposes a robust framework to address these limitations.
arXiv Detail & Related papers (2026-01-06T05:20:12Z)
DELTAv2: Accelerating Dense 3D Tracking [79.63990337419514]
We propose a novel algorithm for accelerating dense long-term 3D point tracking in videos. We introduce a coarse-to-fine strategy that begins tracking with a small subset of points and progressively expands the set of tracked trajectories. The newly added trajectories are using a learnable module, which is trained end-to-end alongside the tracking network.
arXiv Detail & Related papers (2025-08-02T03:15:47Z)
DELTA: Dense Efficient Long-range 3D Tracking for any video [82.26753323263009]
We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
arXiv Detail & Related papers (2024-10-31T17:59:01Z)
OriCon3D: Effective 3D Object Detection using Orientation and Confidence [0.0]
We propose an advanced methodology for the detection of 3D objects from a single image. We use a deep convolutional neural network-based 3D object weighted orientation regression paradigm. Our approach significantly improves the accuracy of 3D object pose determination, surpassing baseline methodologies.
arXiv Detail & Related papers (2023-04-27T19:52:47Z)
TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads. Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view. We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z)
Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream. At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank. To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
3D Multi-Object Tracking with Differentiable Pose Estimation [0.0]
We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments. We leverage those correspondences to inform a graph neural network to solve for the optimal, temporally-consistent 7-DoF pose trajectories of all objects. Our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods.
arXiv Detail & Related papers (2022-06-28T06:46:32Z)
Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans. Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art. LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z)
SRT3D: A Sparse Region-Based 3D Object Tracking Approach for the Real World [10.029003607782878]
Region-based methods have become increasingly popular for model-based, monocular 3D tracking of texture-less objects in cluttered scenes. However, most methods are computationally expensive, requiring significant resources to run in real-time. We develop SRT3D, a sparse region-based approach to 3D object tracking that bridges this gap in efficiency.
arXiv Detail & Related papers (2021-10-25T07:58:18Z)
Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space. We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method. From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box. Dense object cues that are associated with the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.