From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans
- URL: http://arxiv.org/abs/2503.07933v2
- Date: Thu, 13 Mar 2025 00:01:12 GMT
- Title: From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans
- Authors: Qinji Yu, Yirui Wang, Ke Yan, Dandan Zheng, Dashan Ai, Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Yun Bian, Na Shen, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin,
- Abstract summary: LN-Tracker is a novel LN tracking transformer for joint end-to-end detection and 3D instance association.<n>LN-Tracker decouples transformer decoder's query into the track and detection groups, where the track query autoregressively follows previously tracked LN instances.<n>Extensive evaluation on four lymph node datasets shows LN-Tracker's superior performance, with at least 2.7% gain in average sensitivity when compared to other top 3D/2.5D detectors.
- Score: 9.00859316589341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lymph node (LN) assessment is an essential task in the routine radiology workflow, providing valuable insights for cancer staging, treatment planning and beyond. Identifying scatteredly-distributed and low-contrast LNs in 3D CT scans is highly challenging, even for experienced clinicians. Previous lesion and LN detection methods demonstrate effectiveness of 2.5D approaches (i.e, using 2D network with multi-slice inputs), leveraging pretrained 2D model weights and showing improved accuracy as compared to separate 2D or 3D detectors. However, slice-based 2.5D detectors do not explicitly model inter-slice consistency for LN as a 3D object, requiring heuristic post-merging steps to generate final 3D LN instances, which can involve tuning a set of parameters for each dataset. In this work, we formulate 3D LN detection as a tracking task and propose LN-Tracker, a novel LN tracking transformer, for joint end-to-end detection and 3D instance association. Built upon DETR-based detector, LN-Tracker decouples transformer decoder's query into the track and detection groups, where the track query autoregressively follows previously tracked LN instances along the z-axis of a CT scan. We design a new transformer decoder with masked attention module to align track query's content to the context of current slice, meanwhile preserving detection query's high accuracy in current slice. An inter-slice similarity loss is introduced to encourage cohesive LN association between slices. Extensive evaluation on four lymph node datasets shows LN-Tracker's superior performance, with at least 2.7% gain in average sensitivity when compared to other top 3D/2.5D detectors. Further validation on public lung nodule and prostate tumor detection tasks confirms the generalizability of LN-Tracker as it achieves top performance on both tasks.
Related papers
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z) - Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer [9.693078559346688]
We propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance.
By enhancing the 2D backbone with a multi-scale 2.5D feature fusion, we make two main contributions to improve the representation quality of LN queries.
Our method significantly improves the performance of previous leading methods by > 4-5% average recall at the same FP rates in both internal and external testing.
arXiv Detail & Related papers (2024-04-04T22:31:15Z) - KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection [48.66703222700795]
We resort to a novel kernel strategy to identify the most informative point clouds to acquire labels.
To accommodate both one-stage (i.e., SECOND) and two-stage detectors, we incorporate the classification entropy tangent and well trade-off between detection performance and the total number of bounding boxes selected for annotation.
Our results show that approximately 44% box-level annotation costs and 26% computational time are reduced compared to the state-of-the-art method.
arXiv Detail & Related papers (2023-07-16T04:27:03Z) - Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object
Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse-temporal R-CNN that jointly solves object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z) - SALISA: Saliency-based Input Sampling for Efficient Video Object
Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z) - Multi-Slice Dense-Sparse Learning for Efficient Liver and Tumor
Segmentation [4.150096314396549]
Deep convolutional neural network (DCNNs) has obtained tremendous success in 2D and 3D medical image segmentation.
We propose a novel dense-sparse training flow from a data perspective, in which, densely adjacent slices and sparsely adjacent slices are extracted as inputs for regularizing DCNNs.
We also design a 2.5D light-weight nnU-Net from a network perspective, in which, depthwise separable convolutions are adopted to improve the efficiency.
arXiv Detail & Related papers (2021-08-15T15:29:48Z) - Revisiting 3D Context Modeling with Supervised Pre-training for
Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.