H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging
- URL: http://arxiv.org/abs/2502.14221v1
- Date: Thu, 20 Feb 2025 03:36:12 GMT
- Title: H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging
- Authors: Zhen Huang, Ronghao Xu, Xiaoqian Zhou, Yangbo Wei, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao,
- Abstract summary: 3D landmark detection is a critical task in medical image analysis.
We propose a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism.
H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs.
- Score: 14.511779346332123
- License:
- Abstract: 3D landmark detection is a critical task in medical image analysis, and accurately detecting anatomical landmarks is essential for subsequent medical imaging tasks. However, mainstream deep learning methods in this field struggle to simultaneously capture fine-grained local features and model global spatial relationships, while maintaining a balance between accuracy and computational efficiency. Local feature extraction requires capturing fine-grained anatomical details, while global modeling requires understanding the spatial relationships within complex anatomical structures. The high-dimensional nature of 3D volume further exacerbates these challenges, as landmarks are sparsely distributed, leading to significant computational costs. Therefore, achieving efficient and precise 3D landmark detection remains a pressing challenge in medical image analysis. In this work, We propose a \textbf{H}ybrid \textbf{3}D \textbf{DE}tection \textbf{Net}(H3DE-Net), a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism designed to efficiently capture global dependencies in 3D volumetric data. This mechanism employs a hierarchical routing strategy to reduce computational cost while maintaining global context modeling. To our knowledge, H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs. Additionally, integrating multi-scale feature fusion further enhances detection accuracy and robustness. Experimental results on a public CT dataset demonstrate that H3DE-Net achieves state-of-the-art(SOTA) performance, significantly improving accuracy and robustness, particularly in scenarios with missing landmarks or complex anatomical variations. We aready open-source our project, including code, data and model weights.
Related papers
- Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing [14.0825980706386]
Volumetric medical image segmentation is a fundamental problem in medical image analysis where the objective is to accurately classify a given 3D volumetric medical image with voxel-level precision.
In this work, we propose a novel hierarchical encoder-decoder-based framework that strives to explicitly capture the local and global dependencies for 3D medical image segmentation.
The proposed framework exploits local volume-based self-attention to encode the local dependencies at high resolution and introduces a novel volumetric-mixer to capture the global dependencies at low-resolution feature representations.
arXiv Detail & Related papers (2024-10-20T11:08:38Z) - μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation [2.012378666405002]
X-ray computed microtomography (mu-CT) is a non-destructive technique that can generate high-resolution 3D images of the internal anatomy of medical and biological samples.
extracting relevant information from 3D images requires semantic segmentation of the regions of interest.
We propose a novel framework that uses a convolutional neural network (CNN) to automatically segment the full morphology of the heart of Carassius auratus.
arXiv Detail & Related papers (2024-06-24T15:29:08Z) - Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields [19.71033340093199]
We propose a novel architecture, Perspective+ Unet, to overcome limitations in medical image segmentation.
The framework incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency capture.
Experimental results on the ACDC and datasets demonstrate the effectiveness of our proposed Perspective+ Unet.
arXiv Detail & Related papers (2024-06-20T07:17:39Z) - Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models.
We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance compared with the other state-of-the-arts by a large margin on KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - Simulating Realistic MRI variations to Improve Deep Learning model and
visual explanations using GradCAM [0.0]
We use a modified HighRes3DNet model for solving brain MRI volumetric landmark detection problem.
Grad-CAM produces a coarse localization map highlighting the regions the model is focusing.
arXiv Detail & Related papers (2021-11-01T11:14:23Z) - Structure-Aware Long Short-Term Memory Network for 3D Cephalometric
Landmark Detection [37.031819721889676]
We propose a novel Structure-Aware Long Short-Term Memory framework (SA-LSTM) for efficient and accurate 3D landmark detection.
SA-LSTM first locates the coarse landmarks via heatmap regression on a down-sampled CBCT volume.
It then progressively refines landmarks by attentive offset regression using high-resolution cropped patches.
Experiments show that our method significantly outperforms state-of-the-art methods in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-07-21T06:35:52Z) - Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find the localization error' is the vital factor in restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - 4D Spatio-Temporal Convolutional Networks for Object Position Estimation
in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images.
We extend 3D CNNs to 4D-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z) - Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection.
The proposed method constructs graph signals leveraging both local image features and global shape features.
Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand and Pelvis)
arXiv Detail & Related papers (2020-04-17T11:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.