Related papers: H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging

URL: http://arxiv.org/abs/2502.14221v1
Date: Thu, 20 Feb 2025 03:36:12 GMT
Title: H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging
Authors: Zhen Huang, Ronghao Xu, Xiaoqian Zhou, Yangbo Wei, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao,
Abstract summary: 3D landmark detection is a critical task in medical image analysis. We propose a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism. H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs.
Score: 14.511779346332123
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D landmark detection is a critical task in medical image analysis, and accurately detecting anatomical landmarks is essential for subsequent medical imaging tasks. However, mainstream deep learning methods in this field struggle to simultaneously capture fine-grained local features and model global spatial relationships, while maintaining a balance between accuracy and computational efficiency. Local feature extraction requires capturing fine-grained anatomical details, while global modeling requires understanding the spatial relationships within complex anatomical structures. The high-dimensional nature of 3D volumes further exacerbates these challenges, as landmarks are sparsely distributed, leading to significant computational costs. Therefore, achieving efficient and precise 3D landmark detection remains a pressing challenge in medical image analysis. In this work, we propose Hybrid 3D DEtection Net (H3DE-Net), a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism designed to efficiently capture global dependencies in 3D volumetric data. This mechanism employs a hierarchical routing strategy to reduce computational cost while maintaining global context modeling. To our knowledge, H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs. Additionally, integrating multi-scale feature fusion further enhances detection accuracy and robustness. Experimental results on a public CT dataset demonstrate that H3DE-Net achieves state-of-the-art (SOTA) performance, significantly improving accuracy and robustness, particularly in scenarios with missing landmarks or complex anatomical variations. We have open-sourced our project, including code, data, and model weights.
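
The hierarchical routing idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not give the details; the sketch assumes a generic two-level scheme in which tokens are grouped into regions, each region is routed to its `top_k` most similar regions using mean-pooled descriptors, and token-level attention is computed only inside the routed regions. The function name `routed_attention`, its parameters, and the use of contiguous chunks of a flattened token sequence as "regions" are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def routed_attention(q, k, v, num_regions, top_k):
    """Hypothetical two-level attention (illustrative, not H3DE-Net's code).

    q, k, v: arrays of shape (n_tokens, dim); n_tokens must be divisible
    by num_regions. Regions are contiguous chunks of the token sequence.
    """
    n, d = q.shape
    r = n // num_regions  # tokens per region
    qr = q.reshape(num_regions, r, d)
    kr = k.reshape(num_regions, r, d)
    vr = v.reshape(num_regions, r, d)

    # Coarse level: compare mean-pooled region descriptors and keep,
    # for each query region, the top_k most similar key regions.
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T       # (R, R)
    routes = np.argsort(-affinity, axis=1)[:, :top_k]    # (R, top_k)

    # Fine level: gather keys/values of the routed regions only.
    k_sel = kr[routes].reshape(num_regions, top_k * r, d)
    v_sel = vr[routes].reshape(num_regions, top_k * r, d)

    # Standard scaled dot-product attention over the reduced key set.
    scores = qr @ k_sel.transpose(0, 2, 1) / np.sqrt(d)  # (R, r, top_k*r)
    out = softmax(scores, axis=-1) @ v_sel               # (R, r, d)
    return out.reshape(n, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 8))  # e.g. 64 flattened voxel features
out = routed_attention(tokens, tokens, tokens, num_regions=8, top_k=2)
print(out.shape)  # (64, 8)
```

In this sketch each token attends to `top_k * r` keys instead of all `n`, which is where the cost reduction comes from; setting `top_k` equal to `num_regions` recovers ordinary full attention.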

Related papers

AI-CNet3D: An Anatomically-Informed Cross-Attention Network with Multi-Task Consistency Fine-tuning for 3D Glaucoma Classification [0.4999814847776097]
Glaucoma is a progressive eye disease that leads to optic nerve damage, causing irreversible vision loss if left untreated. We propose a novel hybrid deep learning model that integrates cross-attention mechanisms into a 3D convolutional neural network. We have named this model AI-CNet3D (AI-'See'-Net3D) to reflect its design as an Anatomically-Informed Cross-attention Network operating on 3D data.
arXiv Detail & Related papers (2025-10-01T13:30:55Z)
TRELLIS-Enhanced Surface Features for Comprehensive Intracranial Aneurysm Analysis [2.624902795082451]
Intracranial aneurysms pose a significant clinical risk yet are difficult to detect, delineate and model due to limited annotated 3D data. We propose a cross-domain feature-transfer approach that leverages the latent geometric embeddings learned by TRELLIS, a generative model trained on large-scale non-medical 3D datasets.
arXiv Detail & Related papers (2025-09-03T07:51:17Z)
Structured Spectral Graph Learning for Anomaly Classification in 3D Chest CT Scans [0.0]
We propose a new graph-based approach that models CT scans as structured graphs, leveraging axial slice triplet nodes processed through spectral domain convolution to enhance anomaly classification performance. Our method exhibits strong cross-dataset generalization and competitive performance while achieving robustness to z-axis translation.
arXiv Detail & Related papers (2025-08-01T19:52:34Z)
HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection [17.290208035331734]
Anatomical landmark detection (ALD) from a medical image is crucial for a wide array of clinical applications. We propose a novel hybrid architecture that integrates CNNs and Transformers. Experiments on five diverse datasets demonstrate state-of-the-art performance, surpassing existing methods in accuracy, robustness, and efficiency.
arXiv Detail & Related papers (2024-12-09T13:58:00Z)
Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing [14.0825980706386]
Volumetric medical image segmentation is a fundamental problem in medical image analysis where the objective is to accurately classify a given 3D volumetric medical image with voxel-level precision. In this work, we propose a novel hierarchical encoder-decoder-based framework that strives to explicitly capture the local and global dependencies for 3D medical image segmentation. The proposed framework exploits local volume-based self-attention to encode the local dependencies at high resolution and introduces a novel volumetric-mixer to capture the global dependencies at low-resolution feature representations.
arXiv Detail & Related papers (2024-10-20T11:08:38Z)
μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation [2.012378666405002]
X-ray computed microtomography (μ-CT) is a non-destructive technique that can generate high-resolution 3D images of the internal anatomy of medical and biological samples. Extracting relevant information from 3D images requires semantic segmentation of the regions of interest. We propose a novel framework that uses a convolutional neural network (CNN) to automatically segment the full morphology of the heart of Carassius auratus.
arXiv Detail & Related papers (2024-06-24T15:29:08Z)
Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields [19.71033340093199]
We propose a novel architecture, Perspective+ Unet, to overcome limitations in medical image segmentation. The framework incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency capture. Experimental results on the ACDC and datasets demonstrate the effectiveness of our proposed Perspective+ Unet.
arXiv Detail & Related papers (2024-06-20T07:17:39Z)
Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models. We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information. Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
Simulating Realistic MRI variations to Improve Deep Learning model and visual explanations using GradCAM [0.0]
We use a modified HighRes3DNet model to solve the brain MRI volumetric landmark detection problem. Grad-CAM produces a coarse localization map highlighting the regions the model is focusing on.
arXiv Detail & Related papers (2021-11-01T11:14:23Z)
Structure-Aware Long Short-Term Memory Network for 3D Cephalometric Landmark Detection [37.031819721889676]
We propose a novel Structure-Aware Long Short-Term Memory framework (SA-LSTM) for efficient and accurate 3D landmark detection. SA-LSTM first locates the coarse landmarks via heatmap regression on a down-sampled CBCT volume. It then progressively refines landmarks by attentive offset regression using high-resolution cropped patches. Experiments show that our method significantly outperforms state-of-the-art methods in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-07-21T06:35:52Z)
Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving. In this work, we quantify the impact introduced by each sub-task and find that 'localization error' is the vital factor in restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices. With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset. The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space. We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images. We extend 3D CNNs to 4D spatio-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection. The proposed method constructs graph signals leveraging both local image features and global shape features. Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand, and Pelvis).
arXiv Detail & Related papers (2020-04-17T11:55:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.