HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection
- URL: http://arxiv.org/abs/2412.06499v2
- Date: Mon, 16 Dec 2024 06:00:52 GMT
- Title: HYATT-Net is Grand: A Hybrid Attention Network for Performant Anatomical Landmark Detection
- Authors: Xiaoqian Zhou, Zhen Huang, Heqin Zhu, Qingsong Yao, S. Kevin Zhou
- Abstract summary: Anatomical landmark detection (ALD) from a medical image is crucial for a wide array of clinical applications.
We propose a novel hybrid architecture that integrates CNNs and Transformers.
Experiments on five diverse datasets demonstrate state-of-the-art performance, surpassing existing methods in accuracy, robustness, and efficiency.
- Score: 17.290208035331734
- Abstract: Anatomical landmark detection (ALD) from a medical image is crucial for a wide array of clinical applications. While existing methods achieve considerable success in ALD, they often struggle to balance global context with computational efficiency, particularly with high-resolution images, which raises a natural question: where is the performance limit of ALD? In this paper, we aim to forge performant ALD by proposing a HYbrid ATTention Network (HYATT-Net) with the following designs: (i) A novel hybrid architecture that integrates CNNs and Transformers. Its core is the BiFormer module, which uses Bi-Level Routing Attention to attend efficiently to relevant image regions. Combined with the Attention Residual Module (ARM), this enables precise local feature refinement guided by the global context. (ii) A Feature Fusion Correction Module that aggregates multi-scale features and thus mitigates resolution loss. The model is optimized with deep supervision, applying a mean-square error loss to multi-resolution heatmaps. Experiments on five diverse datasets demonstrate state-of-the-art performance, surpassing existing methods in accuracy, robustness, and efficiency. HYATT-Net provides a promising solution for accurate and efficient ALD in complex medical images. Our code and data are released at https://github.com/ECNUACRush/HYATT-Net.
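The abstract states that HYATT-Net is trained with deep supervision, using a mean-square error loss on multi-resolution heatmaps. A minimal NumPy sketch of that kind of objective follows. The Gaussian target construction, the output strides (1, 2, 4), sigma = 2, equal per-scale weights, argmax decoding, and all function names (`gaussian_heatmap`, `make_targets`, `deep_supervision_mse`, `decode`) are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np

# Illustrative sketch only: strides, sigma, and weights are assumptions,
# not values from the HYATT-Net paper or code.

def gaussian_heatmap(h, w, cx, cy, sigma):
    """Unnormalized 2-D Gaussian centred at (cx, cy) on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def make_targets(landmarks, size, strides=(1, 2, 4), sigma=2.0):
    """Build one (K, H/s, W/s) target stack per output stride s.

    landmarks: (K, 2) array of (x, y) pixel coordinates at full resolution.
    """
    h, w = size
    targets = []
    for s in strides:
        maps = [gaussian_heatmap(h // s, w // s, x / s, y / s, sigma) for x, y in landmarks]
        targets.append(np.stack(maps))
    return targets

def deep_supervision_mse(preds, targets, weights=None):
    """Weighted sum of the per-resolution mean-squared errors."""
    if weights is None:
        weights = [1.0] * len(preds)
    return sum(wt * np.mean((p - t) ** 2) for wt, p, t in zip(weights, preds, targets))

def decode(heatmaps):
    """Return the (x, y) argmax location of each of the K heatmaps as a (K, 2) array."""
    k, _, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack([flat % w, flat // w], axis=1)

if __name__ == "__main__":
    landmarks = np.array([[10.0, 20.0], [40.0, 8.0]])
    targets = make_targets(landmarks, (64, 64))
    print([t.shape for t in targets])
    print(decode(targets[0]).tolist())
```

The per-scale weights are the main knob in this sketch; the abstract does not say how HYATT-Net weights its resolutions.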
Related papers
- HyperDefect-YOLO: Enhance YOLO with HyperGraph Computation for Industrial Defect Detection [12.865603495310328]
HD-YOLO consists of Defect Aware Module (DAM) and Mixed Graph Network (MGNet) in the backbone.
HGANet combines hypergraph computation and an attention mechanism to aggregate multi-scale features.
Cross-Scale Fusion (CSF) is proposed to adaptively fuse and process features, rather than relying on simple concatenation and convolution.
Date: 2024-12-05T08:38:01Z
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study of the more challenging task of high-resolution salient object detection (HRSOD), from both the dataset and the network-framework perspectives.
To compensate for the lack of HRSOD datasets, we collect a large-scale high-resolution salient object detection dataset, called UHRSD.
All the images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
Date: 2024-08-02T09:31:21Z
- Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models [48.87160158792048]
We introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way.
Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods.
Date: 2024-05-26T10:58:22Z
- HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution [6.7341750484636975]
Transformer-based networks can only use input information from a limited spatial range.
A novel Hybrid Multi-Axis Aggregation network (HMA) is proposed in this paper to better exploit the potential information in features.
The experimental results show that HMA outperforms the state-of-the-art methods on the benchmark dataset.
Date: 2024-05-08T12:14:34Z
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of integrating multi-head self-attention into high resolution representation CNNs efficiently.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models obtained with the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of lightweight convolution layers and memory-efficient self-attention layers.
Date: 2024-03-15T15:47:54Z
- Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
Date: 2024-03-15T07:18:43Z
- BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation [11.986549780782724]
We propose a hybrid yet effective CNN-Transformer network, named BRAU-Net++, for an accurate medical image segmentation task.
Specifically, BRAU-Net++ uses bi-level routing attention as the core building block to design our U-shaped encoder-decoder structure.
Our proposed approach surpasses other state-of-the-art methods including its baseline: BRAU-Net.
Date: 2024-01-01T10:49:09Z
- Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection [68.65338791283298]
Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video.
Traditional SOD methods are largely limited to low-resolution images, making them difficult to adapt to High-Resolution SOD.
In this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution.
Date: 2023-08-07T17:49:04Z
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from images of different resolutions independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
Date: 2022-04-11T12:22:21Z
- InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which the CT imaging process is finely embedded.
We analyze the CT values among different tissues and merge the prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance.
Date: 2021-12-23T15:52:37Z
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The central problem in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
Date: 2020-07-13T07:59:55Z
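Both HYATT-Net (through its BiFormer module) and BRAU-Net++ build on bi-level routing attention: a coarse region-to-region step routes each query region to its top-k most relevant regions, and fine token-to-token attention is then computed only over the tokens gathered from those regions. Below is a minimal single-head NumPy sketch of that general technique, assuming an H x W feature map divisible into s x s square regions, plain (C, C) projection matrices, and no output projection or local-context term. The function name `bi_level_routing_attention` and its parameters are hypothetical; this is not code from either paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_level_routing_attention(x, wq, wk, wv, s=2, topk=2):
    """Single-head bi-level routing attention (illustrative sketch).

    x: (H, W, C) feature map; wq, wk, wv: (C, C) projections (assumed, untrained);
    s: regions per side; topk: number of regions each query region attends to.
    Returns the (H, W, C) output and the (s*s, topk) routing indices.
    """
    h, w, c = x.shape
    assert h % s == 0 and w % s == 0, "H and W must be divisible by s"
    rh, rw = h // s, w // s
    # Partition into s*s non-overlapping regions of rh*rw tokens: (s*s, rh*rw, C)
    regions = x.reshape(s, rh, s, rw, c).transpose(0, 2, 1, 3, 4).reshape(s * s, rh * rw, c)
    q, k, v = regions @ wq, regions @ wk, regions @ wv
    # Level 1: region-to-region affinity from region-averaged queries and keys
    qr, kr = q.mean(axis=1), k.mean(axis=1)            # (s*s, C)
    affinity = qr @ kr.T                                # (s*s, s*s)
    routed = np.argsort(-affinity, axis=1)[:, :topk]    # top-k regions per query region
    # Level 2: token-to-token attention over keys/values gathered from routed regions
    k_g = k[routed].reshape(s * s, topk * rh * rw, c)
    v_g = v[routed].reshape(s * s, topk * rh * rw, c)
    attn = softmax(q @ k_g.transpose(0, 2, 1) / np.sqrt(c), axis=-1)
    out = attn @ v_g                                    # (s*s, rh*rw, C)
    # Undo the region partition back to (H, W, C)
    out = out.reshape(s, s, rh, rw, c).transpose(0, 2, 1, 3, 4).reshape(h, w, c)
    return out, routed
```

Setting `topk` to the number of regions (`s * s`) makes every query attend to every token, which recovers ordinary global self-attention and gives a convenient sanity check; smaller `topk` values trade context for lower cost.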