HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation
- URL: http://arxiv.org/abs/2505.10464v3
- Date: Tue, 27 May 2025 02:19:45 GMT
- Title: HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation
- Authors: Jiaming Liang, Lihuan Dai, Xiaoqi Sheng, Xiangguang Chen, Chun Yao, Guihua Tao, Qibin Leng, Hongmin Cai, Xi Zhong
- Abstract summary: HWA-UNETR is a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers. Our framework surpasses existing methods by up to 1.68% in the Dice score while maintaining solid robustness.
- Score: 11.180283823428711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and to depend on application migration, leading to substantial resource expenditure and a potential decline in analysis accuracy. To address these challenges, we make two major contributions: First, we publicly release the GCM 2025 dataset, the first large-scale, open-source collection of gastric cancer multimodal MRI scans, featuring professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients. Second, we introduce HWA-UNETR, a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers to establish dynamic feature correspondences between different modalities' anatomical structures, and leverages an innovative tri-orientated fusion Mamba mechanism for context modeling and capturing long-range spatial dependencies. Extensive experiments on our GCM 2025 dataset and the public BraTS 2021 dataset validate the performance of our framework, demonstrating that the new approach surpasses existing methods by up to 1.68% in the Dice score while maintaining solid robustness. The dataset and code are publicly available at https://github.com/JeMing-creater/HWA-UNETR.
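The abstract describes the HWA block only at a high level. The following PyTorch sketch is a rough, hypothetical illustration of "learnable window aggregation" over multimodal 3D features, not the authors' implementation (which lives in the linked repository): it fuses per-modality feature maps with learned weights and shares mean-pooled context within non-overlapping 3D windows. All class and parameter names are our own assumptions.

```python
# Hypothetical sketch of a learnable window aggregation layer, loosely
# following the abstract's description; not the authors' actual HWA block.
import torch
import torch.nn as nn

class LearnableWindowAggregation3D(nn.Module):
    """Fuse per-modality 3D features with learned weights, then mix each
    non-overlapping window with its mean-pooled context."""

    def __init__(self, channels: int, num_modalities: int, window: int = 4):
        super().__init__()
        self.window = window
        # One learnable logit per modality, softmax-normalized each forward pass.
        self.modality_logits = nn.Parameter(torch.zeros(num_modalities))
        self.proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (B, C, D, H, W) tensors, one per (aligned) modality;
        # D, H, W are assumed divisible by the window size.
        weights = torch.softmax(self.modality_logits, dim=0)
        fused = sum(w * f for w, f in zip(weights, feats))
        B, C, D, H, W = fused.shape
        s = self.window
        # Partition into s^3 windows and add each window's mean back as context.
        x = fused.reshape(B, C, D // s, s, H // s, s, W // s, s)
        ctx = x.mean(dim=(3, 5, 7), keepdim=True)
        x = (x + ctx).reshape(B, C, D, H, W)
        return self.proj(x)

# Usage with the three GCM 2025 modalities (FS-T2W, CE-T1W, ADC):
feats = [torch.randn(1, 32, 32, 32, 32) for _ in range(3)]
out = LearnableWindowAggregation3D(channels=32, num_modalities=3)(feats)
print(out.shape)  # torch.Size([1, 32, 32, 32, 32])
```

In the actual framework this kind of aggregation would sit inside a UNETR-style encoder, with the tri-orientated fusion Mamba branch handling long-range dependencies; neither is attempted in this sketch.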
Related papers
- TCDiff: Triplex Cascaded Diffusion for High-fidelity Multimodal EHRs Generation with Incomplete Clinical Data [7.661128607911307]
We propose TCDiff, a novel EHR generation framework that cascades three diffusion networks to learn the features of real-world EHR data. TCDiff consistently outperforms state-of-the-art baselines by an average of 10% in data fidelity under various missing rates. This highlights the effectiveness, robustness, and generalizability of our approach in real-world healthcare scenarios.
arXiv Detail & Related papers (2025-08-03T06:24:20Z)
- Foundation Model for Whole-Heart Segmentation: Leveraging Student-Teacher Learning in Multi-Modal Medical Imaging [0.510750648708198]
Whole-heart segmentation from CT and MRI scans is crucial for cardiovascular disease analysis. Existing methods struggle with modality-specific biases and the need for extensive labeled datasets. We propose a foundation model for whole-heart segmentation using a self-supervised learning framework based on a student-teacher architecture.
arXiv Detail & Related papers (2025-03-24T14:47:54Z)
- Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge [55.252714550918824]
The AortaSeg24 MICCAI Challenge introduced the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms.
arXiv Detail & Related papers (2025-02-07T21:09:05Z)
- XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation [22.908801443059758]
XGeM is a multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy. We show how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity.
arXiv Detail & Related papers (2025-01-08T16:53:56Z)
- MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data and train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss. We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to further enhance segmentation quality.
arXiv Detail & Related papers (2024-09-28T23:10:37Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
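For reference, the Dice score cited in the headline paper and several entries above is the standard overlap metric, Dice = 2|A ∩ B| / (|A| + |B|). The helper below is a minimal, self-contained sketch of our own, not code from any of the papers.

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Soft Dice between two binary (or probabilistic) masks of the same shape:
    # 2 * |A ∩ B| / (|A| + |B|), with eps guarding against empty masks.
    pred = pred.float().flatten()
    target = target.float().flatten()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Sanity check: identical masks score 1.0.
mask = torch.randint(0, 2, (16, 16, 16))
print(float(dice_score(mask, mask)))  # ~1.0
```

Reported results are typically the mean of this score over test cases and classes, so an improvement "of up to 1.68%" refers to differences in this metric on a shared test set.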