Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation
- URL: http://arxiv.org/abs/2508.01941v1
- Date: Sun, 03 Aug 2025 22:31:00 GMT
- Title: Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation
- Authors: Andrea Dosi, Semanto Mondal, Rajib Chandra Ghosh, Massimo Brescia, Giuseppe Longo,
- Abstract summary: We adapt AMBER, a transformer-based model originally designed for multiband images, to the task of 3D medical datacube segmentation.<n>AMBER-AFNO achieves competitive or superior accuracy with significant gains in training efficiency, inference speed, and memory usage.
- Score: 0.57492870498084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents the results of a methodological transfer from remote sensing to healthcare, adapting AMBER -- a transformer-based model originally designed for multiband images, such as hyperspectral data -- to the task of 3D medical datacube segmentation. In this study, we use the AMBER architecture with Adaptive Fourier Neural Operators (AFNO) in place of the multi-head self-attention mechanism. While existing models rely on various forms of attention to capture global context, AMBER-AFNO achieves this through frequency-domain mixing, enabling a drastic reduction in model complexity. This design reduces the number of trainable parameters by over 80% compared to UNETR++, while maintaining a FLOPs count comparable to other state-of-the-art architectures. Model performance is evaluated on two benchmark 3D medical datasets -- ACDC and Synapse -- using standard metrics such as Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD), demonstrating that AMBER-AFNO achieves competitive or superior accuracy with significant gains in training efficiency, inference speed, and memory usage.
Related papers
- SAMRI-2: A Memory-based Model for Cartilage and Meniscus Segmentation in 3D MRIs of the Knee Joint [0.7879983966759583]
This study introduces a deep learning (DL) method for cartilage and meniscus segmentation from 3D MRIs using memory-based VFMs.<n>We trained four AI models-a CNN-based 3D-VNet, two automatic transformer-based models (SaMRI2D and SaMRI3D), and a transformer-based promptable memory-based VFM (SAMRI-2)-on 3D knee MRIs from 270 patients.<n>SAMRI-2 model, trained with HSS, outperformed all other models, achieving an average improvement of 5 points, with a peak improvement of 12 points for tibial cartilage.
arXiv Detail & Related papers (2025-02-14T21:18:01Z) - EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation [3.6813810514531085]
We introduce a novel 3D medical image segmentation model called EM-Net. Inspired by its success, we introduce a novel Mamba-based 3D medical image segmentation model called EM-Net.
Comprehensive experiments on two challenging multi-organ datasets with other state-of-the-art (SOTA) algorithms show that our method exhibits better segmentation accuracy while requiring nearly half the parameter size of SOTA models and 2x faster training speed.
arXiv Detail & Related papers (2024-09-26T09:34:33Z) - PAM: A Propagation-Based Model for Segmenting Any 3D Objects across Multi-Modal Medical Images [11.373941923130305]
PAM (Propagating Anything Model) is a segmentation approach that uses a 2D prompt, like a bounding box or sketch, to create a complete 3D segmentation of medical image volumes.
It significantly outperformed existing models like MedSAM and SegVol, with an average improvement of over 18.1% in dice similarity coefficient (DSC) across 44 medical datasets and various object types.
arXiv Detail & Related papers (2024-08-25T13:42:47Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers [34.282971510732736]
We introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture.
A composition of weak models can exhibit high diversity and the union of them can significantly boost the accuracy upper bound.
We deploy DiTMoS on the Neucleo STM32F767ZI board and evaluate it based on three time-series datasets for human activity recognition, keywords spotting, and emotion recognition.
arXiv Detail & Related papers (2024-03-14T02:11:38Z) - 3D-CLMI: A Motor Imagery EEG Classification Model via Fusion of 3D-CNN
and LSTM with Attention [0.174048653626208]
This paper proposed a model that combined a three-dimensional convolutional neural network (CNN) with a long short-term memory (LSTM) network to classify motor imagery (MI) signals.
Experimental results showed that this model achieved a classification accuracy of 92.7% and an F1-score of 0.91 on the public dataset BCI Competition IV dataset 2a.
The model greatly improved the classification accuracy of users' motor imagery intentions, giving brain-computer interfaces better application prospects in emerging fields such as autonomous vehicles and medical rehabilitation.
arXiv Detail & Related papers (2023-12-20T03:38:24Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare.
Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - UniTR: A Unified and Efficient Multi-Modal Transformer for
Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo
Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Multiscale Metamorphic VAE for 3D Brain MRI Synthesis [5.060516201839319]
Generative modeling of 3D brain MRIs presents difficulties in achieving high visual fidelity while ensuring sufficient coverage of the data distribution.
In this work, we propose to address this challenge with composable, multiscale morphological transformations in a variational autoencoder framework.
We show substantial performance improvements in FID while retaining comparable, or superior, reconstruction quality compared to prior work based on VAEs and generative adversarial networks (GANs)
arXiv Detail & Related papers (2023-01-09T09:15:30Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.