DinoAtten3D: Slice-Level Attention Aggregation of DinoV2 for 3D Brain MRI Anomaly Classification
- URL: http://arxiv.org/abs/2509.12512v1
- Date: Mon, 15 Sep 2025 23:31:40 GMT
- Title: DinoAtten3D: Slice-Level Attention Aggregation of DinoV2 for 3D Brain MRI Anomaly Classification
- Authors: Fazle Rafsani, Jay Shah, Catherine D. Chong, Todd J. Schwedt, Teresa Wu
- Abstract summary: Anomaly detection and classification in medical imaging are critical for early diagnosis but remain challenging due to limited annotated data, class imbalance, and the high cost of expert labeling. We propose an attention-based global aggregation framework tailored specifically for 3D medical image anomaly classification.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Anomaly detection and classification in medical imaging are critical for early diagnosis but remain challenging due to limited annotated data, class imbalance, and the high cost of expert labeling. Emerging vision foundation models such as DINOv2, pretrained on extensive, unlabeled datasets, offer generalized representations that can potentially alleviate these limitations. In this study, we propose an attention-based global aggregation framework tailored specifically for 3D medical image anomaly classification. Leveraging the self-supervised DINOv2 model as a pretrained feature extractor, our method processes individual 2D axial slices of brain MRIs, assigning adaptive slice-level importance weights through a soft attention mechanism. To further address data scarcity, we employ a composite loss function combining supervised contrastive learning with class-variance regularization, enhancing inter-class separability and intra-class consistency. We validate our framework on the ADNI dataset and an institutional multi-class headache cohort, demonstrating strong anomaly classification performance despite limited data availability and significant class imbalance. Our results highlight the efficacy of utilizing pretrained 2D foundation models combined with attention-based slice aggregation for robust volumetric anomaly detection in medical imaging. Our implementation is publicly available at https://github.com/Rafsani/DinoAtten3D.git.
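The two mechanisms the abstract describes, soft attention over per-slice DINOv2 embeddings and a class-variance regularizer for intra-class consistency, can be sketched as follows. This is a minimal NumPy illustration, not the released implementation: the linear attention scorer, feature dimension, and helper names are assumptions.

```python
import numpy as np

# Hedged sketch of (1) slice-level soft-attention aggregation and
# (2) a class-variance penalty, as outlined in the abstract.
# All names and shapes here are illustrative assumptions.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(slice_feats, w, b=0.0):
    """slice_feats: (S, d) embeddings of S axial slices from a frozen
    2D encoder; w: (d,) learnable attention weights; b: scalar bias.
    Returns the attention-weighted volume embedding and slice weights."""
    scores = slice_feats @ w + b      # (S,) raw slice-importance scores
    alpha = softmax(scores)           # soft attention weights, sum to 1
    return alpha @ slice_feats, alpha # (d,) volume-level embedding

def class_variance_penalty(embeddings, labels):
    """Mean within-class variance of embeddings; adding this term to a
    supervised-contrastive loss pulls same-class volumes together."""
    classes = np.unique(labels)
    per_class = [((embeddings[labels == c] -
                   embeddings[labels == c].mean(0)) ** 2).mean()
                 for c in classes]
    return float(np.mean(per_class))

rng = np.random.default_rng(0)
S, d = 32, 768                        # 32 slices, DINOv2-base feature size
feats = rng.normal(size=(S, d))       # stand-in for frozen DINOv2 features
w = rng.normal(size=d) * 0.01
vol_emb, alpha = attention_aggregate(feats, w)
penalty = class_variance_penalty(rng.normal(size=(8, 4)),
                                 rng.integers(0, 3, size=8))
```

In the paper's setup the 2D encoder stays frozen, so only the attention parameters and the classification head would be trained; the penalty would be one term of the composite loss alongside supervised contrastive learning.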
Related papers
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
- Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z)
- Self-Supervised Cross-Encoder for Neurodegenerative Disease Diagnosis [6.226851122403944]
We propose a novel self-supervised cross-encoder framework that leverages the temporal continuity in longitudinal MRI scans for supervision. This framework disentangles learned representations into two components: a static representation, constrained by contrastive learning, which captures stable anatomical features; and a dynamic representation, guided by input-gradient regularization, which reflects temporal changes. Experimental results on the Alzheimer's Disease Neuroimaging Initiative dataset demonstrate that our method achieves superior classification accuracy and improved interpretability.
arXiv Detail & Related papers (2025-09-09T11:52:24Z)
- Data-Efficient Fine-Tuning of Vision-Language Models for Diagnosis of Alzheimer's Disease [3.46857682956989]
Medical vision-language models (Med-VLMs) have shown impressive results in tasks such as report generation and visual question answering. Most existing models are trained from scratch or fine-tuned on large-scale 2D image-text pairs. We propose a data-efficient fine-tuning pipeline to adapt 3D CT-based Med-VLMs for 3D MRI.
arXiv Detail & Related papers (2025-09-09T11:36:21Z)
- Unified Supervision For Vision-Language Modeling in 3D Computed Tomography [1.4193731654133002]
General-purpose vision-language models (VLMs) have emerged as promising tools in radiology, offering zero-shot capabilities. In high-stakes domains like diagnostic radiology, these models often lack the discriminative precision required for reliable clinical use. We introduce Uniferum, a volumetric VLM that unifies diverse supervision signals, encoded in classification labels and segmentation masks, into a single training framework.
arXiv Detail & Related papers (2025-09-01T15:30:17Z)
- Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
Using all 3D MedMNIST datasets as benchmarks, plus two real-world datasets of several hundred high-resolution CT or MRI scans, we show that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z)
- Weakly-supervised positional contrastive learning: application to cirrhosis classification [45.63061034568991]
Large medical imaging datasets can be cheaply annotated with low-confidence, weak labels.
Access to high-confidence labels, such as histology-based diagnoses, is rare and costly.
We propose an efficient weakly-supervised positional (WSP) contrastive learning strategy.
arXiv Detail & Related papers (2023-07-10T15:02:13Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection [1.7616042687330637]
We compare two classification schemes on the ADNI MRI dataset.
We do not find a strong dependence of model performance for male and female test subjects on the sex composition of the training dataset.
arXiv Detail & Related papers (2022-04-04T17:37:54Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images can greatly aid the estimation of intensive care unit (ICU) events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI [30.469124322749828]
We present a subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS).
Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters without any backpropagation.
Our framework can easily be generalized to other classification tasks using different imaging modalities.
arXiv Detail & Related papers (2021-01-13T15:25:57Z)
- Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build a domain-invariant latent image representation and show that this method outperforms existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z)
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises several challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
- A Self-ensembling Framework for Semi-supervised Knee Cartilage Defects Assessment with Dual-Consistency [40.67137486295487]
We propose a novel approach for knee cartilage defects assessment, including severity classification and lesion localization.
The self-ensembling framework consists of a student network and a teacher network with identical structures.
Experiments show that the proposed method can significantly improve the self-ensembling performance in both knee cartilage defects classification and localization.
arXiv Detail & Related papers (2020-05-19T04:47:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.