Self-Supervised Modality-Agnostic Pre-Training of Swin Transformers
- URL: http://arxiv.org/abs/2405.12781v1
- Date: Tue, 21 May 2024 13:28:32 GMT
- Title: Self-Supervised Modality-Agnostic Pre-Training of Swin Transformers
- Authors: Abhiroop Talasila, Maitreya Maity, U. Deva Priyakumar
- Abstract summary: We augment the Swin Transformer to learn from different medical imaging modalities, enhancing downstream performance.
Our model, dubbed SwinFUSE, offers three key advantages: (i) it learns from both Computed Tomography (CT) and Magnetic Resonance Images (MRI) during pre-training, resulting in complementary feature representations; (ii) it includes a domain-invariance module (DIM) that effectively highlights salient input regions, enhancing adaptability; (iii) it exhibits remarkable generalizability, surpassing the confines of the tasks it was initially pre-trained on.
- Score: 0.7496510641958004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised pre-training has emerged as a transformative paradigm, displaying remarkable advancements in various domains. However, susceptibility to domain shift, where the pre-training data distribution differs from that of fine-tuning, poses a significant obstacle. To address this, we augment the Swin Transformer to learn from different medical imaging modalities, enhancing downstream performance. Our model, dubbed SwinFUSE (Swin Multi-Modal Fusion for UnSupervised Enhancement), offers three key advantages: (i) it learns from both Computed Tomography (CT) and Magnetic Resonance Images (MRI) during pre-training, resulting in complementary feature representations; (ii) it includes a domain-invariance module (DIM) that effectively highlights salient input regions, enhancing adaptability; (iii) it exhibits remarkable generalizability, surpassing the confines of the tasks it was initially pre-trained on. Our experiments on two publicly available 3D segmentation datasets show a modest 1-2% performance trade-off compared to single-modality models, yet a significant improvement of up to 27% on an out-of-distribution modality. This substantial improvement underscores our proposed approach's practical relevance and real-world applicability. Code is available at: https://github.com/devalab/SwinFUSE
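The abstract describes a domain-invariance module (DIM) that re-weights salient input regions so a single encoder can serve both CT and MRI. The paper's exact architecture is in the linked repository; the sketch below is only a hypothetical illustration of the general idea, using a learned sigmoid gate (names such as `DomainInvarianceGate` are invented here, not taken from SwinFUSE):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DomainInvarianceGate:
    """Hypothetical salience gate: scores each token embedding and
    re-weights it, so modality-specific tokens can be down-weighted
    before a shared encoder. Illustrative only, not the paper's DIM."""

    def __init__(self, dim):
        # one learned scoring vector, shared across modalities
        self.w = rng.standard_normal(dim) / np.sqrt(dim)

    def __call__(self, tokens):
        # tokens: (n_tokens, dim) flattened patch embeddings
        scores = sigmoid(tokens @ self.w)          # per-token salience in (0, 1)
        return tokens * scores[:, None], scores    # gated tokens + salience map

# toy CT and MRI patch grids flattened to (num_tokens, embed_dim)
ct = rng.standard_normal((16, 32))
mri = rng.standard_normal((16, 32))

gate = DomainInvarianceGate(32)
ct_gated, ct_scores = gate(ct)
mri_gated, mri_scores = gate(mri)   # the same gate is shared across modalities
```

Sharing one gate across modalities is what would push the downstream encoder toward modality-agnostic features; in a real model the scores would be trained end-to-end with the self-supervised objective.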
Related papers
- Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation [1.9035011984138845]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from labeled source domains to improve performance on unlabeled target domains.
Recent research has shown promise in applying Vision Transformers (ViTs) to this task.
We propose a novel Feature Fusion Transferability Aware Transformer (FFTAT) to enhance ViT performance in UDA tasks.
arXiv Detail & Related papers (2024-11-10T22:23:12Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Pseudo-Trilateral Adversarial Training for Domain Adaptive Traversability Prediction [8.145900996884993]
Traversability prediction is a fundamental perception capability for autonomous navigation.
We propose a novel perception model that adopts a coarse-to-fine alignment (CALI) to perform unsupervised domain adaptation (UDA).
We show the superiorities of our proposed models over multiple baselines in several challenging domain adaptation setups.
arXiv Detail & Related papers (2023-06-26T00:39:32Z) - Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs) which show improved empirical performance on datasets and challenging real-world medical tasks.
arXiv Detail & Related papers (2023-05-26T13:19:15Z) - Robust Representation Learning with Self-Distillation for Domain Generalization [2.0817769887373245]
We propose a novel domain generalization technique called Robust Representation Learning with Self-Distillation.
We observe an average accuracy improvement in the range of 1.2% to 2.3% over the state-of-the-art on three datasets.
arXiv Detail & Related papers (2023-02-14T07:39:37Z) - Robust and Efficient Segmentation of Cross-domain Medical Images [37.38861543166964]
We propose a generalizable knowledge distillation method for robust and efficient segmentation of medical images.
We propose two generalizable knowledge distillation schemes, Dual Contrastive Graph Distillation (DCGD) and Domain-Invariant Cross Distillation (DICD).
In DICD, the domain-invariant semantic vectors from the two models (i.e., teacher and student) are leveraged to cross-reconstruct features by the header exchange of MSAN.
arXiv Detail & Related papers (2022-07-26T15:55:36Z) - An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z) - Con$^{2}$DA: Simplifying Semi-supervised Domain Adaptation by Learning Consistent and Contrastive Feature Representations [1.2891210250935146]
Con$2$DA is a framework that extends recent advances in semi-supervised learning to the semi-supervised domain adaptation problem.
Our framework generates pairs of associated samples by performing data transformations to a given input.
We use different loss functions to enforce consistency between the feature representations of associated data pairs of samples.
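The Con$^{2}$DA summary above describes enforcing consistency between the feature representations of two transformed views of the same input. As a generic, hedged illustration of that kind of objective (not the paper's actual losses), one simple choice is the mean squared distance between L2-normalized feature pairs:

```python
import numpy as np

def consistency_loss(z1, z2):
    """Mean squared distance between L2-normalized feature vectors.

    z1, z2: (batch, dim) features of two augmented views of the same
    inputs. A generic consistency objective for illustration only;
    Con2DA combines several losses, not necessarily this one.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))

rng = np.random.default_rng(1)
view_a = rng.standard_normal((4, 8))          # features of augmentation A
view_b = view_a + 0.1 * rng.standard_normal((4, 8))  # slightly perturbed view
loss = consistency_loss(view_a, view_b)       # small but nonzero
```

Minimizing such a loss pulls the representations of associated pairs together, which is the mechanism the summary refers to.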
arXiv Detail & Related papers (2022-04-04T15:05:45Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input to account for the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.