multiGradICON: A Foundation Model for Multimodal Medical Image Registration
- URL: http://arxiv.org/abs/2408.00221v2
- Date: Fri, 07 Feb 2025 14:41:23 GMT
- Title: multiGradICON: A Foundation Model for Multimodal Medical Image Registration
- Authors: Basar Demir, Lin Tian, Thomas Hastings Greer, Roland Kwitt, Francois-Xavier Vialard, Raul San Jose Estepar, Sylvain Bouix, Richard Jarrett Rushmore, Ebrahim Ebrahim, Marc Niethammer
- Abstract summary: We develop multiGradICON as a first step towards universal *multimodal* medical image registration.
We show that 1) we can train a DL registration model that is suitable for monomodal *and* multimodal registration; 2) loss function randomization can increase multimodal registration accuracy; and 3) training a model with multimodal data helps multimodal generalization.
- Score: 15.174094859157288
- License:
- Abstract: Modern medical image registration approaches predict deformations using deep networks. These approaches achieve state-of-the-art (SOTA) registration accuracy and are generally fast. However, deep learning (DL) approaches are, in contrast to conventional non-deep-learning-based approaches, anatomy-specific. Recently, a universal deep registration approach, uniGradICON, has been proposed. However, uniGradICON focuses on monomodal image registration. In this work, we therefore develop multiGradICON as a first step towards universal *multimodal* medical image registration. Specifically, we show that 1) we can train a DL registration model that is suitable for monomodal *and* multimodal registration; 2) loss function randomization can increase multimodal registration accuracy; and 3) training a model with multimodal data helps multimodal generalization. Our code and the multiGradICON model are available at https://github.com/uncbiag/uniGradICON.
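A minimal sketch of the loss-function randomization idea (point 2 above): at each training step the image-similarity term is drawn at random from a pool of monomodal and multimodal measures, so the network cannot rely on a single intensity relationship between modalities. The similarity pool, the regularizer placeholder, and the PyTorch-style tensors below are illustrative assumptions, not the released multiGradICON training code.

```python
import random
import torch
import torch.nn.functional as F

def mse_sim(warped, fixed):
    # Mean squared error: appropriate when intensities are directly comparable (monomodal).
    return F.mse_loss(warped, fixed)

def ncc_sim(warped, fixed, eps=1e-5):
    # Global normalized cross-correlation (1 - NCC), robust to linear intensity changes.
    w = warped - warped.mean()
    f = fixed - fixed.mean()
    ncc = (w * f).sum() / (w.norm() * f.norm() + eps)
    return 1.0 - ncc

SIMILARITIES = [mse_sim, ncc_sim]  # illustrative pool of candidate similarity terms

def randomized_registration_loss(warped, fixed, reg_term, lam=1.0):
    """Draw one similarity measure at random per step and add a regularizer.

    `reg_term` is a placeholder for a transform-regularity penalty (e.g. the
    GradICON inverse-consistency term); its exact form is not reproduced here.
    """
    sim = random.choice(SIMILARITIES)
    return sim(warped, fixed) + lam * reg_term
```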
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
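A generic way to realize the described ensemble-to-student transfer is soft-label distillation combined with the hard-label task loss; the formulation below is a standard knowledge-distillation sketch under assumed classification logits, not the MIND paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, targets,
                      temperature=2.0, alpha=0.5):
    # Average the (frozen) teachers' softened predictions to form an ensemble soft target.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between student and ensemble soft targets, plus the usual
    # hard-label cross-entropy on the clinical prediction task.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```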
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z) - Multimodal Autoregressive Pre-training of Large Vision Encoders [85.39154488397931]
We present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process.
Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification.
arXiv Detail & Related papers (2024-11-21T18:31:25Z) - Multi-modal deformable image registration using untrained neural networks [3.0832643041058603]
We propose a registration method that utilizes neural networks for image representation.
Our method uses untrained networks with limited representation capacity as an implicit prior for good registration.
Unlike previous approaches that are specialized for specific data types, our method handles both rigid and non-rigid, as well as single- and multi-modal registration.
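The idea of using an untrained, limited-capacity network as an implicit prior (in the spirit of a deep image prior) can be sketched as per-pair optimization of a small, randomly initialized CNN that outputs a displacement field. The architecture, 2D tensor shapes, and loss terms below are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeformationNet(nn.Module):
    """Small, untrained CNN mapping an identity grid to a 2D displacement field."""
    def __init__(self, channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 2, 3, padding=1),  # displacement channels (dx, dy)
        )

    def forward(self, grid):
        return self.net(grid)

def register_pair(moving, fixed, steps=200, lr=1e-2):
    """Optimize the untrained network per image pair; inputs are (1, 1, H, W) tensors."""
    _, _, h, w = fixed.shape
    # Identity sampling grid in [-1, 1], shaped (1, 2, H, W) as network input.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys]).unsqueeze(0)

    net = TinyDeformationNet()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        disp = net(grid)                            # predicted displacement field
        sample = (grid + disp).permute(0, 2, 3, 1)  # (1, H, W, 2) for grid_sample
        warped = F.grid_sample(moving, sample, align_corners=True)
        # Simple similarity plus a smoothness-style penalty on the displacement.
        loss = F.mse_loss(warped, fixed) + 1e-2 * disp.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```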
arXiv Detail & Related papers (2024-11-04T23:13:32Z) - Large Language Models for Multimodal Deformable Image Registration [50.91473745610945]
We propose a novel coarse-to-fine MDIR framework, LLM-Morph, for aligning deep features from medical images of different modalities.
Specifically, we first use a CNN encoder to extract deep visual features from cross-modal image pairs, then use a first adapter to adjust these tokens and LoRA to fine-tune the weights of a pre-trained LLM.
Finally, for token alignment, we use four further adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields and facilitating the coarse-to-fine MDIR task.
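The pipeline described above (CNN features, adapters, a LoRA-tuned frozen LLM backbone, multi-scale flow heads) can be summarized structurally as below. A single frozen transformer encoder layer stands in for the pre-trained LLM, and the upsampling/composition of the multi-scale fields is omitted; this is a rough sketch, not LLM-Morph itself.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter bridging vision tokens and the (frozen) backbone."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.proj(x)

class CoarseToFineMDIR(nn.Module):
    """Structural sketch: CNN features -> adapter -> frozen transformer -> per-scale heads."""
    def __init__(self, dim=256, scales=4):
        super().__init__()
        self.encoder = nn.Conv2d(2, dim, kernel_size=8, stride=8)   # moving + fixed stacked
        self.in_adapter = Adapter(dim)
        self.backbone = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        for p in self.backbone.parameters():     # stand-in for a frozen, LoRA-tuned LLM
            p.requires_grad = False
        self.out_adapters = nn.ModuleList([Adapter(dim) for _ in range(scales)])
        self.flow_heads = nn.ModuleList([nn.Linear(dim, 2) for _ in range(scales)])

    def forward(self, moving, fixed):
        feat = self.encoder(torch.cat([moving, fixed], dim=1))       # (B, dim, H/8, W/8)
        b, d, h, w = feat.shape
        tokens = self.in_adapter(feat.flatten(2).transpose(1, 2))    # (B, HW, dim)
        tokens = self.backbone(tokens)
        # One displacement field per scale; composition across scales is omitted here.
        return [head(ad(tokens)).transpose(1, 2).reshape(b, 2, h, w)
                for ad, head in zip(self.out_adapters, self.flow_heads)]
```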
arXiv Detail & Related papers (2024-08-20T09:58:30Z) - Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
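Modality-level noise masking can be illustrated as randomly replacing an entire modality's token embeddings with noise during training, so the fusion backbone learns to tolerate corrupted or missing modalities. The function below is an illustrative stand-in, not the SNAG implementation.

```python
import torch

def modality_noise_mask(modal_tokens, mask_prob=0.3, noise_std=0.1):
    """With probability `mask_prob`, replace a modality's token embeddings by Gaussian noise.

    modal_tokens: dict mapping modality name -> tensor of shape (B, N, D).
    Illustrative only; the paper's exact masking scheme is not reproduced here.
    """
    out = {}
    for name, tokens in modal_tokens.items():
        if torch.rand(1).item() < mask_prob:
            out[name] = torch.randn_like(tokens) * noise_std  # modality-level noise injection
        else:
            out[name] = tokens
    return out
```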
arXiv Detail & Related papers (2024-03-11T15:48:43Z) - uniGradICON: A Foundation Model for Medical Image Registration [17.23463407622614]
We propose uniGradICON, a first step toward a foundation model for registration.
We extensively trained and evaluated uniGradICON on twelve different public datasets.
arXiv Detail & Related papers (2024-03-09T03:26:35Z) - MAD: Modality Agnostic Distance Measure for Image Registration [14.558286801723293]
Multi-modal image registration is a crucial pre-processing step in many medical applications.
We present Modality Agnostic Distance (MAD), a measure that uses random convolutions to learn the inherent geometry of the images.
We demonstrate that not only can MAD affinely register multi-modal images successfully, but it also has a larger capture range than traditional measures.
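A simplified take on the random-convolution idea: pass both images through the same bank of random filters and compare the responses rather than raw intensities, which reduces sensitivity to each modality's intensity mapping. The filter bank, nonlinearity, and shapes below are assumptions, not MAD's exact formulation.

```python
import torch
import torch.nn.functional as F

def mad_like_distance(img_a, img_b, n_filters=16, kernel_size=5, seed=0):
    """Modality-agnostic-style distance via random convolutions (illustrative).

    Both inputs are (B, 1, H, W) tensors; the same random filter bank is applied
    to both, and the (bounded) responses are compared with mean squared error.
    """
    g = torch.Generator().manual_seed(seed)
    filters = torch.randn(n_filters, 1, kernel_size, kernel_size, generator=g)
    resp_a = F.conv2d(img_a, filters, padding=kernel_size // 2)
    resp_b = F.conv2d(img_b, filters, padding=kernel_size // 2)
    return F.mse_loss(torch.tanh(resp_a), torch.tanh(resp_b))
```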
arXiv Detail & Related papers (2023-09-06T09:59:58Z) - FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction [43.17713130538514]
We introduce a centralized graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss.
FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.
arXiv Detail & Related papers (2023-05-04T05:02:04Z) - i-Code: An Integrative and Composable Multimodal Learning Framework [99.56065789066027]
i-Code is a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations.
The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning.
Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11%.
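One of the named objectives, cross-modality contrastive learning, is commonly implemented as a symmetric InfoNCE loss between paired embeddings of two modalities; the sketch below uses that generic formulation (masked modality unit modeling is not shown) and is not the i-Code codebase.

```python
import torch
import torch.nn.functional as F

def cross_modality_contrastive(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between paired embeddings of two modalities.

    z_a, z_b: (B, D) embeddings where row i of each tensor comes from the same sample.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)   # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```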
arXiv Detail & Related papers (2022-05-03T23:38:50Z) - Is Image-to-Image Translation the Panacea for Multimodal Image Registration? A Comparative Study [4.00906288611816]
We conduct an empirical study of the applicability of modern I2I translation methods for the task of multimodal biomedical image registration.
We compare the performance of four Generative Adversarial Network (GAN)-based methods and one contrastive representation learning method.
Our results suggest that, although I2I translation may be helpful when the modalities to register are clearly correlated, registration of modalities that express distinctly different properties of the sample is not well handled by the I2I translation approach.
arXiv Detail & Related papers (2021-03-30T11:28:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.