CoMIR: Contrastive Multimodal Image Representation for Registration
- URL: http://arxiv.org/abs/2006.06325v2
- Date: Fri, 23 Oct 2020 08:54:38 GMT
- Title: CoMIR: Contrastive Multimodal Image Representation for Registration
- Authors: Nicolas Pielawski, Elisabeth Wetzer, Johan Öfverstedt, Jiahao Lu, Carolina Wählby, Joakim Lindblad and Nataša Sladoje
- Abstract summary: We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations).
CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures.
- Score: 4.543268895439618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose contrastive coding to learn shared, dense image representations,
referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs
enable the registration of multimodal images where existing registration
methods often fail due to a lack of sufficiently similar image structures.
CoMIRs reduce the multimodal registration problem to a monomodal one, in which
general intensity-based, as well as feature-based, registration algorithms can
be applied. The method involves training one neural network per modality on
aligned images, using a contrastive loss based on noise-contrastive estimation
(InfoNCE). Unlike other contrastive coding methods, used for, e.g.,
classification, our approach generates image-like representations that contain
the information shared between modalities. We introduce a novel,
hyperparameter-free modification to InfoNCE, to enforce rotational equivariance
of the learnt representations, a property essential to the registration task.
We assess the extent of achieved rotational equivariance and the stability of
the representations with respect to weight initialization, training set, and
hyperparameter settings, on a remote sensing dataset of RGB and near-infrared
images. We evaluate the learnt representations through registration of a
biomedical dataset of bright-field and second-harmonic generation microscopy
images; two modalities with very little apparent correlation. The proposed
approach based on CoMIRs significantly outperforms registration of
representations created by GAN-based image-to-image translation, as well as a
state-of-the-art, application-specific method which takes additional knowledge
about the data into account. Code is available at:
https://github.com/MIDA-group/CoMIR.
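As a concrete illustration of the training recipe above, here is a minimal PyTorch sketch: one fully convolutional encoder per modality, trained so that aligned image pairs produce similar dense, image-like representations under an InfoNCE loss. The encoder architecture, the temperature, and the similarity over flattened feature maps are illustrative assumptions, and the paper's hyperparameter-free rotational-equivariance modification is omitted; see the repository linked above for the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseEncoder(nn.Module):
    """Toy fully convolutional encoder producing an image-like representation."""
    def __init__(self, in_channels, rep_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, rep_channels, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def info_nce(reps_a, reps_b, temperature=0.1):
    """InfoNCE over a batch: aligned pairs (i, i) are positives,
    all other pairings (i, j) serve as negatives."""
    a = F.normalize(reps_a.flatten(1), dim=1)  # (B, C*H*W)
    b = F.normalize(reps_b.flatten(1), dim=1)
    logits = a @ b.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)

# One encoder per modality, e.g. RGB (3 channels) and near-infrared (1 channel).
enc_a, enc_b = DenseEncoder(3), DenseEncoder(1)
opt = torch.optim.Adam(list(enc_a.parameters()) + list(enc_b.parameters()), lr=1e-4)

x_a = torch.randn(8, 3, 64, 64)  # batch of aligned multimodal patches
x_b = torch.randn(8, 1, 64, 64)
opt.zero_grad()
loss = info_nce(enc_a(x_a), enc_b(x_b))
loss.backward()
opt.step()
```

Once trained, the two encoders map each modality into a common representation space, and any monomodal intensity- or feature-based registration algorithm can be run on the resulting representations.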
Related papers
- MsMorph: An Unsupervised pyramid learning network for brain image registration [4.000367245594772]
MsMorph is an image registration framework aimed at mimicking the manual process of registering image pairs.
It decodes semantic information at different scales and continuously compensates for the predicted deformation field.
The proposed method simulates the manual approach to registration, focusing on different regions of the image pairs and their neighborhoods.
arXiv Detail & Related papers (2024-10-23T19:20:57Z)
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
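To make the frequency decomposition in the UFAFormer entry above concrete, here is a brief PyWavelets sketch: a 2-D discrete wavelet transform splits an image into a low-frequency approximation and three high-frequency detail sub-bands, which a frequency encoder could then attend over. The wavelet choice and the random input are illustrative assumptions, not the paper's setup.

```python
import numpy as np
import pywt

image = np.random.rand(256, 256)            # stand-in for a grayscale face image
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')
# cA: low-frequency approximation; cH, cV, cD: horizontal, vertical and
# diagonal detail sub-bands, where subtle forgery artifacts tend to surface.
print(cA.shape, cH.shape, cV.shape, cD.shape)  # each sub-band is 128x128
```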
- Breaking Modality Disparity: Harmonized Representation for Infrared and Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration method.
We employ homography to simulate the deformation between different planes.
We present the first misaligned infrared and visible image dataset with available ground truth.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
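The homography-based simulation mentioned in the entry above can be sketched with OpenCV: perturbing the four image corners defines a plane-to-plane deformation that warps one image, yielding a synthetic misaligned pair. The corner offsets below are illustrative assumptions.

```python
import cv2
import numpy as np

img = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # stand-in image
h, w = img.shape
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# Offset the corners to synthesize a plausible deformation between planes.
dst = src + np.float32([[5, 3], [-4, 6], [7, -5], [-6, -2]])
H = cv2.getPerspectiveTransform(src, dst)     # 3x3 homography matrix
warped = cv2.warpPerspective(img, H, (w, h))  # synthetic misaligned view
```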
- Unsupervised Multi-Modal Medical Image Registration via Discriminator-Free Image-to-Image Translation [4.43142018105102]
We propose a novel translation-based unsupervised deformable image registration approach to convert the multi-modal registration problem to a mono-modal one.
Our approach incorporates a discriminator-free translation network to facilitate the training of the registration network and a patchwise contrastive loss to encourage the translation network to preserve object shapes.
arXiv Detail & Related papers (2022-04-28T17:18:21Z)
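The patchwise contrastive loss in the entry above can be sketched roughly as follows: feature vectors at the same spatial location in the input and translated images are positives, while other sampled locations act as negatives. The sampling scheme and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def patch_nce(feat_in, feat_out, num_patches=64, temperature=0.07):
    """Patchwise InfoNCE: matching locations are positives, others negatives."""
    b, c, h, w = feat_in.shape
    idx = torch.randperm(h * w)[:num_patches]              # sampled locations
    q = F.normalize(feat_out.flatten(2)[..., idx], dim=1)  # (B, C, P)
    k = F.normalize(feat_in.flatten(2)[..., idx], dim=1)   # (B, C, P)
    logits = torch.einsum('bcp,bcq->bpq', q, k) / temperature  # (B, P, P)
    targets = torch.arange(num_patches).expand(b, -1)
    return F.cross_entropy(logits.reshape(-1, num_patches), targets.reshape(-1))
```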
- Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution [55.52779466954026]
Multi-contrast super-resolution (SR) reconstruction promises to yield SR images of higher quality.
Existing methods lack effective mechanisms to match and fuse these features for better reconstruction.
We propose a novel network to address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques.
arXiv Detail & Related papers (2022-03-26T01:42:59Z)
- Semantic similarity metrics for learned image registration [10.355938901584565]
We propose a semantic similarity metric for image registration.
Our approach learns dataset-specific features that drive the optimization of a learning-based registration model.
We train both an unsupervised approach using an auto-encoder, and a semi-supervised approach using supplemental segmentation data to extract semantic features for image registration.
arXiv Detail & Related papers (2021-04-20T15:23:58Z)
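A learned semantic similarity metric of the kind described in the entry above can be sketched in a few lines: the registration loss compares auto-encoder features of the fixed and warped moving images instead of raw intensities, so semantically similar structures score as similar even when their intensities differ. The `encoder` below is a hypothetical pre-trained module, not the paper's.

```python
import torch.nn.functional as F

def semantic_similarity_loss(encoder, fixed, warped_moving):
    # Distance in learned feature space stands in for an intensity-based metric.
    return F.mse_loss(encoder(fixed), encoder(warped_moving))
```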
- Deep Group-wise Variational Diffeomorphic Image Registration [3.0022455491411653]
We propose to extend current learning-based image registration to allow simultaneous registration of multiple images.
We present a general mathematical framework that enables both registration of multiple images to their viscous geodesic average and registration in which any of the available images can be used as a fixed image.
arXiv Detail & Related papers (2020-10-01T07:37:28Z)
- MvMM-RegNet: A new image registration framework based on multivariate mixture model and neural network estimation [14.36896617430302]
We propose a new image registration framework based on a multivariate mixture model (MvMM) and neural network estimation.
A generative model consolidating both appearance and anatomical information is established to derive a novel loss function capable of implementing groupwise registration.
We highlight the versatility of the proposed framework for various applications on multimodal cardiac images.
arXiv Detail & Related papers (2020-06-28T11:19:15Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.