I2I-Galip: Unsupervised Medical Image Translation Using Generative Adversarial CLIP
- URL: http://arxiv.org/abs/2409.12399v1
- Date: Thu, 19 Sep 2024 01:44:50 GMT
- Title: I2I-Galip: Unsupervised Medical Image Translation Using Generative Adversarial CLIP
- Authors: Yilmaz Korkmaz, Vishal M. Patel
- Abstract summary: Unpaired image-to-image translation is a challenging task due to the absence of paired examples.
We propose a new image-to-image translation framework named Image-to-Image-Generative-Adversarial-CLIP (I2I-Galip).
- Score: 30.506544165999564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unpaired image-to-image translation is a challenging task due to the absence of paired examples, which complicates learning the complex mappings between the distinct distributions of the source and target domains. One of the most commonly used approaches for this task is CycleGAN, which requires training a new pair of generator-discriminator networks for each domain pair. In this paper, we propose a new image-to-image translation framework named Image-to-Image-Generative-Adversarial-CLIP (I2I-Galip), in which we utilize a pre-trained multi-modal foundation model (i.e., CLIP) to mitigate the need for separate generator-discriminator pairs for each source-target mapping while achieving better and more efficient multi-domain translation. By exploiting the massive knowledge gathered during the pre-training of a foundation model, our approach uses a single lightweight generator network with ~13M parameters for the multi-domain image translation task. Comprehensive experiments on translation performance in public MRI and CT datasets show the superior performance of the proposed framework over existing approaches. Code will be available at https://github.com/yilmazkorkmaz1/I2I-Galip.
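Since the code is not yet released, the following is only a minimal PyTorch sketch of the general recipe the abstract describes: a frozen CLIP model steering a single lightweight generator toward a target domain. The generator architecture, the text prompt, and the loss are illustrative placeholders, not the authors' design.

```python
# Minimal sketch: a frozen CLIP model scores how well a translated image
# matches a target-domain text prompt. Generator, prompt, and loss are
# illustrative placeholders, not the authors' implementation.
import torch
import torch.nn as nn
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()
for p in clip_model.parameters():      # the foundation model stays frozen
    p.requires_grad_(False)

generator = nn.Sequential(             # stand-in for the ~13M-parameter generator
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
).to(device)

def clip_text_loss(fake, prompt):
    """Pull CLIP image features of the output toward a target-domain prompt.
    Note: proper CLIP resizing/normalization is omitted for brevity."""
    img_feat = clip_model.encode_image(fake)
    txt_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (img_feat * txt_feat).sum(dim=-1).mean()

source = torch.rand(4, 3, 224, 224, device=device)   # dummy source-domain batch
fake = generator(source)
loss = clip_text_loss(fake, "a T2-weighted brain MRI scan")  # hypothetical prompt
loss.backward()                                       # updates the generator only
```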
Related papers
- Large Language Models for Multimodal Deformable Image Registration [50.91473745610945]
We propose a novel coarse-to-fine MDIR framework, LLM-Morph, for aligning the deep features from different modal medical images.
Specifically, we first use a CNN encoder to extract deep visual features from cross-modal image pairs, then use a first adapter to adjust these tokens, and apply LoRA in pre-trained LLMs to fine-tune their weights.
Finally, for token alignment, we use four other adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields and facilitating the coarse-to-fine MDIR task.
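For readers unfamiliar with LoRA, the sketch below shows only the core mechanism the summary refers to: a frozen pre-trained linear layer plus a trainable low-rank update. The layer sizes are illustrative and unrelated to LLM-Morph's actual configuration.

```python
# Generic LoRA sketch (not LLM-Morph's code): a frozen pre-trained linear
# layer plus a trainable low-rank update x @ A @ B, scaled by alpha/r.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pre-trained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))  # delta starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(768, 768))   # e.g. wrap an LLM attention projection
tokens = torch.randn(2, 16, 768)          # (batch, tokens, dim) visual tokens
print(layer(tokens).shape)                # torch.Size([2, 16, 768])
```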
arXiv Detail & Related papers (2024-08-20T09:58:30Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
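The summary does not spell out the contrastive objective; as a generic illustration of the family SCONE-GAN belongs to, a patch-wise InfoNCE loss can be sketched as follows. Feature dimensions and sampling are made up for the example.

```python
# Illustrative InfoNCE loss over patch features, in the spirit of contrastive
# I2I objectives; SCONE-GAN's actual loss and sampling scheme may differ.
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, tau: float = 0.07):
    """query/positive: (N, D); negatives: (N, K, D)."""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    l_pos = (query * positive).sum(-1, keepdim=True)          # (N, 1)
    l_neg = torch.einsum("nd,nkd->nk", query, negatives)      # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(query.size(0), dtype=torch.long)     # positive is index 0
    return F.cross_entropy(logits, labels)

q = torch.randn(32, 256)        # features of output patches
p = torch.randn(32, 256)        # features of the corresponding input patches
n = torch.randn(32, 64, 256)    # features of other (negative) patches
print(info_nce(q, p, n))
```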
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Guided Image-to-Image Translation by Discriminator-Generator Communication [71.86347329356244]
The goal of Image-to-image (I2I) translation is to transfer an image from a source domain to a target domain.
One major branch of this research formulates I2I translation with Generative Adversarial Networks (GANs), as sketched below.
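As background on this GAN formulation only (not the paper's discriminator-generator communication mechanism), a minimal non-saturating adversarial objective for a translator G and discriminator D looks like:

```python
# Background sketch of a non-saturating GAN objective for I2I translation;
# the paper's communication mechanism between D and G is not shown here.
import torch
import torch.nn.functional as F

def d_loss(d_real_logits, d_fake_logits):
    # Discriminator: push real target images toward 1, translations toward 0.
    ones = torch.ones_like(d_real_logits)
    zeros = torch.zeros_like(d_fake_logits)
    return (F.binary_cross_entropy_with_logits(d_real_logits, ones)
            + F.binary_cross_entropy_with_logits(d_fake_logits, zeros))

def g_loss(d_fake_logits):
    # Generator: fool the discriminator into predicting 1 for translations.
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
```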
arXiv Detail & Related papers (2023-03-07T02:29:36Z)
- Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution [62.4972011636884]
We propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework.
We exploit the decomposed content feature and appearance adaptive convolution to translate an image into a target appearance.
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
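The appearance adaptive convolution is not defined in this summary; a common form of dynamic convolution, where per-sample filter weights are predicted from an appearance vector, gives the flavor. The sketch below rests on that assumption and uses placeholder sizes.

```python
# Sketch of a dynamic ("appearance adaptive") convolution: filter weights are
# predicted per sample from an appearance vector; MDUIT's details may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv(nn.Module):
    def __init__(self, channels: int, app_dim: int, k: int = 3):
        super().__init__()
        self.channels, self.k = channels, k
        self.to_weight = nn.Linear(app_dim, channels * channels * k * k)

    def forward(self, content, appearance):
        n, c, h, w = content.shape
        w_dyn = self.to_weight(appearance).view(n * c, c, self.k, self.k)
        out = F.conv2d(content.reshape(1, n * c, h, w), w_dyn,
                       padding=self.k // 2, groups=n)   # one filter set per sample
        return out.view(n, c, h, w)

conv = AdaptiveConv(channels=64, app_dim=128)
y = conv(torch.randn(2, 64, 32, 32), torch.randn(2, 128))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```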
arXiv Detail & Related papers (2022-02-06T14:12:34Z)
- HYLDA: End-to-end Hybrid Learning Domain Adaptation for LiDAR Semantic Segmentation [13.87939140266266]
This paper addresses the problem of training a LiDAR semantic segmentation network using a fully-labeled source dataset and a target dataset that only has a small number of labels.
We develop a novel image-to-image translation engine, and couple it with a LiDAR semantic segmentation network, resulting in an integrated domain adaptation architecture we call HYLDA.
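As a rough picture of such coupling (not HYLDA's specific hybrid scheme), labeled source data can be translated toward the target style and then used to supervise the segmentation network:

```python
# Rough sketch of coupling an I2I translator with a segmentation network:
# translate labeled source scans toward the target style, then supervise the
# segmenter with the source labels. HYLDA's actual hybrid scheme is richer.
import torch
import torch.nn as nn
import torch.nn.functional as F

translator = nn.Conv2d(1, 1, 3, padding=1)   # stand-in I2I engine (range images)
segmenter = nn.Conv2d(1, 4, 1)               # stand-in 4-class segmenter

source = torch.randn(2, 1, 64, 64)           # labeled source-domain scans
labels = torch.randint(0, 4, (2, 64, 64))    # per-pixel class labels

fake_target = translator(source)             # source rendered in target style
logits = segmenter(fake_target)
seg_loss = F.cross_entropy(logits, labels)   # supervised by source labels
seg_loss.backward()                          # gradients reach both networks
```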
arXiv Detail & Related papers (2022-01-14T18:13:09Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
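The paper's meta-learning scheme is not detailed in this summary; a generic Reptile-style stand-in, which adapts a copy of the generator to each image and nudges the shared weights toward the adapted ones, conveys the idea:

```python
# Generic Reptile-style meta-update over a collection of images; a stand-in
# for the paper's method, not its implementation. The reconstruction loss is
# a toy objective in place of a single-image GAN objective.
import copy
import torch
import torch.nn as nn

shared = nn.Conv2d(3, 3, 3, padding=1)            # stand-in single-image generator
images = [torch.randn(1, 3, 32, 32) for _ in range(4)]
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for img in images:
    learner = copy.deepcopy(shared)
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                  # inner loop: fit this image
        opt.zero_grad()
        loss = ((learner(img) - img) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():                         # outer loop: Reptile step
        for p_shared, p_adapted in zip(shared.parameters(), learner.parameters()):
            p_shared += meta_lr * (p_adapted - p_shared)
```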
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
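The proxy task is not described in this summary; one plausible stand-in (illustration only) treats two crops of the same image as sharing a style and trains the encoder with a triplet loss:

```python
# Illustration of style-encoder pre-training with an assumed proxy task: two
# crops of one image share a style, a crop of another image does not. The
# actual StEP proxy task may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                # stand-in encoder -> 8-dim style latent
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 8),
)

anchor = torch.randn(16, 3, 64, 64)     # crop of image i
positive = torch.randn(16, 3, 64, 64)   # another crop of image i
negative = torch.randn(16, 3, 64, 64)   # crop of a different image

loss = F.triplet_margin_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```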
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- SoloGAN: Multi-domain Multimodal Unpaired Image-to-Image Translation via a Single Generative Adversarial Network [4.7344504314446345]
We present a flexible and general SoloGAN model for efficient multimodal I2I translation among multiple domains with unpaired data.
In contrast to existing methods, the SoloGAN algorithm uses a single projection discriminator with an additional auxiliary classifier and shares the encoder and generator for all domains.
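A minimal sketch of the two ingredients the summary names, a projection discriminator plus an auxiliary domain classifier, with placeholder layer sizes:

```python
# Sketch of a projection discriminator with an auxiliary domain classifier;
# backbone and sizes are placeholders, not SoloGAN's architecture.
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    def __init__(self, num_domains: int, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional score
        self.embed = nn.Embedding(num_domains, feat_dim)  # projection term
        self.aux = nn.Linear(feat_dim, num_domains)       # auxiliary classifier

    def forward(self, x, domain):
        h = self.backbone(x)
        adv = self.psi(h) + (self.embed(domain) * h).sum(1, keepdim=True)
        return adv, self.aux(h)   # adversarial score, domain logits

disc = ProjectionDiscriminator(num_domains=3)
score, logits = disc(torch.randn(2, 3, 64, 64), torch.tensor([0, 2]))
print(score.shape, logits.shape)  # torch.Size([2, 1]) torch.Size([2, 3])
```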
arXiv Detail & Related papers (2020-08-04T16:31:15Z)
- GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling [66.50914391679375]
Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images.
Recent studies have shown remarkable success for multiple domains but they suffer from two main limitations.
We propose a method named GMM-UNIT, which is based on a content-attribute disentangled representation in which the attribute space is fitted with a Gaussian mixture model (GMM).
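A tiny sketch of the sampling idea: attribute (style) codes are drawn from the mixture component associated with the target domain. The means and variances here are fixed toy values; in GMM-UNIT they are learned.

```python
# Sketch of sampling attribute codes from a domain-specific Gaussian mixture
# component; toy fixed parameters stand in for GMM-UNIT's learned ones.
import torch
from torch.distributions import Normal

num_domains, attr_dim = 3, 8
means = torch.randn(num_domains, attr_dim)   # one mixture component per domain
stds = torch.ones(num_domains, attr_dim) * 0.5

def sample_attribute(domain: int, n: int = 1):
    """Sample n attribute codes from the target domain's component."""
    return Normal(means[domain], stds[domain]).sample((n,))

z = sample_attribute(domain=1, n=4)          # 4 diverse style codes for domain 1
print(z.shape)                               # torch.Size([4, 8])
```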
arXiv Detail & Related papers (2020-03-15T10:18:56Z)