Network-to-Network Translation with Conditional Invertible Neural
Networks
- URL: http://arxiv.org/abs/2005.13580v2
- Date: Mon, 9 Nov 2020 20:34:36 GMT
- Title: Network-to-Network Translation with Conditional Invertible Neural
Networks
- Authors: Robin Rombach and Patrick Esser and Björn Ommer
- Abstract summary: Recent work suggests that the power of massive machine learning models is captured by the representations they learn.
We seek a model that can relate between different existing representations and propose to solve this task with a conditionally invertible network.
Our domain transfer network can translate between fixed representations without having to learn or finetune them.
- Score: 19.398202091883366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the ever-increasing computational costs of modern machine learning
models, we need to find new ways to reuse such expert models and thus tap into
the resources that have been invested in their creation. Recent work suggests
that the power of these massive models is captured by the representations they
learn. Therefore, we seek a model that can relate between different existing
representations and propose to solve this task with a conditionally invertible
network. This network demonstrates its capability by (i) providing generic
transfer between diverse domains, (ii) enabling controlled content synthesis by
allowing modification in other domains, and (iii) facilitating diagnosis of
existing representations by translating them into interpretable domains such as
images. Our domain transfer network can translate between fixed representations
without having to learn or finetune them. This allows users to utilize various
existing domain-specific expert models from the literature that had been
trained with extensive computational resources. Experiments on diverse
conditional image synthesis tasks, competitive image modification results and
experiments on image-to-image and text-to-image generation demonstrate the
generic applicability of our approach. For example, we translate between BERT
and BigGAN, state-of-the-art text and image models, to provide text-to-image
generation, which neither of the two experts can perform on its own.
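As an illustration of the core idea, relating two fixed expert representations with a conditionally invertible network, the following is a minimal PyTorch sketch. It is not the authors' released code: the coupling-layer design, the tanh-bounded scales, and all dimensions and layer sizes are assumptions chosen for readability, and the frozen experts (e.g. a BERT encoder producing the condition c and a BigGAN latent as the target representation) are only referenced, not implemented.

# Minimal sketch (not the authors' code) of a conditional invertible network that
# maps a fixed target-domain representation z_target (e.g. a BigGAN latent) to a
# Gaussian residual v, conditioned on a fixed source-domain representation c
# (e.g. a BERT sentence embedding). Dimensions and depths are illustrative.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """Affine coupling layer whose scale/shift depend on half of the input and on c."""
    def __init__(self, dim, cond_dim, hidden=512):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, c):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, c], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # bounded log-scales for stability
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=1), s.sum(dim=1)  # output and log|det J|

    def inverse(self, y, c):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, c], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

class ConditionalINN(nn.Module):
    """Stack of coupling layers with fixed permutations; invertible for any given c."""
    def __init__(self, dim, cond_dim, n_blocks=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ConditionalAffineCoupling(dim, cond_dim) for _ in range(n_blocks)]
        )
        # Fixed random permutations so both halves get transformed across blocks.
        self.perms = [torch.randperm(dim) for _ in range(n_blocks)]

    def forward(self, z_target, c):
        v, logdet = z_target, z_target.new_zeros(z_target.shape[0])
        for block, perm in zip(self.blocks, self.perms):
            v, ld = block(v[:, perm], c)
            logdet = logdet + ld
        return v, logdet

    def inverse(self, v, c):
        for block, perm in zip(reversed(self.blocks), reversed(self.perms)):
            v = block.inverse(v, c)[:, torch.argsort(perm)]
        return v

In such a setup, training would minimize the negative log-likelihood 0.5 * (v ** 2).sum(dim=1) - logdet on pairs of frozen source/target representations; at inference, sampling v ~ N(0, I) and applying inverse(v, c) for a new source embedding c yields a target-domain code that the frozen expert (e.g. BigGAN's generator) can decode, without ever updating either expert.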
Related papers
- Conditional Text-to-Image Generation with Reference Guidance [81.99538302576302]
This paper explores conditioning diffusion models on an additional reference image that provides visual guidance for the particular subjects to be generated.
We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references.
Our expert plugins achieve superior results to existing methods on all tasks, with each plugin containing only 28.55M trainable parameters.
arXiv Detail & Related papers (2024-11-22T21:38:51Z)
- Mechanisms of Generative Image-to-Image Translation Networks [1.602820210496921]
We propose a streamlined image-to-image translation network with a simpler architecture compared to existing models.
We show that the adversarial loss for GAN models yields results comparable to those of existing methods without additional complex loss penalties.
arXiv Detail & Related papers (2024-11-15T17:17:46Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model that produces images through the systematic introduction of noise over repeated steps (a brief sketch of this noising process follows the list below).
In the era of large models, scaling up model size and integrating with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Variational Bayesian Framework for Advanced Image Generation with Domain-Related Variables [29.827191184889898]
We present a unified Bayesian framework for advanced conditional generative problems.
We propose a variational Bayesian image translation network (VBITN) that enables multiple image translation and editing tasks.
arXiv Detail & Related papers (2023-05-23T09:47:23Z)
- Investigating GANsformer: A Replication Study of a State-of-the-Art Image Generation Model [0.0]
We reproduce and evaluate a novel variation of the original GAN network, the GANformer.
Due to resource and time limitations, we had to constrain the network's training time and the types and sizes of the datasets.
arXiv Detail & Related papers (2023-03-15T12:51:16Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation [72.88657378658549]
We propose a WEb-image assisted Domain GEneralization scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation.
We also present a method that injects the styles of web-crawled images into training images on the fly, enabling the network to experience images of diverse styles with reliable labels during training.
arXiv Detail & Related papers (2021-09-29T05:19:58Z)
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators [63.85888518950824]
We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
arXiv Detail & Related papers (2021-08-02T14:46:46Z)
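As a brief illustration of the noising process mentioned in the RenAIssance survey entry above, the sketch below shows the standard DDPM-style forward step x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps. It is a generic assumption about how such diffusion models operate, not code from any of the listed papers; the schedule values and step count are illustrative.

# Minimal sketch of the forward (noising) process of a DDPM-style diffusion model.
import torch

T = 1000                                        # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def noisy_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) by mixing the clean image with Gaussian noise."""
    eps = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps, eps                # generation learns to reverse this

x0 = torch.rand(4, 3, 64, 64)                   # a batch of images in [0, 1]
t = torch.randint(0, T, (4,))                   # a random timestep per image
xt, eps = noisy_sample(x0, t)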
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.