Interpolating between Images with Diffusion Models
- URL: http://arxiv.org/abs/2307.12560v1
- Date: Mon, 24 Jul 2023 07:03:22 GMT
- Title: Interpolating between Images with Diffusion Models
- Authors: Clinton J. Wang and Polina Golland
- Abstract summary: Interpolating between two input images is a task missing from image generation pipelines.
We propose a method for zero-shot interpolation using latent diffusion models.
For greater consistency, or to specify additional criteria, we can generate several candidates and use CLIP to select the highest quality image.
- Score: 2.6027967363792865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One little-explored frontier of image generation and editing is the task of
interpolating between two input images, a feature missing from all currently
deployed image generation pipelines. We argue that such a feature can expand
the creative applications of such models, and propose a method for zero-shot
interpolation using latent diffusion models. We apply interpolation in the
latent space at a sequence of decreasing noise levels, then perform denoising
conditioned on interpolated text embeddings derived from textual inversion and
(optionally) subject poses. For greater consistency, or to specify additional
criteria, we can generate several candidates and use CLIP to select the highest
quality image. We obtain convincing interpolations across diverse subject
poses, image styles, and image content, and show that standard quantitative
metrics such as FID are insufficient to measure the quality of an
interpolation. Code and data are available at
https://clintonjwang.github.io/interpolation.
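The abstract's core recipe, interpolating noisy latents at a sequence of decreasing noise levels while conditioning on interpolated text embeddings, can be illustrated with a minimal sketch. This assumes a generic latent diffusion setup: `denoise_step` is an illustrative placeholder for one conditioned denoising update (e.g. a DDIM step), and the schedule and blending choices are assumptions, not the authors' released code.

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherical linear interpolation between two latent tensors."""
    a, b = z0.flatten(), z1.flatten()
    cos_theta = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos_theta)
    if theta.abs() < eps:  # nearly parallel latents: fall back to linear interpolation
        return (1 - t) * z0 + t * z1
    return (torch.sin((1 - t) * theta) * z0 + torch.sin(t * theta) * z1) / torch.sin(theta)

def interpolate_pair(z0, z1, emb0, emb1, t, noise_levels, denoise_step):
    """Blend two noisy latents and their text embeddings, then denoise.

    `denoise_step(z, emb, sigma)` stands in for one conditioned denoising
    update of a latent diffusion model; `noise_levels` is a decreasing
    sequence of noise scales.
    """
    z = slerp(z0, z1, t)                 # interpolate the noisy latents
    emb = (1 - t) * emb0 + t * emb1      # interpolate the text conditioning
    for sigma in noise_levels:           # denoise at decreasing noise levels
        z = denoise_step(z, emb, sigma)
    return z
```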
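The CLIP-based candidate selection mentioned in the abstract can likewise be approximated with the Hugging Face `transformers` CLIP classes. The prompt and ranking criterion below are illustrative assumptions, not the paper's exact selection rule.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

def pick_best_candidate(images, prompt="a high quality, coherent photograph"):
    """Score candidate PIL images against a text prompt with CLIP and
    return the index of the highest-scoring candidate."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (num_images, 1)
    return int(logits.squeeze(-1).argmax())
```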
Related papers
- RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance [22.326405355520176]
RefDrop allows users to control the influence of reference context in a direct and precise manner.
Our method also enables more interesting applications, such as the consistent generation of multiple subjects.
arXiv Detail & Related papers (2024-05-27T21:23:20Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Smooth image-to-image translations with latent space interpolations [64.8170758294427]
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
We show that our regularization techniques can improve the state-of-the-art I2I translations by a large margin.
arXiv Detail & Related papers (2022-10-03T11:57:30Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- NeurInt: Learning to Interpolate through Neural ODEs [18.104328632453676]
We propose a novel generative model that learns a distribution of trajectories between two images.
We demonstrate our approach's effectiveness in generating images of improved quality, as well as its ability to learn a diverse distribution over smooth trajectories for any pair of real source and target images.
arXiv Detail & Related papers (2021-11-07T16:31:18Z)
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated on the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)
- UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models [19.499490172426427]
We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training.
Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain.
arXiv Detail & Related papers (2021-04-12T11:22:56Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
- Weighted Encoding Based Image Interpolation With Nonlocal Linear Regression Model [8.013127492678272]
Image interpolation is a special case of image super-resolution in which the low-resolution image is directly down-sampled from its high-resolution counterpart without blurring or noise.
To address this problem, we propose a novel image model based on sparse representation.
A new approach learns the adaptive sub-dictionary online instead of via clustering.
arXiv Detail & Related papers (2020-03-04T03:20:21Z)