Addressing degeneracies in latent interpolation for diffusion models
- URL: http://arxiv.org/abs/2505.07481v1
- Date: Mon, 12 May 2025 12:12:57 GMT
- Title: Addressing degeneracies in latent interpolation for diffusion models
- Authors: Erik Landolsi, Fredrik Kahl
- Abstract summary: It is useful to interpolate between latents produced by inverting a set of input images. We observe that such interpolation can easily lead to degenerate results when the number of inputs is large. We propose a simple normalization scheme that is easy to use whenever interpolation between latents is needed.
- Score: 11.80626524879555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is an increasing interest in using image-generating diffusion models for deep data augmentation and image morphing. In this context, it is useful to interpolate between latents produced by inverting a set of input images, in order to generate new images representing some mixture of the inputs. We observe that such interpolation can easily lead to degenerate results when the number of inputs is large. We analyze the cause of this effect theoretically and experimentally, and suggest a suitable remedy. The suggested approach is a relatively simple normalization scheme that is easy to use whenever interpolation between latents is needed. We measure image quality using FID and CLIP embedding distance and show experimentally that baseline interpolation methods cause a drop in quality metrics long before the degeneration issue is clearly visible. In contrast, our method significantly reduces the degeneration effect and improves quality metrics even in non-degenerate situations.
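The abstract does not spell out the normalization scheme itself; below is a minimal sketch of one plausible variant, assuming the inverted latents are approximately standard Gaussian (the function name and the exact rescaling rule are assumptions of this sketch, not necessarily the authors' method):

```python
import numpy as np

def interpolate_latents(latents, weights):
    """Convex combination of inverted latents, rescaled so the result
    keeps the per-element variance of a standard Gaussian sample.

    latents: array of shape (n, *latent_shape), assumed ~ N(0, I)
    weights: array of shape (n,), non-negative
    """
    latents = np.asarray(latents, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()

    # A weighted mean of n i.i.d. N(0, I) latents has per-element std
    # sqrt(sum(w_i^2)) < 1, so its norm shrinks as n grows -- one way
    # the degeneracy described above can arise.
    mixed = np.tensordot(weights, latents, axes=1)

    # Undo the shrinkage so the mixture is again unit-variance.
    return mixed / np.sqrt(np.sum(weights ** 2))
```

With equal weights over n inputs the shrinkage factor is 1/sqrt(n), so naive averaging of many latents collapses toward zero norm and drifts off the noise distribution the model was trained on; the final rescaling counteracts exactly that.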
Related papers
- A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach that harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
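The summary implies constructing a mask over the most relevant pixels and handing it to an inpainting model; a minimal sketch of such a mask (the quantile rule and `top_fraction` default are assumptions of this sketch, not taken from the paper):

```python
import numpy as np

def perturbation_mask(relevance, top_fraction=0.1):
    """Binary mask over the top `top_fraction` most relevant pixels.

    relevance: (H, W) attribution map from any explainability method.
    The masked region is what a generative inpainter would repaint to
    change the model's prediction while leaving the rest intact.
    """
    threshold = np.quantile(relevance, 1.0 - top_fraction)
    return relevance >= threshold
```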
arXiv Detail & Related papers (2025-04-09T11:46:41Z) - Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on an invertible transform to overcome the limitations of existing variable-rate methods. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z) - Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
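The abstract only names the alternation; the sketch below guesses at what one such update could look like, using a Tweedie-style estimate of the clean image (the signature, step size, and alternation flag are all hypothetical, not taken from the paper):

```python
import torch

def guidance_step(x_t, eps_pred, alpha_bar, constraint_loss, on_clean, step=0.1):
    """One hypothetical constrained-sampling update.

    Depending on `on_clean`, the constraint gradient is taken either on
    the noisy latent x_t itself or on the estimated clean image x0.
    """
    x_t = x_t.detach().requires_grad_(True)
    # Tweedie / DDIM-style estimate of the clean image from x_t.
    x0_hat = (x_t - (1 - alpha_bar) ** 0.5 * eps_pred) / alpha_bar ** 0.5
    loss = constraint_loss(x0_hat if on_clean else x_t)
    (grad,) = torch.autograd.grad(loss, x_t)
    return (x_t - step * grad).detach()
```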
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time [7.532695984765271]
We present a novel approach to generate high-resolution images with generative models. Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This offers several key benefits, including improved computational efficiency and faster inference times.
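The summary does not give the shift schedule; a toy sketch of per-timestep window placement (the random-offset rule and function name are assumptions of this sketch):

```python
import numpy as np

def denoising_windows(width, window, t, seed=0):
    """Non-overlapping window extents over a panorama of `width` latent
    columns, shifted by a fresh offset at each timestep t so that seams
    produced at one step fall inside a window at the next."""
    rng = np.random.default_rng(seed + t)    # deterministic offset per step
    offset = int(rng.integers(0, window))
    starts = range(-offset, width, window)
    return [(max(s, 0), min(s + window, width)) for s in starts]
```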
arXiv Detail & Related papers (2024-07-22T09:44:35Z) - Interpolating between Images with Diffusion Models [2.6027967363792865]
Interpolating between two input images is a capability missing from current image generation pipelines.
We propose a method for zero-shot interpolation using latent diffusion models.
For greater consistency, or to specify additional criteria, we can generate several candidates and use CLIP to select the highest quality image.
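As a rough illustration of that selection step, here is a sketch using OpenAI's open-source `clip` package; the scoring below is plain image-text cosine similarity, whereas the paper's actual selection criterion may differ:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def pick_best(candidates, prompt):
    """Return the candidate PIL image whose CLIP embedding best
    matches the text prompt."""
    images = torch.stack([preprocess(im) for im in candidates]).to(device)
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_feats = model.encode_image(images)
        txt_feats = model.encode_text(text)
        img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
        txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
        scores = (img_feats @ txt_feats.T).squeeze(-1)  # cosine similarity
    return candidates[int(scores.argmax())]
```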
arXiv Detail & Related papers (2023-07-24T07:03:22Z) - Deep Uncalibrated Photometric Stereo via Inter-Intra Image Feature Fusion [17.686973510425172]
This paper presents a new method for deep uncalibrated photometric stereo.
It efficiently utilizes the inter-image representation to guide the normal estimation.
Our method produces significantly better results than the state-of-the-art methods on both synthetic and real data.
arXiv Detail & Related papers (2022-08-06T03:59:54Z) - Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z) - Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z) - NeurInt : Learning to Interpolate through Neural ODEs [18.104328632453676]
We propose a novel generative model that learns a distribution of trajectories between two images.
We demonstrate our approach's effectiveness in generating images of improved quality, as well as its ability to learn a diverse distribution over smooth trajectories for any pair of real source and target images.
arXiv Detail & Related papers (2021-11-07T16:31:18Z) - Weighted Encoding Based Image Interpolation With Nonlocal Linear Regression Model [8.013127492678272]
In image super-resolution, the low-resolution image is directly down-sampled from its high-resolution counterpart without blurring or noise.
To address this problem, we propose a novel image model based on sparse representation.
The approach learns an adaptive sub-dictionary online instead of relying on clustering.
arXiv Detail & Related papers (2020-03-04T03:20:21Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)