Interpreting the Weight Space of Customized Diffusion Models
- URL: http://arxiv.org/abs/2406.09413v2
- Date: Wed, 17 Jul 2024 18:01:11 GMT
- Title: Interpreting the Weight Space of Customized Diffusion Models
- Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman,
- Abstract summary: We investigate the space of weights spanned by a large collection of customized diffusion models.
We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity.
We demonstrate three immediate applications of this space -- sampling, editing, and inversion.
- Score: 79.14866339932199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of this space -- sampling, editing, and inversion. First, as each point in the space corresponds to an identity, sampling a set of weights from it results in a model encoding a novel identity. Next, we find linear directions in this space corresponding to semantic edits of the identity (e.g., adding a beard). These edits persist in appearance across generated samples. Finally, we show that inverting a single image into this space reconstructs a realistic identity, even if the input image is out of distribution (e.g., a painting). Our results indicate that the weight space of fine-tuned diffusion models behaves as an interpretable latent space of identities.
Related papers
- Isometric Representation Learning for Disentangled Latent Space of Diffusion Models [17.64488229224982]
We present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space of the training data manifold.
This approach allows diffusion models to learn a more disentangled latent space, which enables smoother, precise more accurate inversion, and more control over attributes directly in the latent space.
arXiv Detail & Related papers (2024-07-16T07:36:01Z) - Explorable Mesh Deformation Subspaces from Unstructured Generative
Models [53.23510438769862]
Deep generative models of 3D shapes often feature continuous latent spaces that can be used to explore potential variations.
We present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model.
arXiv Detail & Related papers (2023-10-11T18:53:57Z) - BLiSS: Bootstrapped Linear Shape Space [38.85525540566456]
We introduce BLiSS, a method to solve shape space and correspondence problems.
Starting from a small set of manually registered scans, we enrich the shape space and then use that to get new unregistered scans into correspondence automatically.
The critical component of BLiSS is a non-linear deformation model that captures details missed by the low-dimensional shape space.
arXiv Detail & Related papers (2023-09-04T18:54:56Z) - Disentangling Variational Autoencoders [0.0]
A variational autoencoder (VAE) projects an input set of high-dimensional data to a lower-dimensional, latent space.
We implement three different VAE models from the literature and train them on a dataset of 60,000 images of hand-written digits.
We investigate the trade-offs between the quality of the reconstruction of the decoded images and the level of disentanglement of the latent space.
arXiv Detail & Related papers (2022-11-14T19:22:41Z) - Unifying Diffusion Models' Latent Space, with Applications to
CycleDiffusion and Guidance [95.12230117950232]
We show that a common latent space emerges from two diffusion models trained independently on related domains.
Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
arXiv Detail & Related papers (2022-10-11T15:53:52Z) - OCD: Learning to Overfit with Conditional Diffusion Models [95.1828574518325]
We present a dynamic model in which the weights are conditioned on an input sample x.
We learn to match those weights that would be obtained by finetuning a base model on x and its label y.
arXiv Detail & Related papers (2022-10-02T09:42:47Z) - Few Shot Generative Model Adaption via Relaxed Spatial Structural
Alignment [130.84010267004803]
Training a generative adversarial network (GAN) with limited data has been a challenging task.
A feasible solution is to start with a GAN well-trained on a large scale source domain and adapt it to the target domain with a few samples, termed as few shot generative model adaption.
We propose a relaxed spatial structural alignment method to calibrate the target generative models during the adaption.
arXiv Detail & Related papers (2022-03-06T14:26:25Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - The Geometry of Deep Generative Image Models and its Applications [0.0]
Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets.
These networks are trained to map random inputs in their latent space to new samples representative of the learned data.
The structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator.
arXiv Detail & Related papers (2021-01-15T07:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.