Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via
Stochastic Differential Equations without Training
- URL: http://arxiv.org/abs/2308.07665v2
- Date: Wed, 3 Jan 2024 14:36:11 GMT
- Title: Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via
Stochastic Differential Equations without Training
- Authors: Ximing Xing, Chuang Wang, Haitao Zhou, Zhihao Hu, Chongxuan Li, Dong
Xu, Qian Yu
- Abstract summary: Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches.
Generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models.
We propose a two-stage method named "Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis.
- Score: 46.75803514327477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exemplar-based sketch-to-photo synthesis allows users to generate
photo-realistic images based on sketches. Recently, diffusion-based methods
have achieved impressive performance on image generation tasks, enabling
highly-flexible control through text-driven generation or energy functions.
However, generating photo-realistic images with color and texture from sketch
images remains challenging for diffusion models. Sketches typically consist of
only a few strokes, with most regions left blank, making it difficult for
diffusion-based methods to produce photo-realistic images. In this work, we
propose a two-stage method named "Inversion-by-Inversion" for exemplar-based
sketch-to-photo synthesis. This approach includes shape-enhancing inversion and
full-control inversion. During the shape-enhancing inversion process, an
uncolored photo is generated with the guidance of a shape-energy function. This
step is essential to ensure control over the shape of the generated photo. In
the full-control inversion process, we propose an appearance-energy function to
control the color and texture of the final generated photo. Importantly, our
Inversion-by-Inversion pipeline is training-free and can accept different types
of exemplars for color and texture control. We conducted extensive experiments
to evaluate our proposed method, and the results demonstrate its effectiveness.
The code and project can be found at
https://ximinng.github.io/inversion-by-inversion-project/.
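To make the two-stage idea concrete, below is a minimal, hypothetical sketch of energy-guided reverse-SDE sampling in PyTorch. The score network, the specific shape and appearance energies, the guidance weights, and the step schedule are placeholders chosen for illustration; they are assumptions, not the authors' released Inversion-by-Inversion implementation (see the project page above for the official code).

```python
import torch

def score_model(x, t):
    # Placeholder score network s_theta(x, t); a real pipeline would query a
    # pretrained diffusion/SDE model here.
    return -x

def shape_energy(x, sketch):
    # Hypothetical shape-energy: penalize pixel-wise deviation from the sketch.
    return ((x - sketch) ** 2).mean()

def appearance_energy(x, exemplar):
    # Hypothetical appearance-energy: penalize mean-color mismatch with the exemplar.
    return ((x.mean(dim=(2, 3)) - exemplar.mean(dim=(2, 3))) ** 2).mean()

def guided_reverse_sde(x, energy_terms, n_steps=50, step_size=1e-2):
    # Euler-Maruyama-style reverse-SDE sampling with energy guidance:
    # x <- x + dt * [score(x, t) - sum_i w_i * grad_x E_i(x, c_i)] + sqrt(dt) * z.
    for i in range(n_steps):
        t = torch.tensor(1.0 - i / n_steps)
        x = x.detach().requires_grad_(True)
        energy = sum(w * fn(x, c) for fn, c, w in energy_terms)
        grad = torch.autograd.grad(energy, x)[0]
        with torch.no_grad():
            drift = score_model(x, t) - grad
            x = x + step_size * drift + (step_size ** 0.5) * torch.randn_like(x)
    return x.detach()

# Stage 1: shape-enhancing inversion -> an uncolored photo constrained by the sketch.
sketch = torch.rand(1, 3, 64, 64)
exemplar = torch.rand(1, 3, 64, 64)
uncolored = guided_reverse_sde(torch.randn_like(sketch),
                               [(shape_energy, sketch, 1.0)])

# Stage 2: full-control inversion -> add exemplar color/texture while
# keeping the shape obtained in stage 1.
photo = guided_reverse_sde(torch.randn_like(sketch),
                           [(shape_energy, uncolored, 1.0),
                            (appearance_energy, exemplar, 0.5)])
```

The key design point this sketch illustrates is that both stages reuse the same training-free sampler and differ only in which energy terms guide it, which is what lets the pipeline accept different exemplar types without retraining the diffusion model.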
Related papers
- Motion Guidance: Diffusion-Based Image Editing with Differentiable
Motion Estimators [19.853978560075305]
Motion guidance is a technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move.
We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
arXiv Detail & Related papers (2024-01-31T18:59:59Z)
- Diverse Inpainting and Editing with GAN Inversion [4.234367850767171]
Recent inversion methods have shown that real images can be inverted into StyleGAN's latent space.
In this paper, we tackle an even more difficult task, inverting erased images into GAN's latent space for realistic inpaintings and editings.
arXiv Detail & Related papers (2023-07-27T17:41:36Z)
- Dequantization and Color Transfer with Diffusion Models [5.228564799458042]
Quantized images offer an easy abstraction for patch-based edits and palette transfer.
We show that our model can generate natural images that respect the color palette the user asked for.
Our method can be usefully extended to another practical edit: recoloring patches of an image while respecting the source texture.
arXiv Detail & Related papers (2023-07-06T00:07:32Z)
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models [50.39294302741698]
Human user study results show that the proposed approach outperforms the previous state-of-the-art by over 85.32% on the overall user satisfaction scores.
arXiv Detail & Related papers (2022-11-30T15:43:20Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model [31.652827838300915]
We propose a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models.
Our framework achieves state-of-the-art performance while providing flexibility in generating customized images with control over shape, color, and realism.
Our method unleashes applications such as editing on real images, generation with partial sketches and strokes, and multi-domain multi-modal synthesis.
arXiv Detail & Related papers (2022-08-26T13:59:26Z)
- Learning to Caricature via Semantic Shape Transform [95.25116681761142]
We propose an algorithm based on a semantic shape transform to produce shape exaggerations.
We show that the proposed framework is able to render visually pleasing shape exaggerations while maintaining their facial structures.
arXiv Detail & Related papers (2020-08-12T03:41:49Z)
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.