Uni-paint: A Unified Framework for Multimodal Image Inpainting with
Pretrained Diffusion Model
- URL: http://arxiv.org/abs/2310.07222v1
- Date: Wed, 11 Oct 2023 06:11:42 GMT
- Title: Uni-paint: A Unified Framework for Multimodal Image Inpainting with
Pretrained Diffusion Model
- Authors: Shiyuan Yang, Xiaodong Chen, Jing Liao
- Abstract summary: We propose Uni-paint, a unified framework for multimodal inpainting.
Uni-paint offers various modes of guidance, including text-driven, stroke-driven, and exemplar-driven inpainting.
Our approach achieves comparable results to existing single-modal methods.
- Score: 19.800236358666123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have
demonstrated impressive image generation capabilities and have also been
successfully applied to image inpainting. However, in practice, users often
require more control over the inpainting process beyond textual guidance,
especially when they want to composite objects with customized appearance,
color, shape, and layout. Unfortunately, existing diffusion-based inpainting
methods are limited to single-modal guidance and require task-specific
training, hindering their cross-modal scalability. To address these
limitations, we propose Uni-paint, a unified framework for multimodal
inpainting that offers various modes of guidance, including unconditional,
text-driven, stroke-driven, exemplar-driven inpainting, as well as a
combination of these modes. Furthermore, our Uni-paint is based on pretrained
Stable Diffusion and does not require task-specific training on specific
datasets, enabling few-shot generalizability to customized images. We have
conducted extensive qualitative and quantitative evaluations that show our
approach achieves comparable results to existing single-modal methods while
offering multimodal inpainting capabilities not available in other methods.
Code will be available at https://github.com/ysy31415/unipaint.
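For readers unfamiliar with inpainting on a pretrained diffusion model, the sketch below illustrates the simplest mode discussed in the abstract, text-driven inpainting with an off-the-shelf Stable Diffusion inpainting checkpoint, using the Hugging Face diffusers library. This is not the Uni-paint code from the repository above; the model id, file names, and prompt are placeholder assumptions.
```python
# Minimal sketch (not the authors' Uni-paint implementation): text-driven
# inpainting with a pretrained Stable Diffusion inpainting checkpoint via
# the Hugging Face `diffusers` library. Model id, file paths, and prompt
# are placeholders chosen for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained inpainting pipeline; no task-specific training is performed.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# `image` is the original picture; `mask` is white where content should be filled in.
image = Image.open("scene.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

# Text-driven guidance: the prompt describes what to paint inside the masked region.
result = pipe(
    prompt="a wooden bench under a tree",
    image=image,
    mask_image=mask,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("inpainted.png")
```
Uni-paint additionally supports stroke- and exemplar-driven guidance and their combinations on top of the same pretrained backbone; see the linked repository for the actual method.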
Related papers
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z) - VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is carefully designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - Continuous-Multiple Image Outpainting in One-Step via Positional Query
and A Diffusion-based Approach [104.2588068730834]
This paper pushes the technical frontier of image outpainting in two directions that have not been resolved in the literature.
We develop a method that does not depend on a pre-trained backbone network.
We evaluate the proposed approach (called PQDiff) on public benchmarks, demonstrating its superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2024-01-28T13:00:38Z) - Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
arXiv Detail & Related papers (2024-01-18T18:59:13Z) - PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like
Interactions [12.792576041526287]
PromptPaint allows users to mix prompts that express challenging concepts.
We characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models.
arXiv Detail & Related papers (2023-08-09T18:41:11Z) - Adaptively-Realistic Image Generation from Stroke and Sketch with
Diffusion Model [31.652827838300915]
We propose a unified framework based on diffusion models that supports three-dimensional control over image synthesis from sketches and strokes.
Our framework achieves state-of-the-art performance while providing flexibility in generating customized images with control over shape, color, and realism.
Our method unleashes applications such as editing on real images, generation with partial sketches and strokes, and multi-domain multi-modal synthesis.
arXiv Detail & Related papers (2022-08-26T13:59:26Z) - In&Out : Diverse Image Outpainting via GAN Inversion [89.84841983778672]
Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
In this work, we formulate the problem from the perspective of inverting generative adversarial networks.
Our generator renders micro-patches conditioned on their joint latent code as well as their individual positions in the image.
arXiv Detail & Related papers (2021-04-01T17:59:10Z) - Collaboration among Image and Object Level Features for Image
Colourisation [25.60139324272782]
Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum.
Previous approaches attacked the problem either by requiring intense user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features.
We propose a single network, named UCapsNet, that separates image-level features obtained through convolutions from object-level features captured by means of capsules.
Then, through skip connections over different layers, we enforce collaboration between these disentangled factors to produce high-quality and plausible image colourisation.
arXiv Detail & Related papers (2021-01-19T11:48:12Z)