End-to-End Visual Editing with a Generatively Pre-Trained Artist
- URL: http://arxiv.org/abs/2205.01668v1
- Date: Tue, 3 May 2022 17:59:30 GMT
- Title: End-to-End Visual Editing with a Generatively Pre-Trained Artist
- Authors: Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi
- Abstract summary: We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
We propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain.
We show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture.
- Score: 78.5922562526874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the targeted image editing problem: blending a region in a source
image with a driver image that specifies the desired change. Differently from
prior works, we solve this problem by learning a conditional probability
distribution of the edits, end-to-end. Training such a model requires
addressing a fundamental technical challenge: the lack of example edits for
training. To this end, we propose a self-supervised approach that simulates
edits by augmenting off-the-shelf images in a target domain. The benefits are
remarkable: implemented as a state-of-the-art auto-regressive transformer, our
approach is simple, sidesteps difficulties with previous methods based on
GAN-like priors, obtains significantly better edits, and is efficient.
Furthermore, we show that different blending effects can be learned by an
intuitive control of the augmentation process, with no other changes required
to the model architecture. We demonstrate the superiority of this approach
across several datasets in extensive quantitative and qualitative experiments,
including human studies, significantly outperforming prior work.
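To make the training setup above concrete, here is a minimal, illustrative sketch of how an edit can be simulated from a single unedited image: a region is blanked out of a source copy, an augmented version of that region plays the role of the driver, and the untouched image serves as the reconstruction target. The helper name, the fixed box, and the brightness augmentation are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of simulating an edit from one unedited image
# (not the paper's code; box, augmentation, and helper name are assumptions).
import numpy as np
from PIL import Image, ImageEnhance

def simulate_edit(image_path, box=(64, 64, 192, 192), brightness=1.3):
    """Return (source_with_hole, driver, target) arrays for one training triplet."""
    target = Image.open(image_path).convert("RGB")                 # untouched image = ground truth
    driver = target.crop(box)                                      # region that will specify the change
    driver = ImageEnhance.Brightness(driver).enhance(brightness)   # augmentation simulates the "edit"

    source = np.array(target).copy()
    x0, y0, x1, y1 = box
    source[y0:y1, x0:x1] = 0                                       # blank the region to be edited

    return source, np.array(driver), np.array(target)
```

A conditional model trained to reconstruct the target from such (source, driver) pairs learns a different blending behavior depending on the augmentations used, which is the intuition behind controlling the edit style through the augmentation process alone.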
Related papers
- LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing [20.861672583434718]
We present LIPE, a two-stage framework that customizes a generative model using a limited set of images of the same subject and then employs the model with the learned prior for non-rigid image editing.
arXiv Detail & Related papers (2024-06-25T02:56:16Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to modify a given synthetic or real image to meet users' specific requirements.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- Customize Your Own Paired Data via Few-shot Way [14.193031218059646]
Some supervised methods require huge amounts of paired training data, which greatly limits their usage.
Other, unsupervised methods rely entirely on large-scale pre-trained priors and are therefore strictly restricted to the domains the priors were trained on, performing poorly in out-of-distribution cases.
In our proposed framework, a novel few-shot learning mechanism based on directional transformations among samples is introduced, expanding the learnable space exponentially.
arXiv Detail & Related papers (2024-05-21T04:21:35Z)
- Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z)
- Paint by Example: Exemplar-based Image Editing with Diffusion Models [35.84464684227222]
In this paper, we investigate exemplar-guided image editing for more precise control.
We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar.
We demonstrate that our method achieves impressive performance and enables controllable editing of in-the-wild images with high fidelity.
arXiv Detail & Related papers (2022-11-23T18:59:52Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
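As a rough illustration of this latent-inference idea, the sketch below optimizes a single latent code of a frozen, pre-trained generator with plain gradient descent; the paper instead infers an optimal latent distribution via Wasserstein gradient flow, and `generator` and `requirement_loss` here are placeholders rather than components from the paper.

```python
# Minimal, simplified sketch of latent optimization for a frozen, pre-trained generator.
# Plain gradient descent on one latent code stands in for the paper's Wasserstein
# gradient flow over a latent distribution; all names below are hypothetical.
import torch

def infer_latent(generator, requirement_loss, latent_dim=512, steps=200, lr=0.05):
    z = torch.randn(1, latent_dim, requires_grad=True)           # start from the prior
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        image = generator(z)                                      # pre-trained GAN, weights frozen
        loss = requirement_loss(image) + 1e-3 * z.pow(2).sum()    # requirement + stay close to the prior
        loss.backward()
        opt.step()
    return z.detach()
```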
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Learning by Planning: Language-Guided Global Image Editing [53.72807421111136]
We develop a text-to-operation model to map the vague editing language request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
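A toy sketch of the pseudo-ground-truth idea: with only the input and target images available, greedily search a small set of global operations for a sequence that moves the input toward the target, and use that sequence as supervision. The operation set and greedy L2 search below are simplifying assumptions for illustration, not the paper's planning algorithm.

```python
# Toy illustration of generating a pseudo-ground-truth editing sequence
# (greedy search over simple global operations; not the paper's algorithm).
import numpy as np

OPS = {  # hypothetical, minimal operation set over images in [0, 1]
    "brighten": lambda img: np.clip(img * 1.1, 0, 1),
    "darken":   lambda img: np.clip(img * 0.9, 0, 1),
    "contrast": lambda img: np.clip((img - 0.5) * 1.1 + 0.5, 0, 1),
}

def plan_edit_sequence(source, target, max_steps=5):
    """Greedily pick operations that move `source` toward `target`."""
    current, sequence = source.copy(), []
    for _ in range(max_steps):
        # try every operation and keep the one with the lowest L2 error to the target
        best_op, best_img = min(
            ((name, op(current)) for name, op in OPS.items()),
            key=lambda pair: np.mean((pair[1] - target) ** 2),
        )
        if np.mean((best_img - target) ** 2) >= np.mean((current - target) ** 2):
            break                                   # no operation improves the match
        current, sequence = best_img, sequence + [best_op]
    return sequence                                  # pseudo ground truth for supervision
```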
arXiv Detail & Related papers (2021-06-24T16:30:03Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.