EdiBERT, a generative model for image editing
- URL: http://arxiv.org/abs/2111.15264v1
- Date: Tue, 30 Nov 2021 10:23:06 GMT
- Title: EdiBERT, a generative model for image editing
- Authors: Thibaut Issenhuth, Ugo Tanielian, Jérémie Mary, David Picard
- Abstract summary: EdiBERT is a bi-directional transformer trained in the discrete latent space built by a vector-quantized auto-encoder.
We show that the resulting model matches state-of-the-art performance on a wide variety of tasks.
- Score: 12.605607949417033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in computer vision are pushing the limits of image manipulation,
with generative models sampling detailed images on various tasks. However, a
specialized model is often developed and trained for each specific task, even
though many image editing tasks share similarities. In denoising, inpainting,
or image compositing, one always aims at generating a realistic image from a
low-quality one. In this paper, we aim to take a step towards a unified
approach for image editing. To do so, we propose EdiBERT, a bi-directional
transformer trained in the discrete latent space built by a vector-quantized
auto-encoder. We argue that such a bidirectional model is suited for image
manipulation since any patch can be re-sampled conditioned on the whole
image. Using this unique and straightforward training objective, we show that
the resulting model matches state-of-the-art performance on a wide variety of
tasks: image denoising, image completion, and image composition.
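To make the editing mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract, not the authors' implementation: a bidirectional (BERT-like) transformer operates on the grid of discrete codebook indices produced by a vector-quantized auto-encoder, and the tokens covering an edited region are re-sampled conditioned on the whole grid. The class and function names, the 16x16 token grid, the 1024-entry codebook, and the `vqvae.encode`/`vqvae.decode` calls mentioned in comments are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of bidirectional token re-sampling
# over a vector-quantized latent grid, as described in the abstract.
import torch
import torch.nn as nn


class BidirectionalTokenTransformer(nn.Module):
    """BERT-style encoder over a 16x16 grid of VQ codebook indices (assumed sizes)."""

    def __init__(self, vocab_size=1024, grid=16, dim=512, depth=8, heads=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, grid * grid, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)  # no causal mask: fully bidirectional
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                  # tokens: (B, grid*grid) long
        x = self.tok_emb(tokens) + self.pos_emb
        return self.head(self.encoder(x))       # logits: (B, grid*grid, vocab_size)


@torch.no_grad()
def resample_region(model, tokens, region_mask, temperature=1.0):
    """Re-sample the tokens flagged by region_mask, conditioned on the full grid."""
    logits = model(tokens) / temperature
    probs = torch.softmax(logits, dim=-1)
    sampled = torch.multinomial(probs.flatten(0, 1), 1).view_as(tokens)
    return torch.where(region_mask, sampled, tokens)


# Toy usage with random tokens standing in for a VQ encoder's output.
model = BidirectionalTokenTransformer().eval()
tokens = torch.randint(0, 1024, (1, 256))        # in practice: vqvae.encode(image)
region = torch.zeros(1, 256, dtype=torch.bool)
region[:, 100:140] = True                        # tokens covering the edited patch
edited = resample_region(model, tokens, region)  # in practice: vqvae.decode(edited)
```

In practice one would typically re-sample the edited positions iteratively and then decode the updated token grid with the VQ decoder to obtain the edited image; the single parallel pass above is kept only for brevity.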
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection [60.47731445033151]
We propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D text-to-image (T2I) diffusion model.
Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.
arXiv Detail & Related papers (2024-05-27T04:44:36Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- A Method for Training-free Person Image Picture Generation [4.043367784553845]
A Character Image Feature model is proposed in this paper.
It lets the user make the character in the generated image match their expectation simply by providing a picture of that character.
The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the base model, or used in combination with Stable Diffusion as a joint model.
arXiv Detail & Related papers (2023-05-16T21:46:28Z)
- BlendGAN: Learning and Blending the Internal Distributions of Single Images by Spatial Image-Identity Conditioning [37.21764919074815]
Single image generative methods are designed to learn the internal patch distribution of a single natural image at multiple scales.
We introduce an extended framework that allows simultaneously learning the internal distributions of several images.
Our BlendGAN opens the door to applications that are not supported by single-image models.
arXiv Detail & Related papers (2022-12-03T10:38:27Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images from a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)