SpecRef: A Fast Training-free Baseline of Specific Reference-Condition
Real Image Editing
- URL: http://arxiv.org/abs/2401.03433v1
- Date: Sun, 7 Jan 2024 09:23:06 GMT
- Title: SpecRef: A Fast Training-free Baseline of Specific Reference-Condition
Real Image Editing
- Authors: Songyan Chen, Jiancheng Huang
- Abstract summary: We propose a new task called Specific Reference Condition Real Image Editing.
It allows the user to provide a reference image to further control the outcome, such as replacing an object with a particular one.
Specifically, we design a Specific Reference Attention Controller to incorporate features from the reference image, and adopt a mask mechanism to prevent interference between editing and non-editing regions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-conditional image editing based on large generative diffusion models
has attracted attention from both industry and the research community. Most
existing methods perform non-reference editing, where the user can only provide
a source image and a text prompt. However, this restricts the user's control
over the characteristics of the editing outcome. To increase user freedom, we
propose a new task called Specific Reference Condition Real Image Editing,
which allows the user to provide a reference image to further control the
outcome, such as replacing an object with a particular one. To accomplish this,
we propose a fast baseline method named SpecRef. Specifically, we design a
Specific Reference Attention Controller to incorporate features from the
reference image, and adopt a mask mechanism to prevent interference between
editing and non-editing regions. We evaluate SpecRef on typical editing tasks
and show that it can achieve satisfactory performance. The source code is
available at https://github.com/jingjiqinggong/specp2p.
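The abstract only describes the mechanism at a high level. Purely as an illustration (not the authors' implementation, which lives in the repository above), a minimal masked reference-attention step could look like the sketch below: tokens inside a user-supplied edit mask attend to both the source-image and the reference-image keys/values, while tokens outside the mask attend only to the source features, so the reference cannot disturb the non-editing region. All names (ref_attention_step, edit_mask, and so on) are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of a reference-conditioned
# attention step with a mask separating editing and non-editing regions.
import torch
import torch.nn.functional as F


def ref_attention_step(q_src, k_src, v_src, k_ref, v_ref, edit_mask, scale):
    """One attention step over flattened spatial tokens.

    q_src, k_src, v_src: (tokens, dim) features of the image being edited.
    k_ref, v_ref:        (tokens, dim) features cached from the reference image.
    edit_mask:           (tokens,) bool, True where the edit should happen.
    """
    # Tokens inside the edit region may also attend to the reference features.
    k_joint = torch.cat([k_src, k_ref], dim=0)
    v_joint = torch.cat([v_src, v_ref], dim=0)
    attn_joint = F.softmax(q_src @ k_joint.T * scale, dim=-1) @ v_joint

    # Tokens outside the edit region attend only to the source features,
    # so the reference image cannot interfere with the non-editing region.
    attn_src = F.softmax(q_src @ k_src.T * scale, dim=-1) @ v_src

    mask = edit_mask.unsqueeze(-1).to(attn_src.dtype)
    return mask * attn_joint + (1.0 - mask) * attn_src


# Toy usage with random features for a 16x16 latent (256 tokens, 64 channels).
tokens, dim = 256, 64
q = torch.randn(tokens, dim)
k_s, v_s = torch.randn(tokens, dim), torch.randn(tokens, dim)
k_r, v_r = torch.randn(tokens, dim), torch.randn(tokens, dim)
mask = torch.zeros(tokens, dtype=torch.bool)
mask[:64] = True  # pretend the top quarter of the latent is being edited
out = ref_attention_step(q, k_s, v_s, k_r, v_r, mask, scale=dim ** -0.5)
print(out.shape)  # torch.Size([256, 64])
```

In a Prompt-to-Prompt-style pipeline (the repository name suggests SpecRef builds on one), such a step would presumably replace the self-attention computation of selected U-Net layers during denoising, with the reference keys/values cached from a pass over the reference image; the details above are assumptions, not the paper's exact design.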
Related papers
- FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction [31.95664918050255]
FreeEdit is a novel approach to reference-based image editing.
It can accurately reproduce the visual concept from the reference image based on user-friendly language instructions.
arXiv Detail & Related papers (2024-09-26T17:18:39Z)
- Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame. A toy sketch of this frame-pair construction appears after this list.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models [26.92450293675906]
Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts.
We propose Custom-Edit, in which we (i) customize a diffusion model with a few reference images and then (ii) perform text-guided editing.
arXiv Detail & Related papers (2023-05-25T06:46:28Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
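As referenced in the MimicBrush entry above, its self-supervised training pairs are built by sampling two frames from a video clip, masking part of one, and recovering the masked content from the other. A toy, hypothetical version of that pair construction is sketched below; the function name, masking strategy, and patch size are illustrative assumptions, not taken from the MimicBrush code.

```python
# Hypothetical sketch of MimicBrush-style training-pair construction from a
# video clip: pick two frames, mask a region of one, and train a model to
# recover the masked region using the other frame as reference.
import torch


def make_training_pair(clip: torch.Tensor, patch: int = 64):
    """clip: (frames, channels, H, W) video tensor.

    Returns (masked frame, mask, reference frame, target frame)."""
    f = clip.shape[0]
    i, j = torch.randperm(f)[:2].tolist()  # two random, distinct frames
    target, reference = clip[i], clip[j]

    # Mask a random square region of the target frame.
    _, h, w = target.shape
    top = torch.randint(0, h - patch + 1, (1,)).item()
    left = torch.randint(0, w - patch + 1, (1,)).item()
    mask = torch.zeros(1, h, w)
    mask[:, top:top + patch, left:left + patch] = 1.0
    masked = target * (1.0 - mask)

    # A model would be trained to reconstruct `target` from
    # (`masked`, `mask`, `reference`), e.g. with a loss on the masked region.
    return masked, mask, reference, target


clip = torch.randn(8, 3, 256, 256)  # toy 8-frame clip
masked, mask, reference, target = make_training_pair(clip)
print(masked.shape, mask.shape, reference.shape, target.shape)
```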