DE-Net: Dynamic Text-guided Image Editing Adversarial Networks
- URL: http://arxiv.org/abs/2206.01160v1
- Date: Thu, 2 Jun 2022 17:20:52 GMT
- Title: DE-Net: Dynamic Text-guided Image Editing Adversarial Networks
- Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian
- Abstract summary: We propose a Dynamic Editing Block (DEBlock) which combines spatial- and channel-wise manipulations dynamically for various editing requirements.
Our DE-Net achieves excellent performance and manipulates source images more effectively and accurately.
- Score: 82.67199573030513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-guided image editing models have shown remarkable results. However, two problems remain. First, they employ fixed manipulation modules for various editing requirements (e.g., color changing, texture changing, content adding and removing), which result in over-editing or insufficient editing. Second, they do not clearly distinguish text-required parts from text-irrelevant parts, which leads to inaccurate editing. To address these limitations, we propose: (i) a Dynamic Editing Block (DEBlock), which dynamically combines spatial- and channel-wise manipulations for various editing requirements; (ii) a Combination Weights Predictor (CWP), which predicts the combination weights for the DEBlock from text and visual features; and (iii) a Dynamic text-adaptive Convolution Block (DCBlock), which queries source image features to distinguish text-required parts from text-irrelevant parts. Extensive experiments demonstrate that our DE-Net achieves excellent performance and manipulates source images more effectively and accurately. Code is available at \url{https://github.com/tobran/DE-Net}.
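The abstract sketches an architecture: the DEBlock mixes a channel-wise and a spatial-wise manipulation path, with per-sample mixing weights inferred by the CWP from text and visual features. Below is a minimal PyTorch sketch of that idea; the module names, layer sizes, and tensor shapes are illustrative assumptions, not the actual DE-Net implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class CombinationWeightsPredictor(nn.Module):
    """Stand-in for the CWP: predicts mixing weights for the two paths."""

    def __init__(self, text_dim: int, vis_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + vis_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),  # one weight per manipulation path
        )

    def forward(self, text_feat, vis_feat):
        pooled = vis_feat.mean(dim=(2, 3))  # (B, C, H, W) -> (B, C)
        logits = self.mlp(torch.cat([text_feat, pooled], dim=1))
        return torch.softmax(logits, dim=1)  # (B, 2), sums to 1 per sample


class DEBlockSketch(nn.Module):
    """Mixes channel-wise and spatial-wise text-conditioned modulation."""

    def __init__(self, text_dim: int, channels: int):
        super().__init__()
        # Channel-wise path: one scale/shift per channel, from the text.
        self.ch_affine = nn.Linear(text_dim, channels * 2)
        # Spatial-wise path: one scale/shift per location, from the features.
        self.sp_affine = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.cwp = CombinationWeightsPredictor(text_dim, channels)

    def forward(self, x, text_feat):
        b, c, _, _ = x.shape
        # Channel-wise modulation (uniform over space).
        gamma_c, beta_c = self.ch_affine(text_feat).view(b, 2 * c, 1, 1).chunk(2, dim=1)
        ch_out = x * (1 + gamma_c) + beta_c
        # Spatial-wise modulation (uniform over channels).
        gamma_s, beta_s = self.sp_affine(x).chunk(2, dim=1)
        sp_out = x * (1 + gamma_s) + beta_s
        # Dynamically mix the two paths per sample.
        w = self.cwp(text_feat, x)
        return w[:, 0].view(b, 1, 1, 1) * ch_out + w[:, 1].view(b, 1, 1, 1) * sp_out


x = torch.randn(2, 64, 16, 16)  # source image features
t = torch.randn(2, 256)         # text embedding
print(DEBlockSketch(text_dim=256, channels=64)(x, t).shape)  # (2, 64, 16, 16)
```

The softmax keeps the two path weights convex, so the block can interpolate between purely channel-wise edits (e.g., color changes) and spatially localized edits (e.g., content adding), matching the varied editing requirements the abstract motivates.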
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It builds on pretrained diffusion models whose cross-attention layers are disentangled, and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts [119.84478647745658]
TIP-Editor is a 3D scene editing framework that accepts both text and image prompts and a 3D bounding box to specify the editing region.
Experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region.
arXiv Detail & Related papers (2024-01-26T12:57:05Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing [27.661609140918916]
InFusion is a framework for zero-shot text-based video editing.
It supports editing multiple concepts, with pixel-level control over each concept mentioned in the editing prompt.
Our framework is a low-cost alternative to one-shot tuned models for editing since it does not require training.
arXiv Detail & Related papers (2023-07-22T17:05:47Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace the text in an image with desired text while preserving the background and the style of the original text.
Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously.
We propose a novel network that works by MOdifying Scene Text images at the strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- Prompt-to-Prompt Image Editing with Cross Attention Control [41.26939787978142]
We present an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only.
We show our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.
arXiv Detail & Related papers (2022-08-02T17:55:41Z)
- HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
arXiv Detail & Related papers (2021-12-09T18:59:58Z)