UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
- URL: http://arxiv.org/abs/2412.15216v1
- Date: Thu, 19 Dec 2024 18:59:58 GMT
- Title: UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
- Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
- Abstract summary: We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training.
Our method introduces a novel editing mechanism called Cycle Edit Consistency (CEC).
CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
- Score: 69.33072075580483
- Abstract: We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Existing supervised methods depend on datasets containing triplets of input image, edited image, and edit instruction. These are generated by either existing editing methods or human annotations, which introduce biases and limit their generalization ability. Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency (CEC), which applies forward and backward edits in one training step and enforces consistency in image and attention spaces. This allows us to bypass the need for ground-truth edited images and unlock training for the first time on datasets comprising either real image-caption pairs or image-caption-edit triplets. We empirically show that our unsupervised technique performs better across a broader range of edits with high fidelity and precision. By eliminating the need for pre-existing datasets of triplets, reducing biases associated with supervised methods, and proposing CEC, our work represents a significant advancement in unblocking scaling of instruction-based image editing.
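The forward-then-backward idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes an "image" is a flat list of intensities, uses two hypothetical hand-written edit functions (`brighten`, `darken`) in place of an instruction-conditioned diffusion editor, and covers only an image-space reconstruction term. The attention-space consistency mentioned in the abstract is not modeled.

```python
from typing import Callable, List

# Toy "image": a flat list of pixel intensities in [0, 1] (an assumption for illustration).
Image = List[float]
Edit = Callable[[Image], Image]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized toy images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_reconstruction_loss(image: Image, forward_edit: Edit, backward_edit: Edit) -> float:
    """Apply a forward edit, then a backward edit, and compare the result with the input.

    No ground-truth edited image is needed: the only supervision signal
    is how well the round trip recovers the original image.
    """
    edited = forward_edit(image)
    reconstructed = backward_edit(edited)
    return l1_distance(image, reconstructed)


# Hypothetical stand-ins for "make it brighter" / "make it darker" instructions.
def brighten(img: Image) -> Image:
    return [min(1.0, p + 0.25) for p in img]


def darken(img: Image) -> Image:
    return [max(0.0, p - 0.25) for p in img]


image = [0.0, 0.25, 0.5]

# A backward edit that truly inverts the forward edit gives zero loss.
loss_consistent = cycle_reconstruction_loss(image, brighten, darken)

# A backward edit that does nothing leaves the forward change in place.
loss_inconsistent = cycle_reconstruction_loss(image, brighten, lambda img: list(img))
```

In this toy setting the loss is zero only when the backward edit undoes the forward edit, which is the signal a cycle-style objective would push a learned editor toward.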
Related papers
- Edicho: Consistent Image Editing in the Wild [90.42395533938915]
Edicho steps in with a training-free solution based on diffusion models.
It features a fundamental design principle of using explicit image correspondence to direct editing.
arXiv Detail & Related papers (2024-12-30T16:56:44Z)
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset.
We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results.
Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
- ControlEdit: A MultiModal Local Clothing Image Editing Method [3.6604114810930946]
Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions.
We propose a new image editing method, ControlEdit, which recasts clothing image editing as multimodal-guided local inpainting of clothing images.
arXiv Detail & Related papers (2024-09-23T05:34:59Z)
- InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [54.07526261513434]
InstructBrush is an inversion method for instruction-based image editing methods.
It extracts editing effects from image pairs as editing instructions, which are further applied for image editing.
Our approach achieves superior performance in editing and is more semantically consistent with the target editing effects.
arXiv Detail & Related papers (2024-03-27T15:03:38Z)
- Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance [15.130419159003816]
We present a versatile image editing framework capable of executing both rigid and non-rigid edits.
We leverage a dual-path injection scheme to handle diverse editing scenarios.
We introduce an integrated self-attention mechanism for fusion of appearance and structural information.
arXiv Detail & Related papers (2024-01-04T08:21:30Z)
- BARET: Balanced Attention based Real image Editing driven by Target-text Inversion [36.59406959595952]
We propose a novel editing technique that requires only an input image and target text for various editing types, including non-rigid edits, without fine-tuning the diffusion model.
Our method contains three novelties: (I) Target-text Inversion Schedule (TTIS) is designed to fine-tune the input target text embedding to achieve fast image reconstruction without image caption and acceleration of convergence; (II) Progressive Transition Scheme applies progressive linear approaches between target text embedding and its fine-tuned version to generate transition embedding for maintaining non-rigid editing capability; (III) Balanced Attention Module (BAM) balances the tradeoff between textual description and image semantics.
arXiv Detail & Related papers (2023-12-09T07:18:23Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
- Learning to Follow Object-Centric Image Editing Instructions Faithfully [26.69032113274608]
Current approaches focusing on image editing with natural language instructions rely on automatically generated paired data.
We significantly improve the quality of the paired data and enhance the supervision signal.
Our model is capable of performing fine-grained object-centric edits better than state-of-the-art baselines.
arXiv Detail & Related papers (2023-10-29T20:39:11Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It compares favourably with its counterparts in image fidelity and CLIP alignment score, as well as qualitatively, when editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)