Learning Feature-Preserving Portrait Editing from Generated Pairs
- URL: http://arxiv.org/abs/2407.20455v1
- Date: Mon, 29 Jul 2024 23:19:42 GMT
- Title: Learning Feature-Preserving Portrait Editing from Generated Pairs
- Authors: Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo
- Abstract summary: We propose a training-based method leveraging auto-generated paired data to learn desired editing.
Our method achieves state-of-the-art quality, quantitatively and qualitatively.
- Score: 11.122956539965761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired editing at low cost. Based on these pairs, we introduce a Multi-Conditioned Diffusion Model to effectively learn the editing direction and preserve subject features. During inference, our model produces an accurate editing mask that can guide the inference process to further preserve detailed subject features. Experiments on costume editing and cartoon expression editing show that our method achieves state-of-the-art quality, quantitatively and qualitatively.
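
The abstract says an editing mask guides inference so that unchanged subject features are preserved, but it does not spell out the mechanism. As a rough illustration only, the sketch below shows one common way a mask can be used: pixel-wise compositing, where masked regions take the edited value and unmasked regions keep the original. The function name `blend_with_mask` and the toy 2x2 inputs are hypothetical and are not taken from the paper.

```python
def blend_with_mask(original, edited, mask):
    """Pixel-wise composite of two same-sized 2D grids.

    mask value 1.0 takes the edited pixel, 0.0 keeps the original pixel,
    and values in between mix the two linearly.
    This is an assumed, generic compositing rule, not the paper's method.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    blended = []
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("inputs must have the same number of columns")
        blended.append(
            [m * e + (1.0 - m) * o for o, e, m in zip(row_o, row_e, row_m)]
        )
    return blended


# Toy example: a 2x2 grayscale "image" with values in [0, 1].
original = [[0.2, 0.4], [0.6, 0.8]]
edited = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]  # edit top-left fully, bottom-left halfway

blended = blend_with_mask(original, edited, mask)
```

Pixels where the mask is zero come out identical to the original, which is the sense in which a mask can help preserve details outside the edited region.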
 
      
Related papers
- S$^2$Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control [29.031157601804953]
 S$^2$Edit is a text-to-image diffusion model that enables personalized editing with precise semantic and spatial control.
We show that S$^2$Edit performs localized editing while faithfully preserving the original identity, using a learned identity token that is semantically disentangled and spatially focused.
 arXiv  Detail & Related papers  (2025-07-07T00:14:08Z)
- CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing [24.68304617869157]
 Context-Preserving Adaptive Manipulation (CPAM) is a novel framework for complicated, non-rigid real image editing.
We develop a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and background effectively.
We also introduce various mask-guidance strategies to facilitate diverse image manipulation tasks in a simple manner.
 arXiv  Detail & Related papers  (2025-06-23T09:19:38Z)
- Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions [20.617718631292696]
 We develop a novel paradigm for instruction-driven image editing that leverages widely available and enormous text-image pairs.
Our approach introduces a multi-scale learnable region to localize and guide the editing process.
By treating the alignment between images and their textual descriptions as supervision and learning to generate task-specific editing regions, our method achieves high-fidelity, precise, and instruction-consistent image editing.
 arXiv  Detail & Related papers  (2025-05-25T22:40:59Z)
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
 We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in conducted user studies.
 arXiv  Detail & Related papers  (2025-02-06T13:08:43Z)
- IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion [12.494492016414503]
 Existing models encounter challenges such as poor editing quality, high computational costs and difficulties in preserving facial identity across diverse edits.
We propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models.
Our approach significantly reduces editing time by 80%, while maintaining temporal consistency throughout the video sequence.
 arXiv  Detail & Related papers  (2025-01-13T18:08:27Z)
- UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency [69.33072075580483]
 We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training.
Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency (CEC).
 CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
 arXiv  Detail & Related papers  (2024-12-19T18:59:58Z)
- INRetouch: Context Aware Implicit Neural Representation for Photography Retouching [54.17599183365242]
 We propose a novel retouch transfer approach that learns from professional edits through before-after image pairs.
We develop a context-aware Implicit Neural Representation that learns to apply edits adaptively based on image content and context.
Our method extracts implicit transformations from reference edits and adaptively applies them to new images.
 arXiv  Detail & Related papers  (2024-12-05T03:31:48Z)
- Pathways on the Image Manifold: Image Editing via Video Generation [11.891831122571995]
 We reformulate image editing as a temporal process, using pretrained video models to create smooth transitions from the original image to the desired edit.
Our approach achieves state-of-the-art results on text-based image editing, demonstrating significant improvements in both edit accuracy and image preservation.
 arXiv  Detail & Related papers  (2024-11-25T16:41:45Z)
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
 We present AnyEdit, a comprehensive multi-modal instruction editing dataset.
We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results.
Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
 arXiv  Detail & Related papers  (2024-11-24T07:02:56Z)
- Learning Action and Reasoning-Centric Image Editing from Videos and Simulations [45.637947364341436]
 The AURORA dataset is a collection of high-quality training data, human-annotated and curated from videos and simulation engines.
We evaluate an AURORA-finetuned model on a new expert-curated benchmark covering 8 diverse editing tasks.
Our model significantly outperforms previous editing models as judged by human raters.
 arXiv  Detail & Related papers  (2024-07-03T19:36:33Z)
- LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing [20.861672583434718]
 We present LIPE, a two-stage framework that customizes the generative model using a limited set of images of the same subject and then uses the model, with its learned prior, for non-rigid image editing.
 arXiv  Detail & Related papers  (2024-06-25T02:56:16Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
 We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
 arXiv  Detail & Related papers  (2023-12-04T06:25:06Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
 We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
 arXiv  Detail & Related papers  (2023-11-28T15:31:11Z)
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
 We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
 arXiv  Detail & Related papers  (2023-11-16T18:55:58Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
 We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
 arXiv  Detail & Related papers  (2023-10-18T17:59:02Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
 We exploit the amazing capacities of pretrained diffusion models for the editing of images.
Existing methods either finetune the model or invert the image in the latent space of the pretrained model.
They suffer from two problems: unsatisfying results for selected regions and unexpected changes in nonselected regions.
 arXiv  Detail & Related papers  (2023-03-28T00:16:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     