PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
- URL: http://arxiv.org/abs/2502.14397v1
- Date: Thu, 20 Feb 2025 09:35:38 GMT
- Title: PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
- Authors: Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Mike Zheng Shou, Jiaming Liu,
- Abstract summary: Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background.
The proposed method, PhotoDoodle, employs a two-stage training strategy.
To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism.
- Score: 24.08203111413198
- Abstract: We introduce PhotoDoodle, a novel image editing framework designed to facilitate photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background, requiring realistic blending, perspective alignment, and contextual coherence. Additionally, the background must be preserved without distortion, and the artist's unique style must be captured efficiently from limited training data. These requirements are not addressed by previous methods that primarily focus on global style transfer or regional inpainting. The proposed method, PhotoDoodle, employs a two-stage training strategy. Initially, we train a general-purpose image editing model, OmniEditor, using large-scale data. Subsequently, we fine-tune this model with EditLoRA using a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism. Additionally, we release a PhotoDoodle dataset featuring six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.
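The abstract does not describe the internals of EditLoRA, so the following is only a generic sketch of low-rank adaptation (LoRA), the technique the name suggests, and not the authors' implementation. It assumes a single frozen linear layer W that is adapted by a trainable low-rank product B·A scaled by alpha / r. The function name `lora_forward`, the shapes, and the hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha):
    """Frozen linear layer W plus a low-rank update scaled by alpha / r.

    x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r).
    Only A and B would be trained during fine-tuning; W stays fixed.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 8, 4, 8.0   # illustrative sizes only

W = rng.normal(size=(d_out, d_in))         # pretrained weight (frozen)
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                # trainable up-projection, zero-initialized

x = rng.normal(size=(2, d_in))
base_out = x @ W.T
init_out = lora_forward(x, W, A, B, alpha)  # equals base_out while B is zero

# Stand-in for a few fine-tuning steps: B becomes non-zero.
B_tuned = rng.normal(size=(d_out, rank)) * 0.1
tuned_out = lora_forward(x, W, A, B_tuned, alpha)

# The adapter can be merged into one weight matrix for inference.
W_merged = W + (alpha / rank) * B_tuned @ A
merged_out = x @ W_merged.T

# Parameter counts: the adapter versus the full weight matrix.
adapter_params = A.size + B.size
full_params = W.size
```

With B initialized to zero, the adapted layer starts out identical to the pretrained one, so fine-tuning on a small set of before-and-after pairs only has to learn the difference. In this toy configuration the adapter holds 96 parameters against 128 in the full matrix; the saving grows as the layer dimensions increase relative to the rank.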
Related papers
- UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency [69.33072075580483]
We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training.
Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency (CEC).
CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
arXiv Detail & Related papers (2024-12-19T18:59:58Z)
- MuseumMaker: Continual Style Customization without Catastrophic Forgetting [50.12727620780213]
We propose MuseumMaker, a method that enables the synthesis of images by following a set of customized styles in a never-ending manner.
When faced with a new customization style, we develop a style distillation loss module to extract and learn the styles of the training data for new image generation.
It can minimize the learning biases caused by the content of new training images and address the catastrophic overfitting issue induced by few-shot images.
arXiv Detail & Related papers (2024-04-25T13:51:38Z)
- StyleBooth: Image Style Editing with Multimodal Instruction [17.251982243534144]
Given an original image, image editing aims to generate an image that aligns with the provided instruction.
In this paper, we focus on image style editing and present StyleBooth, a method that proposes a comprehensive framework for image editing.
By iterative style-destyle tuning and editing and usability filtering, the StyleBooth dataset provides content-consistent stylized/plain image pairs.
arXiv Detail & Related papers (2024-04-18T12:58:55Z)
- Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt [12.27693060663517]
Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images.
We propose a novel pre-trained diffusion-based artistic style transfer method, called LSAST.
Our proposed method can generate more realistic artistic stylized images than state-of-the-art artistic style transfer methods.
arXiv Detail & Related papers (2024-04-17T15:28:53Z)
- Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting [64.43760427752532]
Face stylization refers to the transformation of a face into a specific portrait style.
Current methods require the use of example-based adaptation approaches to fine-tune pre-trained generative models.
This paper proposes a training-free face stylization framework, named Portrait Diffusion.
arXiv Detail & Related papers (2023-12-03T06:48:35Z)
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [22.975965453227477]
We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD).
In our experiments, we apply PhD to subject-driven image editing tasks and explore text-driven scene generation given a reference subject.
arXiv Detail & Related papers (2023-06-13T07:43:10Z)
- ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can impose significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.
Existing methods either finetune the model or invert the image in the latent space of the pretrained model.
They suffer from two problems: unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Unsupervised Scene Sketch to Photo Synthesis [40.044690369936184]
We present a method for synthesizing realistic photos from scene sketches.
Our framework learns from readily available large-scale photo datasets in an unsupervised manner.
We also demonstrate that our framework facilitates a controllable manipulation of photo synthesis by editing strokes of corresponding sketches.
arXiv Detail & Related papers (2022-09-06T22:25:06Z)
- Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches.
Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.