Magic Insert: Style-Aware Drag-and-Drop
- URL: http://arxiv.org/abs/2407.02489v1
- Date: Tue, 2 Jul 2024 17:59:50 GMT
- Title: Magic Insert: Style-Aware Drag-and-Drop
- Authors: Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter,
- Abstract summary: We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style.
For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image.
For object insertion, we use Bootstrapped Domain Adaption to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles.
- Score: 28.101564123298882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images. For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image, and then infuses it with a CLIP representation of the target style. For object insertion, we use Bootstrapped Domain Adaption to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting. Finally, we present a dataset, SubjectPlop, to facilitate evaluation and future progress in this area. Project page: https://magicinsert.github.io/
Related papers
- Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics [3.9717825324709413]
Style has been primarily considered in terms of artistic elements such as colors, brushstrokes, and lighting.
In this study, we propose a zero-shot scheme for image variation with coordinated semantics.
arXiv Detail & Related papers (2024-10-24T08:34:57Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Soulstyler: Using Large Language Model to Guide Image Style Transfer for
Target Object [9.759321877363258]
"Soulstyler" allows users to guide the stylization of specific objects in an image through simple textual descriptions.
We introduce a large language model to parse the text and identify stylization goals and specific styles.
We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects.
arXiv Detail & Related papers (2023-11-22T18:15:43Z) - MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP [0.0]
Style transfer driven by text prompts paved a new path for creatively stylizing the images without collecting an actual style image.
We propose a new method Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC) that can apply styles to different objects in the image based on the context extracted from the input prompt.
Our method can extend to any arbitrary objects, styles and produce high-quality images compared to the current state of art methods.
arXiv Detail & Related papers (2023-09-24T18:24:55Z) - Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given image for style reference to another given image for content reference.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z) - DSI2I: Dense Style for Unpaired Image-to-Image Translation [70.93865212275412]
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar.
We propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information.
Our results show that the translations produced by our approach are more diverse, preserve the source content better, and are closer to the exemplars when compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-26T18:45:25Z) - DiffStyler: Controllable Dual Diffusion for Text-Driven Image
Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z) - Imagic: Text-Based Real Image Editing with Diffusion Models [19.05825157237432]
We demonstrate the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image.
Our proposed method requires only a single input image and a target text.
It operates on real images, and does not require any additional inputs.
arXiv Detail & Related papers (2022-10-17T17:27:32Z) - Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z) - Interactive Style Transfer: All is Your Palette [74.06681967115594]
We propose a drawing-like interactive style transfer (IST) method, by which users can interactively create a harmonious-style image.
Our IST method can serve as a brush, dip style from anywhere, and then paint to any region of the target content image.
arXiv Detail & Related papers (2022-03-25T06:38:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.