SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
- URL: http://arxiv.org/abs/2404.05717v3
- Date: Thu, 03 Oct 2024 17:56:42 GMT
- Title: SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
- Authors: Jing Gu, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang, Xin Eric Wang
- Abstract summary: SwapAnything is a novel framework that can swap any objects in an image with personalized concepts given by the reference.
It has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.
- Score: 51.857176097841915
- Abstract: Effective editing of personal content plays a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. Therefore, in this work, we introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by the reference, while keeping the context unchanged. Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image. First, we propose targeted variable swapping to apply region control over latent feature maps and swap masked variables for faithful context preservation and initial semantic concept swapping. Then, we introduce appearance adaptation to seamlessly adapt the semantic concept into the original image in terms of target location, shape, style, and content during the image generation process. Extensive results from both human and automatic evaluation demonstrate significant improvements of our approach over baseline methods on personalized swapping. Furthermore, SwapAnything shows precise and faithful swapping across single-object, multi-object, partial-object, and cross-domain swapping tasks. SwapAnything also performs well on text-based swapping and on tasks beyond swapping, such as object insertion.
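The targeted variable swapping step can be pictured as a masked blend of latent feature maps at each denoising step: context variables outside the mask come from the source image's latents, while the personalized concept fills the masked region. The sketch below is a minimal illustration using diffusers-style conventions; `unet`, `scheduler`, and the cached source trajectory `z_src_trajectory` are assumed stand-ins, not the paper's actual interface.

```python
import torch

def targeted_variable_swap(z_gen, z_src, mask):
    """Masked blend of latent feature maps: keep the newly generated
    (personalized-concept) variables inside `mask`, and restore the
    source-image variables outside it, preserving context pixels.

    z_gen, z_src: (B, C, H, W) latents at the same denoising step
    mask:         (B, 1, H, W), 1.0 inside the swap region
    """
    return mask * z_gen + (1.0 - mask) * z_src

# Hypothetical denoising loop showing where the swap slots in.
@torch.no_grad()
def swap_edit(unet, scheduler, z, z_src_trajectory, mask, cond_embed):
    for i, t in enumerate(scheduler.timesteps):
        eps = unet(z, t, encoder_hidden_states=cond_embed).sample
        z = scheduler.step(eps, t, z).prev_sample
        # Swap masked variables against the cached source latents so
        # everything outside the mask stays faithful to the original.
        z = targeted_variable_swap(z, z_src_trajectory[i], mask)
    return z
```

Appearance adaptation would then further align the swapped concept's location, shape, style, and content inside the masked region during generation; that stage is omitted from the sketch.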
Related papers
- GroundingBooth: Grounding Text-to-Image Customization [17.185571339157075]
We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects.
Our proposed text-image grounding module and masked cross-attention layer allow us to generate personalized images with both accurate layout alignment and identity preservation.
arXiv Detail & Related papers (2024-09-13T03:40:58Z)
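The masked cross-attention layer mentioned in the GroundingBooth summary above can be sketched as restricting each image region's queries to the text tokens of the entity grounded there. Shapes and names below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(q, k, v, allow):
    """Cross-attention restricted by a layout mask.

    q:     (B, Nq, D) image-patch queries
    k, v:  (B, Nt, D) text-token keys / values
    allow: (B, Nq, Nt) boolean, True where a query may attend to a token
           (assumes every query keeps at least one allowed token, e.g.
           global prompt tokens, so the softmax stays well defined)
    """
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~allow, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```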
- HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness [57.18183962641015]
We present HOI-Swap, a video editing framework trained in a self-supervised manner.
The first stage focuses on object swapping in a single frame with HOI awareness.
The second stage extends the single-frame edit across the entire sequence.
arXiv Detail & Related papers (2024-06-11T22:31:29Z)
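The two-stage design in the HOI-Swap entry above reduces to a single-frame edit followed by sequence-wide propagation; a skeletal sketch with hypothetical stage models, not HOI-Swap's actual API:

```python
def hoi_swap(frames, ref_object, stage1, stage2, anchor_idx=0):
    """Two-stage video object swap (illustrative skeleton).

    stage1: swaps the object in one anchor frame with HOI awareness
    stage2: propagates the single-frame edit across the full sequence
    """
    edited_anchor = stage1(frames[anchor_idx], ref_object)
    return stage2(frames, edited_anchor, anchor_idx)
```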
- Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing [47.421888361871254]
Scene text images contain not only style information (font, background) but also content information (character, texture).
Previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance.
We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability.
arXiv Detail & Related papers (2024-05-07T15:00:11Z)
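The disentanglement idea from the DARLING summary above can be illustrated with a two-branch encoder that emits separate style and content features, so each downstream task (recognition, removal, editing) consumes only what it needs. This is a generic sketch under that reading, not the paper's architecture:

```python
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    """Illustrative encoder with separate style and content heads."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.style_head = nn.Linear(128, feat_dim)    # font, background
        self.content_head = nn.Linear(128, feat_dim)  # character, texture

    def forward(self, x):
        h = self.backbone(x)
        return self.style_head(h), self.content_head(h)

# A recognition head would consume only the content features, while an
# editing head could recombine new content with the original style.
```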
- Customizing Text-to-Image Diffusion with Camera Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of camera viewpoint for model customization.
This allows us to modify object properties amongst various background scenes via text prompts.
We propose to condition the 2D diffusion process on rendered, view-dependent features of the new object.
arXiv Detail & Related papers (2024-04-18T16:59:51Z)
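Conditioning the 2D diffusion process on rendered, view-dependent features, as the entry above describes, can be sketched as extra conditioning tokens supplied alongside the text embedding; the concatenation interface here is an assumption, not the paper's design:

```python
import torch

def view_conditioned_denoise_step(unet, z, t, text_embeds, view_feats):
    """One denoising step conditioned on rendered, view-dependent
    features of the new object, here simply concatenated with the
    text-token conditioning (an assumed diffusers-style interface).

    text_embeds: (B, Nt, D) text-token embeddings
    view_feats:  (B, Nv, D) features rendered at the target camera pose
    """
    cond = torch.cat([text_embeds, view_feats], dim=1)
    return unet(z, t, encoder_hidden_states=cond).sample
```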
- CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models [85.69959024572363]
CustomNet is a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process.
We introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images.
Our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background.
arXiv Detail & Related papers (2023-10-30T17:50:14Z)
- Photoswap: Personalized Subject Swapping in Images [56.2650908740358]
Photoswap learns the visual concept of the subject from reference images and swaps it into the target image using pre-trained diffusion models.
Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality.
arXiv Detail & Related papers (2023-05-29T17:56:13Z)
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion [34.662798793560995]
We present a simple yet highly effective approach to personalization using a highly personalized (HiPer) text embedding.
Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text.
arXiv Detail & Related papers (2023-03-15T17:07:45Z)
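Because the method above needs neither model fine-tuning nor identifier tokens, the optimization target is the text embedding itself. Below is a textual-inversion-style outline of that idea with assumed diffusers-like stand-ins (`vae`, `unet`, `scheduler`); the paper's exact procedure may differ:

```python
import torch
import torch.nn.functional as F

def learn_personal_embedding(vae, unet, scheduler, image, prompt_embeds,
                             n_tokens=4, steps=1000, lr=5e-3):
    """Optimize a small set of trainable text-embedding tokens against a
    single image while the diffusion model stays frozen."""
    latents0 = vae.encode(image).latent_dist.sample() * 0.18215
    personal = torch.randn(1, n_tokens, prompt_embeds.shape[-1],
                           requires_grad=True)
    opt = torch.optim.Adam([personal], lr=lr)
    for _ in range(steps):
        # Append the trainable tokens to the frozen prompt embedding.
        cond = torch.cat([prompt_embeds, personal], dim=1)
        noise = torch.randn_like(latents0)
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
        noisy = scheduler.add_noise(latents0, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=cond).sample
        loss = F.mse_loss(pred, noise)
        opt.zero_grad(); loss.backward(); opt.step()
    return personal.detach()
```

At edit time the learned tokens would be appended to the target-text embedding, letting a single image plus target text drive the manipulation.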
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
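Latent-space navigation of the kind described in the entry above reduces to stepping a latent code along a learned attribute direction; a minimal sketch with a hypothetical pre-trained `generator` and per-attribute `direction` vector:

```python
import torch

def edit_attribute(generator, z, direction, strength):
    """Move a GAN latent code along a learned attribute direction.

    z:         (B, latent_dim) latent code of the input image
    direction: (latent_dim,)   unit vector for one attribute
    strength:  scalar slider the user dials with a few clicks
    """
    return generator(z + strength * direction)
```

Well-disentangled directions keep an edit confined to one attribute; entangled ones produce exactly the identity drift the summary criticizes.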
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.