Photoswap: Personalized Subject Swapping in Images
- URL: http://arxiv.org/abs/2305.18286v1
- Date: Mon, 29 May 2023 17:56:13 GMT
- Title: Photoswap: Personalized Subject Swapping in Images
- Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu,
Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang
- Abstract summary: Photoswap learns the visual concept of the subject from reference images and swaps it into the target image using pre-trained diffusion models.
Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality.
- Score: 56.2650908740358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In an era where images and visual content dominate our digital landscape, the
ability to manipulate and personalize these images has become a necessity.
Envision seamlessly substituting a tabby cat lounging on a sunlit window sill
in a photograph with your own playful puppy, all while preserving the original
charm and composition of the image. We present Photoswap, a novel approach that
enables this immersive image editing experience through personalized subject
swapping in existing images. Photoswap first learns the visual concept of the
subject from reference images and then swaps it into the target image using
pre-trained diffusion models in a training-free manner. We establish that a
well-conceptualized visual subject can be seamlessly transferred to any image
with appropriate self-attention and cross-attention manipulation, maintaining
the pose of the swapped subject and the overall coherence of the image.
Comprehensive experiments underscore the efficacy and controllability of
Photoswap in personalized subject swapping. Furthermore, Photoswap
significantly outperforms baseline methods in human ratings across subject
swapping, background preservation, and overall quality, revealing its vast
application potential, from entertainment to professional editing.
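The abstract describes the mechanism only at a high level, so the following is a minimal, self-contained sketch of the attention-injection idea it alludes to: attention maps recorded while reconstructing the original image are reused during the early denoising steps of the new generation, so the swapped subject inherits the original pose and layout. The toy tensors, step count, and swap threshold below are illustrative assumptions, not the authors' implementation, which operates inside the self-attention and cross-attention layers of a pre-trained diffusion UNet.

```python
import torch

def attention(q, k, v, injected_attn=None):
    # Standard scaled dot-product attention; optionally reuse a stored map
    # so the output follows the source image's spatial layout.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if injected_attn is not None:
        attn = injected_attn
    return attn @ v, attn

torch.manual_seed(0)
steps, swap_until, tokens, dim = 50, 25, 16, 64   # illustrative values only
source_maps = []

# Pass 1: "reconstruct" the source image and record its attention maps.
for t in range(steps):
    q, k, v = (torch.randn(tokens, dim) for _ in range(3))
    _, attn = attention(q, k, v)
    source_maps.append(attn)

# Pass 2: generate with the new subject, injecting the recorded maps early on
# so pose and composition are preserved while the subject's appearance changes.
for t in range(steps):
    q, k, v = (torch.randn(tokens, dim) for _ in range(3))
    inject = source_maps[t] if t < swap_until else None
    out, _ = attention(q, k, v, injected_attn=inject)
```

In the actual pipeline, the new subject's visual concept is first learned from reference images, and the recorded maps come from the diffusion model's own attention layers rather than random tensors.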
Related papers
- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing [51.857176097841915]
SwapAnything is a novel framework that can swap any object in an image with a personalized concept given by the reference.
It has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.
arXiv Detail & Related papers (2024-04-08T17:52:29Z) - Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting and entangle subject-unrelated information with the learned concept.
We propose DETEX, a novel approach that learns disentangled concept embeddings for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z) - FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z) - ReGeneration Learning of Diffusion Models with Rich Prompts for
Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated a remarkable ability to synthesize diverse and high-fidelity images.
However, current models can introduce significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image content without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process (a minimal sketch of this idea appears after this list).
Our method does not need additional training for these edits and can directly use an existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - 3D GAN Inversion for Controllable Portrait Image Animation [45.55581298551192]
We leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency.
The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer.
arXiv Detail & Related papers (2022-03-25T04:06:06Z) - Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space
Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z) - Look here! A parametric learning based approach to redirect visual
attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)