Towards Controllable and Photorealistic Region-wise Image Manipulation
- URL: http://arxiv.org/abs/2108.08674v1
- Date: Thu, 19 Aug 2021 13:29:45 GMT
- Title: Towards Controllable and Photorealistic Region-wise Image Manipulation
- Authors: Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu
- Abstract summary: We present a generative model with auto-encoder architecture for per-region style manipulation.
We apply a code consistency loss to enforce an explicit disentanglement between content and style latent representations.
The model is constrained by a content alignment loss to ensure that foreground editing does not interfere with the background content.
- Score: 11.601157452472714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptive and flexible image editing is a desirable function of modern
generative models. In this work, we present a generative model with
auto-encoder architecture for per-region style manipulation. We apply a code
consistency loss to enforce an explicit disentanglement between content and
style latent representations, making the content and style of generated samples
consistent with their corresponding content and style references. The model is
also constrained by a content alignment loss to ensure that foreground editing
does not interfere with the background content. As a result, given
region-of-interest masks provided by users, our model supports foreground
region-wise style transfer. Notably, our model requires no extra annotations
such as semantic labels and is trained purely by self-supervision. Extensive
experiments demonstrate the effectiveness of the proposed method and the
flexibility of the model for various applications, including region-wise style
editing, latent space interpolation, and cross-domain style transfer.
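The abstract specifies the role of the two losses but not their exact form. Below is a minimal PyTorch sketch of one plausible reading; the encoder modules, tensor shapes, and choice of L1 distance are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def code_consistency_loss(content_encoder: nn.Module,
                          style_encoder: nn.Module,
                          generated: torch.Tensor,
                          content_code: torch.Tensor,
                          style_code: torch.Tensor) -> torch.Tensor:
    """Re-encode the generated image and pull its latent codes back toward
    the content/style codes it was generated from, encouraging an explicit
    content/style disentanglement."""
    return (F.l1_loss(content_encoder(generated), content_code)
            + F.l1_loss(style_encoder(generated), style_code))

def content_alignment_loss(edited: torch.Tensor,
                           original: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Penalize changes outside the user-provided foreground mask (1 inside
    the edited region, 0 outside) so that foreground editing leaves the
    background untouched."""
    background = 1.0 - mask
    return F.l1_loss(edited * background, original * background)
```

In training, both terms would presumably be weighted alongside the auto-encoder's usual reconstruction objective.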
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control [43.96257216397601]
We propose a new plug-and-play solution for training-free personalization of diffusion models.
RB-Modulation is built on a novel optimal controller where a style descriptor encodes the desired attributes.
A cross-attention-based feature aggregation scheme allows RB-Modulation to decouple content and style from the reference image.
arXiv Detail & Related papers (2024-05-27T17:51:08Z)
- Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models [9.924851219904843]
We propose a framework that injects a fusion of generated visual reference and text guidance into the semantic latent space of a pre-trained diffusion model.
Using only a tiny neural network, the proposed ZIP produces diverse content and attributes under the intuitive control of the text prompt.
Compared to state-of-the-art methods, ZIP produces images of equivalent quality while providing a realistic editing effect.
arXiv Detail & Related papers (2023-08-30T08:40:15Z)
- MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods typically require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on a generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z)
- Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation [27.587905673112473]
Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs, where the generator explicitly learns the target attributes and directly executes the conversion.
We explore classifier-guided diffusion, which leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet.
arXiv Detail & Related papers (2022-10-12T02:21:18Z)
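Classifier guidance itself is a standard diffusion technique: at each denoising step, the predicted mean is shifted along the gradient of a noisy-image classifier's log-probability for the target attribute. The sketch below illustrates this; `diffusion.p_mean_variance` and the classifier interface are hypothetical stand-ins, and the mask-based blending is one simple way to preserve the irrelevant regions mentioned above, not necessarily this paper's mechanism.

```python
import torch

def classifier_grad(classifier, x_t, t, target):
    """Gradient of log p(target | x_t) w.r.t. the noisy image x_t."""
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x, t), dim=-1)
        selected = log_probs[torch.arange(x.shape[0]), target].sum()
        return torch.autograd.grad(selected, x)[0]

@torch.no_grad()
def guided_step(diffusion, classifier, x_t, t, target, mask, x_src_t, scale=3.0):
    """One reverse-diffusion step nudged toward the target attribute.
    Outside the mask, the sample is reset to the noised source image so
    that regions irrelevant to the edit are preserved."""
    mean, var = diffusion.p_mean_variance(x_t, t)   # hypothetical API
    mean = mean + scale * var * classifier_grad(classifier, x_t, t, target)
    x_prev = mean + var.sqrt() * torch.randn_like(x_t)
    return mask * x_prev + (1.0 - mask) * x_src_t
```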
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
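The blurb gives only the idea behind SAWN, so the following is a speculative reconstruction: a SPADE-style modulation layer whose per-pixel scale/shift maps are resampled through a learned flow field via `grid_sample`. The module name and all shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpedModulation(nn.Module):
    """Sketch of spatially-adaptive normalization whose modulation
    parameters are warped by a flow field before being applied."""
    def __init__(self, channels: int, cond_channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(cond_channels, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(cond_channels, channels, 3, padding=1)

    def forward(self, x, cond, flow):
        # x: (B, C, H, W); cond: (B, Cc, H, W); flow: (B, 2, H, W)
        # flow holds offsets in normalized [-1, 1] coordinates.
        b, _, h, w = x.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device, dtype=x.dtype),
            torch.linspace(-1, 1, w, device=x.device, dtype=x.dtype),
            indexing="ij")
        base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = base + flow.permute(0, 2, 3, 1)      # warped sample points
        gamma = F.grid_sample(self.to_gamma(cond), grid, align_corners=True)
        beta = F.grid_sample(self.to_beta(cond), grid, align_corners=True)
        return self.norm(x) * (1.0 + gamma) + beta
```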
- StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space in which the semantic content shared between the photo and sketch modalities is preserved.
An effective model needs to explicitly account for this style diversity and, crucially, generalize to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content but also adapt the disentanglement to any unseen user style, making it truly style-agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm that adapts to arbitrary input images and renders natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
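Optimization-based latent editing of this kind generally freezes the pretrained generator and adapts only a latent code under whatever objective the edit requires. A generic sketch follows (not this paper's algorithm; the `objective` callable is a placeholder for, e.g., an attribute loss plus a term keeping untouched regions close to the input).

```python
import torch

def optimize_latent(generator, w_init, objective, steps=200, lr=0.05):
    """Freeze the generator; adapt only the latent code w under a
    user-supplied objective evaluated on the generated image."""
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = objective(generator(w))  # e.g. attribute + consistency terms
        loss.backward()
        opt.step()
    return w.detach()
```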
- Manifold Alignment for Semantically Aligned Style Transfer [61.1274057338588]
We make a new assumption that image features from the same semantic region form a manifold, and that an image with multiple semantic regions follows a multi-manifold distribution.
Based on this assumption, the style transfer problem is formulated as aligning two multi-manifold distributions.
The proposed framework allows semantically similar regions of the output and the style image to share similar style patterns.
arXiv Detail & Related papers (2020-05-21T16:52:37Z)
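The paper casts style transfer as aligning two multi-manifold feature distributions. As a crude stand-in that conveys only the per-region intuition, here is region-wise feature-statistics matching (AdaIN-style) between matching semantic labels; this is an illustrative simplification, not the paper's alignment algorithm.

```python
import torch

def regionwise_stat_transfer(content_feat, style_feat,
                             content_seg, style_seg, eps=1e-5):
    """For each semantic label present in both segmentations, shift the
    content features in that region to the mean/std of the style features
    in the matching region, so similar regions share style statistics."""
    out = content_feat.clone()                         # (C, H, W)
    shared = set(content_seg.unique().tolist()) & set(style_seg.unique().tolist())
    for lbl in shared:
        cm, sm = content_seg == lbl, style_seg == lbl  # (H, W) boolean masks
        if cm.sum() < 2 or sm.sum() < 2:
            continue                                   # skip regions too small for stats
        c = content_feat[:, cm]                        # (C, Nc) region features
        s = style_feat[:, sm]                          # (C, Ns)
        c_mu, c_sd = c.mean(1, keepdim=True), c.std(1, keepdim=True) + eps
        s_mu, s_sd = s.mean(1, keepdim=True), s.std(1, keepdim=True)
        out[:, cm] = (c - c_mu) / c_sd * s_sd + s_mu
    return out
```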