Spatial Content Alignment For Pose Transfer
- URL: http://arxiv.org/abs/2103.16828v1
- Date: Wed, 31 Mar 2021 06:10:29 GMT
- Title: Spatial Content Alignment For Pose Transfer
- Authors: Wing-Yin Yu, Lai-Man Po, Yuzhi Zhao, Jingjing Xiong, Kin-Wai Lau
- Abstract summary: We propose a novel framework to enhance the content consistency of garment textures and the details of human characteristics.
We first alleviate the spatial misalignment by transferring the edge content to the target pose in advance.
Secondly, we introduce a new Content-Style DeBlk which can progressively synthesize photo-realistic person images.
- Score: 13.018067816407923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to unreliable geometric matching and content misalignment, most
conventional pose transfer algorithms fail to generate fine-grained person
images. In this paper, we propose a novel framework Spatial Content Alignment
GAN (SCAGAN) which aims to enhance the content consistency of garment textures
and the details of human characteristics. We first alleviate the spatial
misalignment by transferring the edge content to the target pose in advance.
Secondly, we introduce a new Content-Style DeBlk which can progressively
synthesize photo-realistic person images based on the appearance features of
the source image, the target pose heatmap and the prior transferred content in
edge domain. We compare the proposed framework with several state-of-the-art
methods to show its superiority in quantitative and qualitative analysis.
Moreover, detailed ablation study results demonstrate the efficacy of our
contributions. Code is publicly available at
github.com/rocketappslab/SCA-GAN.
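The abstract describes a two-stage pipeline: first transfer edge content to the target pose, then synthesize the final image conditioned on the source appearance, the target pose heatmap, and the pre-aligned edges. A minimal illustrative sketch of that data flow is below; the function names and the toy pixel-relocation "warp" are stand-ins for exposition, not the authors' actual networks or released code.

```python
# Illustrative sketch of the two-stage SCA-GAN idea from the abstract.
# All names below are hypothetical stand-ins; the real stages are
# learned networks, not the simple dictionary operations shown here.

def transfer_edge_content(edge_map, pose_mapping):
    """Stage 1 stand-in: move each edge pixel to its target-pose location.

    edge_map: dict {(row, col): intensity} of source edge pixels.
    pose_mapping: dict {(row, col): (row, col)} from source to target pose;
    unmapped pixels stay in place.
    """
    return {pose_mapping.get(p, p): v for p, v in edge_map.items()}

def synthesize(appearance, pose_heatmap, aligned_edges):
    """Stage 2 stand-in: the real model fuses these three conditioning
    signals through its generator blocks; here we only bundle them to
    show what the synthesis stage consumes."""
    return {
        "appearance": appearance,
        "pose": pose_heatmap,
        "edges": aligned_edges,
    }

# Toy usage: two edge pixels, one of which the pose mapping relocates.
edges = {(0, 0): 1.0, (1, 1): 0.5}
mapping = {(1, 1): (2, 2)}  # target pose moves this pixel
aligned = transfer_edge_content(edges, mapping)
result = synthesize("src_features", "target_heatmap", aligned)
```

The point of the sketch is the ordering: edge alignment happens before synthesis, so the generator receives content that is already spatially consistent with the target pose.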
Related papers
- Spatial-Semantic Collaborative Cropping for User Generated Content [32.490403964193014]
A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide.
Previous methods merely consider the aesthetics of the cropped images while ignoring the content integrity, which is crucial for cropping.
We propose a Spatial-Semantic Collaborative cropping network (S2CNet) for arbitrary user generated content accompanied by a new cropping benchmark.
arXiv Detail & Related papers (2024-01-16T03:25:12Z)
- Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting issues and entangle the subject-unrelated information with the learned concept.
We propose the DETEX, a novel approach that learns the disentangled concept embedding for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image.
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes it a challenging task.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- Harnessing the Conditioning Sensorium for Improved Image Translation [2.9631016562930546]
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image.
We propose a new approach to learn disentangled 'content' and 'style' representations from scratch.
We define 'content' based on conditioning information extracted by off-the-shelf pre-trained models.
We then train our style extractor and image decoder with an easy to optimize set of reconstruction objectives.
arXiv Detail & Related papers (2021-10-13T02:07:43Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on the DeepFashion benchmark dataset have demonstrated the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.