Related papers: TIPS: Text-Induced Pose Synthesis

URL: http://arxiv.org/abs/2207.11718v1
Date: Sun, 24 Jul 2022 11:14:46 GMT
Title: TIPS: Text-Induced Pose Synthesis
Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein
Abstract summary: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose. We first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.
Score: 24.317541784957285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying process challenging to apply in real-world scenarios as the generation of the target image is the actual aim. In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. We divide the problem into three independent stages: (a) text to pose representation, (b) pose refinement, and (c) pose rendering. To the best of our knowledge, this is one of the first attempts to develop a text-based pose transfer framework where we also introduce a new dataset DF-PASS, by adding descriptive pose annotations for the images of the DeepFashion dataset. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.

Related papers

Recovering Partially Corrupted Major Objects through Tri-modality Based Image Completion [13.846868357952419]
Diffusion models have become widely adopted in image completion tasks. A persistent challenge arises when an object is partially obscured in the damaged region, yet its remaining parts are still visible in the background. We propose supplementing text-based guidance with a novel visual aid: a casual sketch. This sketch supplies critical structural cues, enabling the generative model to produce an object structure that seamlessly integrates with the existing background.
arXiv Detail & Related papers (2025-03-10T08:34:31Z)
Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection [16.42439177494448]
We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection. Our method constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses. Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified dataset in training presents remarkably better accuracy.
arXiv Detail & Related papers (2024-05-24T21:08:27Z)
Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions. We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z)
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data. We learn to predict realistic texture of objects from real image collections. We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on. We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis. We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
Learning Semantic Person Image Generation by Region-Adaptive Normalization [81.52223606284443]
We propose a new two-stage framework to handle the pose and appearance translation. In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer. In the second stage, we suggest a new person image generation method by incorporating the region-adaptive normalization.
arXiv Detail & Related papers (2021-04-14T06:51:37Z)
Spatial Content Alignment For Pose Transfer [13.018067816407923]
We propose a novel framework to enhance the content consistency of garment textures and the details of human characteristics. We first alleviate the spatial misalignment by transferring the edge content to the target pose in advance. Secondly, we introduce a new Content-Style DeBlk which can progressively synthesize photo-realistic person images.
arXiv Detail & Related papers (2021-03-31T06:10:29Z)
Progressive and Aligned Pose Attention Transfer for Person Image Generation [59.87492938953545]
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. We use two types of blocks, namely the Pose-Attentional Transfer Block (PATB) and the Aligned Pose-Attentional Transfer Block (APATB). We verify the efficacy of the model on the Market-1501 and DeepFashion datasets, using quantitative and qualitative measures.
arXiv Detail & Related papers (2021-03-22T07:24:57Z)
Wish You Were Here: Context-Aware Human Generation [100.51309746913512]
We present a novel method for inserting objects, specifically humans, into existing images. Our method involves three networks: the first generates the semantic map of the new person, given the pose of the other persons in the scene. The second network renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components. A third network refines the generated face in order to match those of the target person.
arXiv Detail & Related papers (2020-05-21T14:09:14Z)
Sequential View Synthesis with Transformer [13.200139959163574]
We introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations. We evaluate our model on various challenging datasets and demonstrate that our model not only gives consistent predictions but also doesn't require any retraining for finetuning.
arXiv Detail & Related papers (2020-04-09T14:15:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.