Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
- URL: http://arxiv.org/abs/2405.15939v1
- Date: Fri, 24 May 2024 21:08:27 GMT
- Title: Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
- Authors: Yi-Ting Shen, Hyungtae Lee, Heesung Kwon, Shuvra S. Bhattacharyya
- Abstract summary: We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection.
Our method constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses.
Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified dataset in training yields remarkably better accuracy.
- Score: 16.42439177494448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection. Our method first constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses while maintaining the original style using an image translator. Since images corresponding to the novel poses are not available in training, the image translator is trained to be applicable only when the input and target poses are similar, so training does not require the novel poses and their corresponding images. Next, we select a sequence of target novel poses from the novel pose set, using Dijkstra's algorithm to ensure that poses closer to each other are located adjacently in the sequence. Finally, we repeatedly apply the image translator to each target pose in sequence to produce a group of novel pose images representing a variety of different limited body movements from the source pose. Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified synthetic dataset in training generally yields remarkably better accuracy than using the original synthetic dataset on three aerial-view human detection benchmarks (VisDrone, Okutama-Action, and ICG) in the few-shot regime.
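The pose-sequencing step in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that Dijkstra's algorithm is used so that nearby poses sit next to each other in the sequence. The sketch assumes that poses are flat lists of joint coordinates, that pose similarity is Euclidean distance, that two poses are linked only when their distance is at most a threshold `max_step`, and that `translate` stands in for the image translator. All function names and parameters here are hypothetical.

```python
import heapq
import math


def pose_distance(a, b):
    """Euclidean distance between two poses (flat lists of joint coordinates)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(poses, max_step):
    """Link every pair of poses whose distance is at most max_step (assumed rule)."""
    graph = {i: [] for i in range(len(poses))}
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            d = pose_distance(poses[i], poses[j])
            if d <= max_step:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def dijkstra_path(graph, source, target):
    """Shortest path from source to target as a list of pose indices, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in visited:
        return None
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def diversify(image, poses, path, translate):
    """Apply translate(image, pose) step by step along the path, keeping each result."""
    outputs = []
    for idx in path[1:]:
        image = translate(image, poses[idx])
        outputs.append(image)
    return outputs
```

Because every edge on the returned path is at most `max_step` long, each call to the translator only has to bridge a small pose change, which matches the abstract's constraint that the translator is applicable only when the input and target poses are similar.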
Related papers
- Controllable Human Image Generation with Personalized Multi-Garments [46.042383679103125]
BootComp is a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments.
We propose a data generation pipeline to construct a large synthetic dataset, consisting of human and multiple-garment pairs.
We show the wide applicability of our framework by adapting it to different types of reference-based generation in the fashion domain.
arXiv Detail & Related papers (2024-11-25T12:37:13Z)
- GRPose: Learning Graph Relations for Human Image Generation with Pose Priors [21.971188335727074]
We propose a framework delving into the graph relations of pose priors to provide control information for human image generation.
Our model achieves superior performance, with a 9.98% increase in pose average precision compared to the latest benchmark model.
arXiv Detail & Related papers (2024-08-29T13:58:34Z)
- Novel View Synthesis of Humans using Differentiable Rendering [50.57718384229912]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives results in a high-dimensional latent image, which is then transformed into an RGB image by a decoder network.
arXiv Detail & Related papers (2023-03-28T10:48:33Z)
- TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
- TIPS: Text-Induced Pose Synthesis [24.317541784957285]
In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose.
We first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues.
The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.
arXiv Detail & Related papers (2022-07-24T11:14:46Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
- PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
- Adversarial Synthesis of Human Pose from Text [18.02001711736337]
This work focuses on synthesizing human poses from human-level text descriptions.
We propose a model that is based on a conditional generative adversarial network.
We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text.
arXiv Detail & Related papers (2020-05-01T12:32:04Z)
- Pose Manipulation with Identity Preservation [0.0]
We introduce Character Adaptive Identity Normalization GAN (CainGAN) which uses spatial characteristic features extracted by an embedder and combined across source images.
CainGAN receives figures of faces from a certain individual and produces new ones while preserving the person's identity.
Experimental results show that the quality of generated images scales with the size of the input set used during inference.
arXiv Detail & Related papers (2020-04-20T09:51:31Z)