Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
- URL: http://arxiv.org/abs/2405.15939v1
- Date: Fri, 24 May 2024 21:08:27 GMT
- Title: Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
- Authors: Yi-Ting Shen, Hyungtae Lee, Heesung Kwon, Shuvra S. Bhattacharyya
- Abstract summary: We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection.
Our method constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses.
Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, training with the pose-diversified dataset yields remarkably better accuracy.
- Score: 16.42439177494448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection. Our method first constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses while maintaining the original style using an image translator. Since images corresponding to the novel poses are not available in training, the image translator is trained to be applicable only when the input and target poses are similar, thus training does not require the novel poses and their corresponding images. Next, we select a sequence of target novel poses from the novel pose set, using Dijkstra's algorithm to ensure that poses closer to each other are located adjacently in the sequence. Finally, we repeatedly apply the image translator to each target pose in sequence to produce a group of novel pose images representing a variety of different limited body movements from the source pose. Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified synthetic dataset in training generally presents remarkably better accuracy than using the original synthetic dataset on three aerial-view human detection benchmarks (VisDrone, Okutama-Action, and ICG) in the few-shot regime.
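The abstract's pose-sequencing step (using Dijkstra's algorithm so that adjacent poses in the sequence are similar) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes poses are represented as 2D keypoint arrays, uses plain Euclidean distance between keypoints as the edge weight, and takes the Dijkstra settling order over a fully connected pose graph as the output sequence. All names here are hypothetical.

```python
import heapq
import numpy as np

def pose_distance(a, b):
    # Euclidean distance between two keypoint arrays of shape (K, 2).
    return float(np.linalg.norm(a - b))

def dijkstra_pose_sequence(poses, source_idx):
    """Order poses starting from a source pose so that each newly
    visited pose is close (in accumulated pose distance) to the poses
    already in the sequence. Returns a list of pose indices."""
    n = len(poses)
    dist = [float("inf")] * n
    dist[source_idx] = 0.0
    visited = [False] * n
    order = []
    heap = [(0.0, source_idx)]  # (accumulated distance, pose index)
    while heap:
        d, u = heapq.heappop(heap)
        if visited[u]:
            continue
        visited[u] = True
        order.append(u)
        # Fully connected graph: relax edges to all unvisited poses.
        for v in range(n):
            if not visited[v]:
                nd = d + pose_distance(poses[u], poses[v])
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return order
```

In this sketch, the image translator would then be applied repeatedly along the returned index order, so each translation step only has to bridge a small pose gap, which matches the abstract's constraint that the translator is reliable only when input and target poses are similar.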
Related papers
- GRPose: Learning Graph Relations for Human Image Generation with Pose Priors [21.971188335727074]
We propose a framework delving into the graph relations of pose priors to provide control information for human image generation.
Our model achieves superior performance, with a 9.98% increase in pose average precision compared to the latest benchmark model.
arXiv Detail & Related papers (2024-08-29T13:58:34Z) - Novel View Synthesis of Humans using Differentiable Rendering [50.57718384229912]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives yields a high-dimensional latent image, which is then transformed into an RGB image by a decoder network.
arXiv Detail & Related papers (2023-03-28T10:48:33Z) - TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z) - TIPS: Text-Induced Pose Synthesis [24.317541784957285]
In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose.
We first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues.
The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.
arXiv Detail & Related papers (2022-07-24T11:14:46Z) - Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z) - Personalized visual encoding model construction with small data [1.6799377888527687]
We propose and test an alternative personalized ensemble encoding model approach to utilize existing encoding models.
We show that these personalized ensemble encoding models can be trained with small amounts of data for a specific individual.
Importantly, the personalized ensemble encoding models preserve patterns of inter-individual variability in the image-response relationship.
arXiv Detail & Related papers (2022-02-04T17:24:50Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z) - Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z) - Adversarial Synthesis of Human Pose from Text [18.02001711736337]
This work focuses on synthesizing human poses from human-level text descriptions.
We propose a model that is based on a conditional generative adversarial network.
We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text.
arXiv Detail & Related papers (2020-05-01T12:32:04Z) - Pose Manipulation with Identity Preservation [0.0]
We introduce Character Adaptive Identity Normalization GAN (CainGAN) which uses spatial characteristic features extracted by an embedder and combined across source images.
CainGAN receives figures of faces from a certain individual and produces new ones while preserving the person's identity.
Experimental results show that the quality of generated images scales with the size of the input set used during inference.
arXiv Detail & Related papers (2020-04-20T09:51:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.