Human Image Generation: A Comprehensive Survey
- URL: http://arxiv.org/abs/2212.08896v3
- Date: Fri, 24 May 2024 03:33:47 GMT
- Title: Human Image Generation: A Comprehensive Survey
- Authors: Zhen Jia, Zhang Zhang, Liang Wang, Tieniu Tan
- Abstract summary: In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods.
The advantages and characteristics of different methods are summarized in terms of model architectures.
Owing to their broad application potential, the typical downstream uses of synthesized human images are covered.
- Score: 44.204029557298476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image and video synthesis has become a flourishing topic in the computer vision and machine learning communities alongside the development of deep generative models, owing to its great academic and application value. Many researchers have devoted themselves to synthesizing high-fidelity human images, as humans are among the most commonly seen object categories in daily life, and a large number of studies have been performed based on various models, task settings and applications. A comprehensive overview of these methods for human image generation is therefore needed. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods. For each paradigm, the most representative models and their variants are presented, and the advantages and characteristics of the different methods are summarized in terms of model architectures. In addition, the main public human image datasets and evaluation metrics in the literature are summarized. Furthermore, given the broad application potential, the typical downstream uses of synthesized human images are covered. Finally, the challenges and potential opportunities of human image generation are discussed to shed light on future research.
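As a rough illustration of the evaluation side mentioned in the abstract, below is a minimal sketch of the Fréchet Inception Distance (FID), one of the metrics most commonly reported for human image generation. It assumes features for real and synthesized images have already been extracted with a pretrained network (typically Inception-v3); the function name and array shapes are illustrative only, not taken from the survey.

```python
# Minimal FID sketch (assumption: `real_feats` and `fake_feats` are
# (N, D) NumPy arrays of features from a pretrained extractor such as
# Inception-v3; the feature-extraction step is omitted here).
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    # Fit a Gaussian (mean, covariance) to each feature set.
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_f = np.cov(fake_feats, rowvar=False)

    # Matrix square root of the covariance product; small imaginary
    # parts arising from numerical error are discarded.
    covmean = linalg.sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```

Lower values indicate that the Gaussian fitted to generated features is closer to the one fitted to real features.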
Related papers
- Human-Centric Foundation Models: Perception, Generation and Agentic Modeling [79.97999901785772]
Human-centric Foundation Models unify diverse human-centric tasks into a single framework.
We present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups.
This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent digital human and embodiment modeling.
arXiv Detail & Related papers (2025-02-12T16:38:40Z) - Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis.
Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation.
Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z) - Single Image, Any Face: Generalisable 3D Face Generation [59.9369171926757]
We propose a novel model, Gen3D-Face, which generates 3D human faces with unconstrained single image input.
To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images.
arXiv Detail & Related papers (2024-09-25T14:56:37Z) - Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z) - HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors [47.62426718293504]
HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image.
HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
arXiv Detail & Related papers (2024-06-18T10:05:33Z) - Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z) - Limitations of Face Image Generation [12.11955119100926]
We study the efficacy and shortcomings of generative models in the context of face generation.
We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts.
We present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
arXiv Detail & Related papers (2023-09-13T19:33:26Z) - Image Synthesis with Adversarial Networks: a Comprehensive Survey and Case Studies [41.00383742615389]
Generative Adversarial Networks (GANs) have been extremely successful in various application domains such as computer vision, medicine, and natural language processing.
GANs are powerful models for learning complex distributions to synthesize semantically meaningful samples.
Given the rapid development of GANs, this survey provides a comprehensive review of adversarial models for image synthesis.
arXiv Detail & Related papers (2020-12-26T13:30:42Z)
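To make the adversarial setup behind the GAN survey entry above concrete, here is a minimal sketch of one GAN training step with the standard non-saturating loss, written in PyTorch. The tiny fully connected networks, data dimensions and hyperparameters are placeholders chosen for illustration, not architectures from any of the papers listed here.

```python
# Illustrative GAN training step (non-saturating loss). Assumption:
# real data arrives as flattened vectors of size `data_dim`; real
# image-synthesis GANs use convolutional networks instead.
import torch
from torch import nn

latent_dim, data_dim = 64, 784  # placeholder sizes, e.g. flattened 28x28 images

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, data_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch: torch.Tensor) -> tuple[float, float]:
    batch = real_batch.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator update: push real samples towards 1, generated towards 0.
    fake = generator(torch.randn(batch, latent_dim)).detach()
    loss_d = bce(discriminator(real_batch), ones) + bce(discriminator(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update (non-saturating): make generated samples score as real.
    loss_g = bce(discriminator(generator(torch.randn(batch, latent_dim))), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

In practice, image-synthesis GANs replace these MLPs with convolutional generators and discriminators and add stabilization techniques such as spectral normalization or gradient penalties.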