UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation
- URL: http://arxiv.org/abs/2309.14335v1
- Date: Mon, 25 Sep 2023 17:58:46 GMT
- Title: UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation
- Authors: Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Wayne Wu, Ziwei
Liu
- Abstract summary: A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
- Score: 59.77275587857252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human generation has achieved significant progress. Nonetheless, existing
methods still struggle to synthesize specific regions such as faces and hands.
We argue that the main reason is rooted in the training data. A holistic human
dataset inevitably has insufficient and low-resolution information on local
parts. Therefore, we propose to use multi-source datasets with various
resolution images to jointly learn a high-resolution human generative model.
However, multi-source data inherently a) contains different parts that do not
spatially align into a coherent human, and b) comes with different scales. To
tackle these challenges, we propose an end-to-end framework, UnitedHuman, that
empowers continuous GAN with the ability to effectively utilize multi-source
data for high-resolution human generation. Specifically, 1) we design a
Multi-Source Spatial Transformer that spatially aligns multi-source images to
full-body space with a human parametric model. 2) Next, a continuous GAN is
proposed with global-structural guidance and CutMix consistency. Patches from
different datasets are then sampled and transformed to supervise the training
of this scale-invariant generative model. Extensive experiments demonstrate
that our model jointly learned from multi-source data achieves superior
quality compared to models learned from a holistic dataset.
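The Multi-Source Spatial Transformer is only described at a high level here. As a
minimal illustrative sketch (not the paper's implementation), the alignment step
can be thought of as fitting a 2-D similarity transform that maps landmarks
detected in a part crop (e.g., a face image) onto the corresponding keypoints of
a parametric body model, such as SMPL, projected into full-body space; every
name below is a hypothetical placeholder:

    import torch

    def fit_similarity(src: torch.Tensor, dst: torch.Tensor):
        """Least-squares 2-D similarity transform (rotation, isotropic scale,
        translation) mapping src -> dst; src and dst are (N, 2) float keypoint
        arrays. Complex arithmetic gives the closed-form solution."""
        s = torch.view_as_complex(src.contiguous())
        d = torch.view_as_complex(dst.contiguous())
        s_mu, d_mu = s.mean(), d.mean()
        sc, dc = s - s_mu, d - d_mu
        a = (sc.conj() * dc).sum() / (sc.conj() * sc).sum().real  # scale * e^{i*angle}
        b = d_mu - a * s_mu                                       # translation
        return a, b

    def warp_points(pts: torch.Tensor, a, b) -> torch.Tensor:
        """Apply p -> a*p + b to (N, 2) points."""
        return torch.view_as_real(a * torch.view_as_complex(pts.contiguous()) + b)

    # Usage: map face-crop landmarks onto the body model's projected face
    # keypoints (both assumed to come from an off-the-shelf detector and a
    # parametric model; random placeholders are used here).
    crop_kps = torch.rand(5, 2)
    body_kps = torch.rand(5, 2)
    a, b = fit_similarity(crop_kps, body_kps)
    aligned = warp_points(crop_kps, a, b)  # crop landmarks in full-body space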
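Similarly, the scale-invariant patch supervision of the continuous GAN can be
sketched under assumed interfaces. Below, G(z, grid) is a hypothetical
grid-conditioned generator that renders any queried region of a continuous
[-1, 1]^2 image plane; the loss asks a high-resolution render of a sub-region
to agree with an upsampled crop of the coarse global render:

    import torch
    import torch.nn.functional as F

    def region_grid(box, size):
        """(1, size, size, 2) sampling grid over box = (x0, y0, x1, y1) in the
        continuous [-1, 1]^2 image plane."""
        x0, y0, x1, y1 = box
        gy, gx = torch.meshgrid(torch.linspace(y0, y1, size),
                                torch.linspace(x0, x1, size), indexing="ij")
        return torch.stack([gx, gy], dim=-1).unsqueeze(0)

    def scale_consistency_loss(G, z, box, coarse=64, fine=256):
        full = G(z, region_grid((-1.0, -1.0, 1.0, 1.0), coarse))  # global, coarse
        patch = G(z, region_grid(box, fine))                      # local, fine
        # Bilinearly crop-and-upsample the same region from the coarse render.
        crop = F.grid_sample(full, region_grid(box, fine), align_corners=True)
        return F.l1_loss(patch, crop)

A CutMix-style consistency term would additionally paste generated fine patches
back into the global render and require the discriminator to score the
composite consistently with its parts; that detail is omitted from this sketch.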
Related papers
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We combine a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
- Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management.
Existing models in this area often fall short due to their domain-specific nature and the lack of a strategy for integrating information from various sources.
We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z)
- SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs [6.879945062426145]
We generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs.
We demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.
arXiv Detail & Related papers (2024-06-28T01:14:43Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- StyleGAN-Human: A Data-Centric Odyssey of Human Generation [96.7080874757475]
This work takes a data-centric perspective and investigates multiple critical aspects of "data engineering".
We collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures.
We rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment.
arXiv Detail & Related papers (2022-04-25T17:55:08Z)
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Ensembles of GANs for synthetic training data generation [7.835101177261939]
Insufficient training data is a major bottleneck for most deep learning practices.
This work investigates the use of synthetic images, created by generative adversarial networks (GANs), as the only source of training data.
arXiv Detail & Related papers (2021-04-23T19:38:48Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
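Read literally, the linear model in the MultiView ICA entry above can be
written, for each subject i out of m, as

    x_i = A_i s + n_i,    i = 1, ..., m,

where x_i is subject i's observed data, A_i a subject-specific mixing matrix,
s the shared independent sources, and n_i noise; see the paper itself for the
exact placement of the noise term.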
This list is automatically generated from the titles and abstracts of the papers on this site.