Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
- URL: http://arxiv.org/abs/2502.08556v1
- Date: Wed, 12 Feb 2025 16:38:40 GMT
- Title: Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
- Authors: Shixiang Tang, Yizhou Wang, Lu Chen, Yuan Wang, Sida Peng, Dan Xu, Wanli Ouyang
- Abstract summary: Human-centric Foundation Models unify diverse human-centric tasks into a single framework.
We present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups.
This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent digital human and embodiment modeling.
- Score: 79.97999901785772
- Abstract: Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs), inspired by the success of generalist models such as large language and vision models, have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding. (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content. (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis. (4) Human-centric Agentic Foundation Models that extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied tasks. We review state-of-the-art techniques and discuss emerging challenges and future research directions. This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent digital human and embodiment modeling.
Related papers
- Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis.
Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation.
Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z)
- Human Modelling and Pose Estimation Overview [0.0]
Human modelling and pose estimation stands at the crossroads of Computer Vision, Computer Graphics, and Machine Learning.
This paper presents a thorough investigation of this interdisciplinary field, examining various algorithms, methodologies, and practical applications.
arXiv Detail & Related papers (2024-06-27T16:04:41Z)
- Human Factors in Model-Driven Engineering: Future Research Goals and Initiatives for MDE [15.661925949062843]
We discuss topics related to human factors in modelling during a GI-Dagstuhl seminar.
Five topics were covered in depth, namely modelling human aspects, factors of modeller experience, diversity and inclusion in MDE, collaboration and MDE, and teaching human-aware MDE.
arXiv Detail & Related papers (2024-04-29T13:27:20Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- Trends, Applications, and Challenges in Human Attention Modelling [65.61554471033844]
Human attention modelling has proven to be particularly useful for understanding the cognitive processes underlying visual exploration.
It provides support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling.
arXiv Detail & Related papers (2024-02-28T19:35:30Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
- Human Image Generation: A Comprehensive Survey [44.204029557298476]
In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods, and hybrid methods.
The advantages and characteristics of different methods are summarized in terms of model architectures.
Given their broad application potential, typical downstream uses of synthesized human images are also covered.
arXiv Detail & Related papers (2022-12-17T15:19:45Z)
- StyleGAN-Human: A Data-Centric Odyssey of Human Generation [96.7080874757475]
This work takes a data-centric perspective and investigates multiple critical aspects of "data engineering".
We collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures.
We rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment.
arXiv Detail & Related papers (2022-04-25T17:55:08Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.