UniHCP: A Unified Model for Human-Centric Perceptions
- URL: http://arxiv.org/abs/2303.02936v4
- Date: Thu, 22 Jun 2023 05:17:53 GMT
- Title: UniHCP: A Unified Model for Human-Centric Perceptions
- Authors: Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng
Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang
- Abstract summary: We propose UniHCP, a Unified Model for Human-Centric Perceptions.
UniHCP unifies a wide range of human-centric tasks in a simplified end-to-end manner with the plain vision transformer architecture.
With large-scale joint training on 33 human-centric datasets, UniHCP can outperform strong baselines by direct evaluation.
- Score: 75.38263862084641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian
detection, person re-identification, etc.) play a key role in industrial
applications of visual models. While specific human-centric tasks have their
own relevant semantic aspect to focus on, they also share the same underlying
semantic structure of the human body. However, few works have attempted to
exploit such homogeneity and design a general-purpose model for human-centric
tasks. In this work, we revisit a broad range of human-centric tasks and unify
them in a minimalist manner. We propose UniHCP, a Unified Model for
Human-Centric Perceptions, which unifies a wide range of human-centric tasks in
a simplified end-to-end manner with the plain vision transformer architecture.
With large-scale joint training on 33 human-centric datasets, UniHCP can
outperform strong baselines on several in-domain and downstream tasks by direct
evaluation. When adapted to a specific task, UniHCP achieves new SOTAs on a
wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing,
86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID,
and 85.8 JI on CrowdHuman for pedestrian detection, performing better than
specialized models tailored for each task.
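The abstract describes the general recipe (one shared plain-ViT model serving many tasks) without implementation detail. As a hedged illustration of that pattern only, the sketch below routes task-specific query tokens through a shared encoder and decoder; every module name, dimension, and task label here is an illustrative assumption, not UniHCP's actual design.

```python
# Minimal sketch of a unified human-centric model in the spirit of the
# abstract: one shared encoder, task-specific query tokens, shared decoder.
# All module names and sizes are illustrative assumptions, not UniHCP's code.
import torch
import torch.nn as nn

class UnifiedHCPSketch(nn.Module):
    def __init__(self, dim=768, num_queries=20, tasks=("parsing", "pose", "reid")):
        super().__init__()
        # Stand-in for a plain ViT backbone (tokens assumed already patch-embedded).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # One learnable query set per task; the decoder weights are shared.
        self.queries = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(num_queries, dim)) for t in tasks}
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Lightweight per-task heads map decoded queries to task outputs.
        self.heads = nn.ModuleDict({t: nn.Linear(dim, dim) for t in tasks})

    def forward(self, patch_tokens, task):
        # patch_tokens: (batch, num_patches, dim) image tokens from a ViT stem.
        memory = self.encoder(patch_tokens)
        q = self.queries[task].unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        decoded = self.decoder(q, memory)  # task queries attend to image tokens
        return self.heads[task](decoded)   # task-specific projection

# Usage: the same shared weights serve different tasks by switching queries.
model = UnifiedHCPSketch()
tokens = torch.randn(2, 196, 768)          # e.g. 14x14 patches of dim 768
parsing_out = model(tokens, "parsing")
pose_out = model(tokens, "pose")
```

Switching the query set (and output head) is what lets one set of shared weights serve heterogeneous tasks; the real query design, heads, and training losses are specified in the paper itself.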
Related papers
- Sapiens: Foundation for Human Vision Models [14.72839332332364]
We present Sapiens, a family of models for four fundamental human-centric vision tasks.
Our models support 1K high-resolution inference and are easy to adapt for individual tasks.
We observe that self-supervised pretraining on a curated dataset of human images significantly boosts the performance for a diverse set of human-centric tasks.
arXiv Detail & Related papers (2024-08-22T17:37:27Z) - HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107]
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles.
As a result, our method can reconstruct complete humans even from a few viewing angles, increasing PSNR by more than 15%.
arXiv Detail & Related papers (2024-05-30T05:43:09Z) - AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
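As a hedged sketch of the two-level probing this summary describes (a human token for instance-level context, then joint-related tokens for local detail), the snippet below chains two cross-attention reads over the image tokens; all names and shapes are illustrative assumptions, not AiOS's implementation.

```python
# Minimal sketch of two-level token probing: a human token gathers
# instance-level context, then joint tokens probe fine-grained local detail.
# All names/shapes are illustrative assumptions, not AiOS's implementation.
import torch
import torch.nn as nn

class TokenProbeSketch(nn.Module):
    def __init__(self, dim=256, num_joints=17):
        super().__init__()
        self.human_token = nn.Parameter(torch.randn(1, 1, dim))
        self.joint_tokens = nn.Parameter(torch.randn(1, num_joints, dim))
        self.human_attn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.joint_attn = nn.MultiheadAttention(dim, 8, batch_first=True)

    def forward(self, img_tokens):
        b = img_tokens.size(0)
        # Stage 1: the human token probes the image for instance-level context.
        h, _ = self.human_attn(self.human_token.expand(b, -1, -1),
                               img_tokens, img_tokens)
        # Stage 2: joint tokens, conditioned on the instance context, probe
        # the image again for fine-grained local features around each joint.
        j_queries = self.joint_tokens.expand(b, -1, -1) + h
        j, _ = self.joint_attn(j_queries, img_tokens, img_tokens)
        return h, j  # global instance feature, per-joint local features

probe = TokenProbeSketch()
h, j = probe(torch.randn(2, 196, 256))  # h: (2, 1, 256), j: (2, 17, 256)
```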
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Expressive Forecasting of 3D Whole-body Human Motions [38.93700642077312]
We are the first to formulate a whole-body human pose forecasting framework.
Our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI).
We conduct extensive experiments on a newly-introduced large-scale benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-12-19T09:09:46Z) - You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception [37.667147915777534]
Human-centric perception is a long-standing problem in computer vision.
This paper introduces a unified and versatile framework (HQNet) for single-stage multi-person multi-task human-centric perception (HCP).
Human Query captures intricate instance-level features for individual persons and disentangles complex multi-person scenarios.
arXiv Detail & Related papers (2023-12-09T10:36:43Z) - Hulk: A Universal Knowledge Translator for Human-Centric Tasks [69.8518392427151]
We present Hulk, the first multimodal human-centric generalist model.
It addresses 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning.
Hulk achieves state-of-the-art performance in 11 benchmarks.
arXiv Detail & Related papers (2023-12-04T07:36:04Z) - HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception [97.55089867970874]
We introduce masked image modeling (MIM) as a pre-training approach for human-centric perception.
We further incorporate an intuitive human structure prior - human parts - into pre-training.
This encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks.
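As a hedged illustration of biasing MIM toward body structure, the sketch below samples mask positions with higher probability on patches that overlap human-part regions; the sampling scheme, weights, and function names are assumptions for illustration, not HAP's exact strategy.

```python
# Illustrative sketch of part-biased mask sampling for masked image modeling:
# patches overlapping human-part regions are masked with higher probability.
# The exact strategy in HAP may differ; this only shows the general idea.
import torch

def part_biased_mask(part_map, mask_ratio=0.75, part_weight=3.0):
    """part_map: (num_patches,) bool, True where a patch overlaps a human part.
    Returns a bool mask with ~mask_ratio of patches masked, biased to parts."""
    num_patches = part_map.numel()
    num_masked = int(mask_ratio * num_patches)
    # Higher sampling weight on part patches, lower on background patches.
    weights = torch.where(part_map, torch.tensor(part_weight), torch.tensor(1.0))
    idx = torch.multinomial(weights, num_masked, replacement=False)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[idx] = True
    return mask

# Usage: 196 patches (14x14), with a toy part map marking a central region.
part_map = torch.zeros(196, dtype=torch.bool)
part_map[70:130] = True
mask = part_biased_mask(part_map)
print(mask.sum().item(), "patches masked")
```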
arXiv Detail & Related papers (2023-10-31T17:56:11Z) - Whole-Body Human Pose Estimation in the Wild [88.09875133989155]
COCO-WholeBody extends the COCO dataset with whole-body annotations.
It is the first benchmark that has manual annotations on the entire human body.
A single-network model, named ZoomNet, is devised to take into account the hierarchical structure of the full human body.
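As a hedged sketch of what taking the body's hierarchical structure into account can look like in practice, the snippet below zooms into a predicted face or hand region of a feature map so a fine-keypoint head can operate at a fixed, larger scale; the crop-and-resize scheme and all names are illustrative assumptions, not ZoomNet's actual design.

```python
# Illustrative sketch of hierarchical "zoom" processing: predict coarse body
# keypoints at full-image scale, then crop and upsample face/hand regions for
# fine keypoints. Names and the crop scheme are assumptions, not ZoomNet's.
import torch
import torch.nn.functional as F

def zoom_crop(feature_map, center, size=64):
    """Crop a (C, H, W) feature map around an (x, y) center and resize it,
    emulating the 'zoom in on face/hands' step of a hierarchical model."""
    c, h, w = feature_map.shape
    x, y = int(center[0]), int(center[1])
    half = size // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    crop = feature_map[:, y0:y1, x0:x1]
    # Upsample the crop so the fine-keypoint head sees a fixed, larger scale.
    return F.interpolate(crop.unsqueeze(0), size=(128, 128),
                         mode="bilinear", align_corners=False).squeeze(0)

feat = torch.randn(256, 192, 144)             # toy backbone feature map
face_feat = zoom_crop(feat, center=(72, 40))  # zoom into the predicted face
print(face_feat.shape)                        # torch.Size([256, 128, 128])
```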
arXiv Detail & Related papers (2020-07-23T08:35:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.