Human-Art: A Versatile Human-Centric Dataset Bridging Natural and
Artificial Scenes
- URL: http://arxiv.org/abs/2303.02760v2
- Date: Wed, 5 Apr 2023 07:36:48 GMT
- Title: Human-Art: A Versatile Human-Centric Dataset Bridging Natural and
Artificial Scenes
- Authors: Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang
- Abstract summary: We introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios.
Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios.
We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer.
- Score: 15.48297730981114
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans have long been recorded in a variety of forms since antiquity. For
example, sculptures and paintings were the primary media for depicting human
beings before the invention of cameras. However, most current human-centric
computer vision tasks like human pose estimation and human image generation
focus exclusively on natural images in the real world. Artificial humans, such
as those in sculptures, paintings, and cartoons, are commonly neglected, making
existing models fail in these scenarios. As an abstraction of life, art
incorporates humans in both natural and artificial scenes. We take advantage of
this and introduce the Human-Art dataset to bridge related tasks in natural and
artificial scenarios. Specifically, Human-Art contains 50k high-quality images
with over 123k person instances from 5 natural and 15 artificial scenarios,
which are annotated with bounding boxes, keypoints, self-contact points, and
text information for humans represented in both 2D and 3D. It is, therefore,
comprehensive and versatile for various downstream tasks. We also provide a
rich set of baseline results and detailed analyses for related tasks, including
human detection, 2D and 3D human pose estimation, image generation, and motion
transfer. As a challenging dataset, we hope Human-Art can provide insights for
relevant research and open up new research questions.
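The abstract describes per-instance annotations (bounding boxes, keypoints, self-contact points, and text) across 20 scenarios. The sketch below illustrates how such COCO-style annotations could be grouped by scenario; the field names and directory layout are illustrative assumptions, not the dataset's confirmed schema.

```python
from collections import Counter

# Hypothetical COCO-style annotation snippet with the fields the paper
# describes. Field names beyond the COCO standard ("description") and the
# scenario-named directory prefixes are illustrative assumptions.
sample = {
    "images": [
        {"id": 1, "file_name": "oil_painting/000001.jpg", "description": "a dancer"},
        {"id": 2, "file_name": "sculpture/000002.jpg", "description": "a statue"},
    ],
    "annotations": [
        # bbox is [x, y, width, height]; 17 COCO keypoints x (x, y, visibility).
        {"image_id": 1, "bbox": [10, 20, 100, 200], "keypoints": [0] * 51},
        {"image_id": 1, "bbox": [120, 20, 80, 180], "keypoints": [0] * 51},
        {"image_id": 2, "bbox": [5, 5, 60, 150], "keypoints": [0] * 51},
    ],
}

# Count person instances per scenario, reading the scenario from the
# assumed directory prefix of each image's file name.
scenario_of = {img["id"]: img["file_name"].split("/")[0] for img in sample["images"]}
counts = Counter(scenario_of[ann["image_id"]] for ann in sample["annotations"])
print(dict(counts))  # {'oil_painting': 2, 'sculpture': 1}
```

On the real dataset, the same grouping over all 123k instances would yield per-scenario counts for each of the 5 natural and 15 artificial scenarios.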
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- A Unified Framework for Human-centric Point Cloud Video Understanding [23.91555808792291]
Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds.
We propose a unified framework to make full use of the prior knowledge and explore the inherent features in the data itself for generalized human-centric point cloud video understanding.
Our method achieves state-of-the-art performance on various human-related tasks, including action recognition and 3D pose estimation.
arXiv Detail & Related papers (2024-03-29T07:53:06Z)
- UniHuman: A Unified Model for Editing Human Images in the Wild [49.896715833075106]
We propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.
To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders.
In user studies, UniHuman is preferred by the users in an average of 77% of cases.
arXiv Detail & Related papers (2023-12-22T05:00:30Z)
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
- PixelHuman: Animatable Neural Radiance Fields from Few Images [27.932366091437103]
We propose PixelHuman, a novel rendering model that generates animatable human scenes from a few images of a person.
Our method differs from existing methods in that it can generalize to any input image for animatable human synthesis.
Our experiments show that our method achieves state-of-the-art performance in multiview and novel pose synthesis from few-shot images.
arXiv Detail & Related papers (2023-07-18T08:41:17Z)
- DELAUNAY: a dataset of abstract art for psychophysical and machine learning research [0.0]
We introduce DELAUNAY, a dataset of abstract paintings and non-figurative art objects labelled by the artists' names.
This dataset provides a middle ground between natural images and artificial patterns and can thus be used in a variety of contexts.
We train an off-the-shelf convolutional neural network on DELAUNAY, highlighting several of its intriguing features.
arXiv Detail & Related papers (2022-01-28T13:57:32Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, genders, proportions, and ethnicities with hundreds of motions and scenes in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- Comparing Visual Reasoning in Humans and AI [66.89451296340809]
We created a dataset of complex scenes that contained human behaviors and social interactions.
We used a quantitative metric of similarity between the AI's or a human's scene description and a ground truth formed from five other human descriptions of each scene.
Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes.
arXiv Detail & Related papers (2021-04-29T04:44:13Z)
- Holistic 3D Human and Scene Mesh Estimation from Single View Images [5.100152971410397]
We propose an end-to-end trainable model that perceives the 3D scene from a single RGB image.
We show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods.
arXiv Detail & Related papers (2020-12-02T23:22:03Z)
- Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
arXiv Detail & Related papers (2020-07-07T17:59:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.