FLAG3D: A 3D Fitness Activity Dataset with Language Instruction
- URL: http://arxiv.org/abs/2212.04638v2
- Date: Wed, 19 Apr 2023 13:31:03 GMT
- Title: FLAG3D: A 3D Fitness Activity Dataset with Language Instruction
- Authors: Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming
Rao, Jiwen Lu, Jie Zhou, Xiu Li
- Abstract summary: We present FLAG3D, a large-scale 3D fitness activity dataset with language instruction containing 180K sequences of 60 categories.
We show that FLAG3D offers great research value for various challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation.
- Score: 89.60371681477791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With its continuously growing popularity around the world, fitness activity
analysis has become an emerging research topic in computer vision. While a
variety of new tasks and algorithms have been proposed recently, there is a
growing hunger for data resources offering high-quality data, fine-grained
labels, and diverse environments. In this paper, we present FLAG3D, a
large-scale 3D fitness activity dataset with language instructions containing
180K sequences of 60 categories. FLAG3D features the following three aspects:
1) accurate and dense 3D human poses captured by an advanced MoCap system to
handle complex activities and large movements, 2) detailed and professional
language instructions that describe how to perform a specific activity, 3)
versatile video resources from a high-tech MoCap system, rendering software,
and cost-effective smartphones in natural environments. Extensive experiments
and in-depth analysis show that FLAG3D offers great research value for
various challenges, such as cross-domain human action recognition, dynamic
human mesh recovery, and language-guided human action generation. Our dataset
and source code are publicly available at https://andytang15.github.io/FLAG3D.
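For readers unfamiliar with this kind of data, the sketch below shows one plausible way a single FLAG3D-style sample (a 3D joint-position sequence paired with a language instruction and an activity label) could be represented and loaded. The file layout, field names, and loader are illustrative assumptions, not the dataset's actual release format or API; refer to the project page above for the official data and tools.

```python
# Illustrative sketch only: the .npz layout, the "joints" key, and this loader
# are assumptions for a generic "3D pose sequence + language instruction"
# dataset, not the actual FLAG3D release format.
from dataclasses import dataclass
import numpy as np


@dataclass
class FitnessSample:
    joints: np.ndarray   # (num_frames, num_joints, 3) 3D joint positions from MoCap
    instruction: str     # professional language instruction for the activity
    label: int           # one of the 60 activity categories


def load_sample(npz_path: str, instruction: str, label: int) -> FitnessSample:
    """Load one hypothetical sequence stored as an .npz file of 3D joints."""
    data = np.load(npz_path)
    joints = data["joints"].astype(np.float32)  # assumed key name
    return FitnessSample(joints=joints, instruction=instruction, label=label)


# Typical downstream use: feed `joints` to a skeleton-based action recognizer
# (cross-domain recognition) or condition a generative model on `instruction`
# (language-guided action generation), as outlined in the abstract.
```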
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z) - M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts [30.571811801090224]
We introduce a comprehensive 3D instruction-following dataset called M3DBench.
It supports general multimodal instructions interleaved with text, images, 3D objects, and other visual prompts.
It unifies diverse 3D tasks at both region and scene levels, covering a variety of fundamental abilities in real-world 3D environments.
arXiv Detail & Related papers (2023-12-17T16:53:30Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language [31.691159120136064]
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data.
We present a novel method, dubbed WildRefer, for this task by fully utilizing the rich appearance information in images and the position and geometric clues in point clouds.
Our datasets are significant for research on 3D visual grounding in the wild and have huge potential to boost the development of autonomous driving and service robots.
arXiv Detail & Related papers (2023-04-12T06:48:26Z) - PresSim: An End-to-end Framework for Dynamic Ground Pressure Profile
Generation from Monocular Videos Using Physics-based 3D Simulation [8.107762252448195]
Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in pervasive sensing.
We present a novel end-to-end framework, PresSim, to synthesize sensor data from videos of human activities to reduce such effort significantly.
arXiv Detail & Related papers (2023-02-01T12:02:04Z) - Objaverse: A Universe of Annotated 3D Objects [53.2537614157313]
We present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive tags, captions, and animations.
We demonstrate the large potential of Objaverse 3D models via four applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for embodied AI, and creating a new benchmark for robustness analysis of vision models.
arXiv Detail & Related papers (2022-12-15T18:56:53Z) - Gait Recognition in the Wild with Dense 3D Representations and A
Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z) - AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs
in the Wild [51.35013619649463]
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.