GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- URL: http://arxiv.org/abs/2401.13414v1
- Date: Wed, 24 Jan 2024 12:18:31 GMT
- Title: GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- Authors: Xingyu Song, Zhan Li, Shi Chen and Kazuyuki Demachi
- Abstract summary: GTAutoAct is a novel dataset generation framework leveraging game engine technology to facilitate advancements in action recognition.
It transforms coordinate-based 3D human motion into rotation-orientated representation with enhanced suitability in multiple viewpoints.
It implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera, with auto-trimming and labeling functionalities.
- Score: 12.521014978532548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current datasets for action recognition tasks face limitations stemming from
traditional collection and generation methods, including the constrained range
of action classes, absence of multi-viewpoint recordings, limited diversity,
poor video quality, and labor-intensive manually collection. To address these
challenges, we introduce GTAutoAct, a innovative dataset generation framework
leveraging game engine technology to facilitate advancements in action
recognition. GTAutoAct excels in automatically creating large-scale,
well-annotated datasets with extensive action classes and superior video
quality. Our framework's distinctive contributions encompass: (1) it
innovatively transforms readily available coordinate-based 3D human motion into
rotation-orientated representation with enhanced suitability in multiple
viewpoints; (2) it employs dynamic segmentation and interpolation of rotation
sequences to create smooth and realistic animations of action; (3) it offers
extensively customizable animation scenes; (4) it implements an autonomous
video capture and processing pipeline, featuring a randomly navigating camera,
with auto-trimming and labeling functionalities. Experimental results
underscore the framework's robustness and highlights its potential to
significantly improve action recognition model training.
Related papers
- Leader and Follower: Interactive Motion Generation under Trajectory Constraints [42.90788442575116]
This paper explores the motion range refinement process in interactive motion generation.
It proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter.
Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
arXiv Detail & Related papers (2025-02-17T08:52:45Z) - Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.
Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z) - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior to video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z) - InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor.
Our key insight is that large video foundation models can act as both neurals and implicit physics simulators by learning interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z) - Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention [61.3281618482513]
We present CogDriving, a novel network designed for synthesizing high-quality multi-view driving videos.
CogDriving leverages a Diffusion Transformer architecture with holistic-4D attention modules, enabling simultaneous associations across the dimensions.
CogDriving demonstrates strong performance on the nuScenes validation set, achieving an FVD score of 37.8, highlighting its ability to generate realistic driving videos.
arXiv Detail & Related papers (2024-12-04T18:02:49Z) - An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video [11.293897932762809]
Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications.
CNNs suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings.
To overcome this issue, we introduce the 4A pipeline, which employs a series of sophisticated techniques.
arXiv Detail & Related papers (2024-04-10T04:59:51Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
generated video sequences by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z) - Action-conditioned On-demand Motion Generation [11.45641608124365]
We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences.
ODMO shows improvements over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets.
arXiv Detail & Related papers (2022-07-17T13:04:44Z) - AMP: Adversarial Motion Priors for Stylized Physics-Based Character
Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.