GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- URL: http://arxiv.org/abs/2401.13414v1
- Date: Wed, 24 Jan 2024 12:18:31 GMT
- Title: GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- Authors: Xingyu Song, Zhan Li, Shi Chen and Kazuyuki Demachi
- Abstract summary: GTAutoAct is a novel dataset generation framework leveraging game engine technology to facilitate advancements in action recognition.
It transforms coordinate-based 3D human motion into a rotation-orientated representation that is better suited to multiple viewpoints.
It implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera, with auto-trimming and labeling functionalities.
- Score: 12.521014978532548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current datasets for action recognition tasks face limitations stemming from
traditional collection and generation methods, including the constrained range
of action classes, absence of multi-viewpoint recordings, limited diversity,
poor video quality, and labor-intensive manual collection. To address these
challenges, we introduce GTAutoAct, an innovative dataset generation framework
leveraging game engine technology to facilitate advancements in action
recognition. GTAutoAct excels in automatically creating large-scale,
well-annotated datasets with extensive action classes and superior video
quality. Our framework's distinctive contributions encompass: (1) it
innovatively transforms readily available coordinate-based 3D human motion into
rotation-orientated representation with enhanced suitability in multiple
viewpoints; (2) it employs dynamic segmentation and interpolation of rotation
sequences to create smooth and realistic animations of action; (3) it offers
extensively customizable animation scenes; (4) it implements an autonomous
video capture and processing pipeline, featuring a randomly navigating camera,
with auto-trimming and labeling functionalities. Experimental results
underscore the framework's robustness and highlight its potential to
significantly improve action recognition model training.
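Contributions (1) and (2) of the abstract lend themselves to a small sketch: converting coordinate-based joint positions into per-bone rotations, then smoothing the rotation sequences by spherical interpolation. The snippet below is a minimal illustration of that general idea only; GTAutoAct's actual skeleton topology, rest pose, and segmentation logic are not given in the abstract, so `SKELETON_EDGES`, `frame_to_bone_rotations`, and `interpolate_bone` are hypothetical names used here for illustration (Python with NumPy/SciPy).

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Hypothetical 5-joint chain (parent, child): pelvis->spine->neck->head, pelvis->hip.
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3), (0, 4)]

def bone_directions(joints):
    """Unit direction of every bone in a (J, 3) array of joint coordinates."""
    dirs = []
    for parent, child in SKELETON_EDGES:
        v = joints[child] - joints[parent]
        dirs.append(v / (np.linalg.norm(v) + 1e-9))
    return np.asarray(dirs)

def rotation_between(u, v):
    """Rotation that takes unit vector u onto unit vector v (axis-angle form)."""
    axis = np.cross(u, v)
    s, c = np.linalg.norm(axis), float(np.dot(u, v))
    if s < 1e-8:                                  # parallel or anti-parallel bones
        if c > 0:
            return Rotation.identity()
        perp = np.cross(u, [1.0, 0.0, 0.0])       # any axis perpendicular to u
        if np.linalg.norm(perp) < 1e-8:
            perp = np.cross(u, [0.0, 1.0, 0.0])
        return Rotation.from_rotvec(np.pi * perp / np.linalg.norm(perp))
    return Rotation.from_rotvec(axis / s * np.arctan2(s, c))

def frame_to_bone_rotations(joints, rest_joints):
    """Convert one coordinate-based pose into per-bone rotations w.r.t. a rest pose."""
    cur, rest = bone_directions(joints), bone_directions(rest_joints)
    return [rotation_between(r, c) for r, c in zip(rest, cur)]

def interpolate_bone(key_times, key_rotations, query_times):
    """Resample one bone's rotation keyframes smoothly with spherical interpolation."""
    keys = Rotation.from_quat(np.array([r.as_quat() for r in key_rotations]))
    return Slerp(key_times, keys)(query_times)

# Toy usage: two coordinate keyframes -> 30 smoothly interpolated frames for one bone.
rest = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3], [1, 0, 0]], float)
bent = rest.copy()
bent[3] = [1.0, 0.0, 2.5]                         # bend the neck->head bone
key_rots = [frame_to_bone_rotations(p, rest) for p in (rest, bent)]
track = interpolate_bone([0.0, 1.0], [k[2] for k in key_rots],
                         np.linspace(0.0, 1.0, 30))
print(track.as_quat().shape)                      # (30, 4) quaternion keys per frame
```

A rotation-based track like this can be retargeted to any rigged character and rendered from arbitrary camera positions, which is why the representation suits multi-viewpoint capture better than raw joint coordinates.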
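Contribution (4), the randomly navigating camera, can likewise be pictured as repeatedly sampling viewpoints around the actor and aiming the camera at the subject. The sketch below is only an assumption about what such a sampler might look like outside any specific game engine; the distance and height ranges and the `look_at` and `sample_camera_pose` helpers are hypothetical.

```python
import numpy as np

def look_at(cam_pos, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation matrix for a camera at cam_pos looking at target."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward])   # rows: camera x, y, z axes

def sample_camera_pose(actor_pos, rng, r_range=(2.0, 6.0), h_range=(0.5, 2.5)):
    """Sample one viewpoint on a ring around the actor (distance, azimuth, height)."""
    r = rng.uniform(*r_range)
    azimuth = rng.uniform(0.0, 2.0 * np.pi)
    height = rng.uniform(*h_range)
    cam_pos = actor_pos + np.array([r * np.cos(azimuth), r * np.sin(azimuth), height])
    return cam_pos, look_at(cam_pos, actor_pos)

rng = np.random.default_rng(0)
actor = np.array([0.0, 0.0, 0.0])
poses = [sample_camera_pose(actor, rng) for _ in range(8)]   # 8 random viewpoints
```

In the framework described by the abstract, poses of this kind would drive the in-engine camera during capture, with auto-trimming and labeling applied to the recorded clips afterwards.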
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z) - An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video [11.293897932762809]
Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications.
CNNs suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings.
To overcome this issue, we introduce the 4A pipeline, which employs a series of sophisticated techniques.
arXiv Detail & Related papers (2024-04-10T04:59:51Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios [17.94374027261511]
We propose a framework to synthesize realistic hand gestures using Unreal Engine.
Our framework offers customization options and reduces the risk of overfitting.
By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.
arXiv Detail & Related papers (2023-09-08T16:32:56Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z) - Action-conditioned On-demand Motion Generation [11.45641608124365]
We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences.
ODMO shows improvements over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets.
arXiv Detail & Related papers (2022-07-17T13:04:44Z) - AMP: Adversarial Motion Priors for Stylized Physics-Based Character
Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.