GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- URL: http://arxiv.org/abs/2401.13414v1
- Date: Wed, 24 Jan 2024 12:18:31 GMT
- Title: GTAutoAct: An Automatic Datasets Generation Framework Based on Game
Engine Redevelopment for Action Recognition
- Authors: Xingyu Song, Zhan Li, Shi Chen and Kazuyuki Demachi
- Abstract summary: GTAutoAct is a novel dataset generation framework leveraging game engine technology to facilitate advancements in action recognition.
It transforms coordinate-based 3D human motion into a rotation-oriented representation better suited to multiple viewpoints.
It implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera, with auto-trimming and labeling functionalities.
- Score: 12.521014978532548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current datasets for action recognition tasks face limitations stemming from
traditional collection and generation methods, including the constrained range
of action classes, absence of multi-viewpoint recordings, limited diversity,
poor video quality, and labor-intensive manual collection. To address these
challenges, we introduce GTAutoAct, an innovative dataset generation framework
leveraging game engine technology to facilitate advancements in action
recognition. GTAutoAct excels in automatically creating large-scale,
well-annotated datasets with extensive action classes and superior video
quality. Our framework's distinctive contributions encompass: (1) it
transforms readily available coordinate-based 3D human motion into a
rotation-oriented representation better suited to multiple
viewpoints; (2) it employs dynamic segmentation and interpolation of rotation
sequences to create smooth and realistic animations of action; (3) it offers
extensively customizable animation scenes; (4) it implements an autonomous
video capture and processing pipeline, featuring a randomly navigating camera,
with auto-trimming and labeling functionalities. Experimental results
underscore the framework's robustness and highlight its potential to
significantly improve action recognition model training.
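The abstract's first two contributions, converting coordinate-based joint positions into per-bone rotations and interpolating the resulting rotation sequences, can be made concrete with a short sketch. The code below is not taken from the paper: the toy single-bone skeleton, the SciPy axis-angle/Slerp approach, and all function names and frame rates are assumptions introduced only for illustration.

```python
"""Illustrative sketch (not the authors' code): turn coordinate-based joint
positions into per-bone rotations, then interpolate the rotation sequence.
The skeleton, names, and frame rates below are assumptions for the example."""
import numpy as np
from scipy.spatial.transform import Rotation, Slerp


def bone_rotation(rest_dir: np.ndarray, pose_dir: np.ndarray) -> Rotation:
    """Minimal rotation mapping a bone's rest-pose direction onto its
    observed direction in the current frame (axis-angle construction)."""
    a = rest_dir / np.linalg.norm(rest_dir)
    b = pose_dir / np.linalg.norm(pose_dir)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:
        # Degenerate case: directions are (anti-)parallel; a full treatment
        # would pick an arbitrary perpendicular axis for the 180-degree case.
        return Rotation.identity()
    angle = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return Rotation.from_rotvec(axis / norm * angle)


def coords_to_rotations(parent_xyz, child_xyz, rest_dir) -> Rotation:
    """Per-frame rotations for one bone from (T, 3) joint coordinates."""
    return Rotation.concatenate(
        [bone_rotation(rest_dir, c - p) for p, c in zip(parent_xyz, child_xyz)]
    )


def upsample(rotations: Rotation, src_fps: float, dst_fps: float) -> Rotation:
    """Slerp a key-frame rotation sequence to a higher playback rate,
    yielding smoother in-engine animation between captured key frames."""
    t_src = np.arange(len(rotations)) / src_fps
    slerp = Slerp(t_src, rotations)
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_fps)
    return slerp(t_dst)


if __name__ == "__main__":
    # Toy example: a single "upper arm" bone observed at 10 FPS.
    T = 5
    shoulder = np.zeros((T, 3))
    elbow = np.stack([np.array([np.sin(k), -np.cos(k), 0.0])
                      for k in np.linspace(0.0, np.pi / 2, T)])
    rots = coords_to_rotations(shoulder, elbow,
                               rest_dir=np.array([0.0, -1.0, 0.0]))
    smooth = upsample(rots, src_fps=10.0, dst_fps=60.0)
    print(smooth.as_euler("xyz", degrees=True).round(1))
```

Slerp between key rotations is one standard way to obtain the smooth, viewpoint-independent animation the abstract describes; the paper's own dynamic segmentation and interpolation scheme may differ.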
Related papers
- CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving [25.156989992025625]
We introduce a novel spatial adaptive generation framework, CoGen, to achieve controllable multi-view videos with high 3D consistency.
By replacing coarse 2D conditions with fine-grained 3D representations, our approach significantly enhances the spatial consistency of the generated videos.
Results demonstrate that this method excels in preserving geometric fidelity and visual realism, offering a reliable video generation solution for autonomous driving.
arXiv Detail & Related papers (2025-03-28T08:27:05Z)
- ObjectMover: Generative Object Movement with Video Prior [69.75281888309017]
We present ObjectMover, a generative model that can perform object movement in challenging scenes.
We show that with this approach, our model is able to adjust to complex real-world scenarios.
We propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization.
arXiv Detail & Related papers (2025-03-11T04:42:59Z)
- Leader and Follower: Interactive Motion Generation under Trajectory Constraints [42.90788442575116]
This paper explores the motion range refinement process in interactive motion generation.
It proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter.
Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
arXiv Detail & Related papers (2025-02-17T08:52:45Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.
Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior into video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z)
- InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor.
Our key insight is that large video generation models can act as both neural renderers and implicit physics simulators, having learned interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z)
- Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling [19.205142489726875]
Video activity recognition has become increasingly important in robotics and embodied AI.
We introduce a novel system, CARS, to overcome these issues through adaptive video context modeling.
Our CARS runs at speeds >30 FPS on typical edge devices and outperforms all baselines by 1.2% to 79.7% in accuracy.
arXiv Detail & Related papers (2024-10-19T05:50:00Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video [11.293897932762809]
Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications.
CNNs suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings.
To overcome this issue, we introduce the 4A pipeline, which employs a series of sophisticated techniques.
arXiv Detail & Related papers (2024-04-10T04:59:51Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios [17.94374027261511]
We propose a framework to synthesize realistic hand gestures using Unreal Engine.
Our framework offers customization options and reduces the risk of overfitting.
By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.
arXiv Detail & Related papers (2023-09-08T16:32:56Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
- Action-conditioned On-demand Motion Generation [11.45641608124365]
We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences.
ODMO shows improvements over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets.
arXiv Detail & Related papers (2022-07-17T13:04:44Z)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)