Action Recognition Utilizing YGAR Dataset
- URL: http://arxiv.org/abs/2310.00831v1
- Date: Mon, 2 Oct 2023 00:43:45 GMT
- Title: Action Recognition Utilizing YGAR Dataset
- Authors: Shuo Wang, Amiya Ranjan and Lawrence Jiang
- Abstract summary: The scarcity of high-quality action video data is a bottleneck in the research and application of action recognition.
We present a new 3D action data simulation engine and generate 3 sets of sample data to demonstrate its current functionalities.
- Score: 5.922172844641853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scarcity of high-quality action video data is a bottleneck in the
research and application of action recognition. Although significant effort has
been made in this area, there remain gaps in the range of available data
types that a more flexible and comprehensive data set could help bridge. In this
paper, we present a new 3D action data simulation engine and generate 3 sets
of sample data to demonstrate its current functionalities. With the new data
generation process, we demonstrate its applications to image classification and
action recognition, and its potential to evolve into a system that would allow
the exploration of much more complex action recognition tasks. To showcase
these capabilities, we also train and test a list of commonly used models for
image recognition, demonstrating the potential applications and capabilities of
the data sets and their generation process.
Related papers
- Skarimva: Skeleton-based Action Recognition is a Multi-view Application [44.79834103607383]
This work demonstrates that by making use of multiple camera views to triangulate more accurate 3D skeletons, the performance of state-of-the-art action recognition models can be improved significantly.
arXiv Detail & Related papers (2026-02-26T17:10:58Z) - Automated Image Recognition Framework [14.338537127280402]
We propose a novel Automated Image Recognition (AIR) framework that harnesses the power of generative AI.
AIR empowers end-users to synthesize high-quality, pre-annotated datasets.
It also automatically trains deep learning models on the generated datasets with robust image recognition performance.
arXiv Detail & Related papers (2025-06-24T02:42:34Z) - Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case [8.26451988845854]
We introduce a novel approach to Human Action Recognition (HAR) based on skeleton and visual cues.
We employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation.
We introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities.
arXiv Detail & Related papers (2024-10-02T19:10:23Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - 3D objects and scenes classification, recognition, segmentation, and reconstruction using 3D point cloud data: A review [5.85206759397617]
Three-dimensional (3D) point cloud analysis has become one of the most attractive subjects in realistic imaging and machine vision.
A significant effort has recently been devoted to developing novel strategies, using different techniques such as deep learning models.
Various tasks performed on 3D point cloud data are investigated, including object and scene detection, recognition, segmentation and reconstruction.
arXiv Detail & Related papers (2023-06-09T15:45:23Z) - SGED: A Benchmark dataset for Performance Evaluation of Spiking Gesture Emotion Recognition [12.396844568607522]
We label a new homogeneous multimodal gesture emotion recognition dataset based on an analysis of existing datasets.
We propose a pseudo dual-flow network based on this dataset, and verify the application potential of this dataset in the affective computing community.
arXiv Detail & Related papers (2023-04-28T09:32:09Z) - FLAG3D: A 3D Fitness Activity Dataset with Language Instruction [89.60371681477791]
We present FLAG3D, a large-scale 3D fitness activity dataset with language instruction containing 180K sequences of 60 categories.
We show that FLAG3D offers great research value for various challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation.
arXiv Detail & Related papers (2022-12-09T02:33:33Z) - Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet and Something-Something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z) - Spatial-Temporal Alignment Network for Action Recognition and Detection [80.19235282200697]
This paper studies how to introduce viewpoint-invariant feature representations that can help action recognition and detection.
We propose a novel Spatial-Temporal Alignment Network (STAN) that aims to learn geometric invariant representations for action recognition and action detection.
We test our STAN model extensively on AVA, Kinetics-400, AVA-Kinetics, Charades, and Charades-Ego datasets.
arXiv Detail & Related papers (2020-12-04T06:23:40Z) - DeepActsNet: Spatial and Motion Features from Face, Hands, and Body Combined with Convolutional and Graph Networks for Improved Action Recognition [10.690794159983199]
We present "Deep Action Stamps (DeepActs)", a novel data representation to encode actions from video sequences.
We also present "DeepActsNet", a deep learning based ensemble model which learns convolutional and structural features from Deep Action Stamps for highly accurate action recognition.
arXiv Detail & Related papers (2020-09-21T12:41:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.