Spatial-temporal Transformer-guided Diffusion based Data Augmentation
for Efficient Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2302.13434v2
- Date: Tue, 25 Jul 2023 02:24:04 GMT
- Title: Spatial-temporal Transformer-guided Diffusion based Data Augmentation
for Efficient Skeleton-based Action Recognition
- Authors: Yifan Jiang, Han Chen, Hanseok Ko
- Abstract summary: We introduce a novel data augmentation method for skeleton-based action recognition tasks.
Our method outperforms the state-of-the-art (SOTA) motion generation approaches on different naturality and diversity metrics.
- Score: 32.07659338674024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, skeleton-based human action has become a hot research topic because
the compact representation of human skeletons brings new blood to this research
domain. As a result, researchers began to notice the importance of using RGB or
other sensors to analyze human action by extracting skeleton information.
Leveraging the rapid development of deep learning (DL), a significant number of
skeleton-based human action approaches have been presented with fine-designed
DL structures recently. However, a well-trained DL model always demands
high-quality and sufficient data, which is hard to obtain without costing high
expenses and human labor. In this paper, we introduce a novel data augmentation
method for skeleton-based action recognition tasks, which can effectively
generate high-quality and diverse sequential actions. In order to obtain
natural and realistic action sequences, we propose denoising diffusion
probabilistic models (DDPMs) that can generate a series of synthetic action
sequences, and their generation process is precisely guided by a
spatial-temporal transformer (ST-Trans). Experimental results show that our
method outperforms the state-of-the-art (SOTA) motion generation approaches on
different naturality and diversity metrics. It proves that its high-quality
synthetic data can also be effectively deployed to existing action recognition
models with significant performance improvement.
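As an illustration of the diffusion component named in the abstract, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a skeleton sequence. It is not the authors' implementation: the tensor shape (frames, joints, coordinates), the linear noise schedule, and the names `linear_beta_schedule` and `q_sample` are assumptions made for this example, and the ST-Trans guidance of the reverse process is not modeled here.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: linearly spaced noise variances (a common DDPM default).
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    # Closed-form forward step: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    noise = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise

rng = np.random.default_rng(0)
num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

# Hypothetical skeleton clip: 60 frames, 25 joints, 3D coordinates.
x0 = rng.standard_normal((60, 25, 3))

x_early, _ = q_sample(x0, 10, alpha_bar, rng)             # lightly noised
x_late, _ = q_sample(x0, num_steps - 1, alpha_bar, rng)   # close to pure noise
```

In a full DDPM, a denoising network would be trained to predict `noise` from `x_t` and `t`; in the paper that network's generation process is guided by the spatial-temporal transformer, which this sketch leaves out.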
Related papers
- Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis [31.082402451716973]
Human motion synthesis aims to generate plausible human motion sequences.
Recent score-based generative models (SGMs) have demonstrated impressive results on this task.
We propose a Deterministic-to-Stochastic Diverse Latent Feature Mapping (DSDFM) method for human motion synthesis.
arXiv Detail & Related papers (2025-05-02T04:48:28Z)
- Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data [6.318463500874778]
We propose a prototype-guided diffusion model to generate high-fidelity synthetic pathology data at scale.
Our approach ensures biologically and diagnostically meaningful variations in the generated data.
We demonstrate that self-supervised features trained on our synthetic dataset achieve competitive performance despite using 60x-760x less data than models trained on large real-world datasets.
arXiv Detail & Related papers (2025-04-15T21:17:39Z)
- Enhancing Activity Recognition After Stroke: Generative Adversarial Networks for Kinematic Data Augmentation [0.0]
Generalizability of machine learning models for wearable monitoring in stroke rehabilitation is often constrained by the limited scale and heterogeneity of available data.
Data augmentation addresses this challenge by adding computationally derived data to real data to enrich the variability represented in the training set.
This study employs Conditional Generative Adversarial Networks (cGANs) to create synthetic kinematic data from a publicly available dataset.
By training deep learning models on both synthetic and experimental data, we enhanced task classification accuracy: models incorporating synthetic data attained an overall accuracy of 80.0%, significantly higher than the 66.1% seen in models trained solely with real data.
arXiv Detail & Related papers (2024-06-12T15:51:00Z)
- MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints [50.61346764110482]
We integrate a musculoskeletal system with a learnable parametric hand model, MANO, to create MS-MANO.
This model emulates the dynamics of muscles and tendons to drive the skeletal system, imposing physiologically realistic constraints on the resulting torque trajectories.
We also propose a simulation-in-the-loop pose refinement framework, BioPR, that refines the initial estimated pose through a multi-layer perceptron network.
arXiv Detail & Related papers (2024-04-16T02:18:18Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN [1.0499611180329804]
We aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning.
We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-the-art machine learning models.
arXiv Detail & Related papers (2024-01-10T13:46:03Z)
- Learning Latent Dynamics via Invariant Decomposition and (Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data.
We focus on the setting in which data are available from multiple different instances of a system.
We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-21T07:52:07Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Spatio-Temporal Human Action Recognition Model with Flexible-interval Sampling and Normalization [0.0]
We propose a human action system for Red-Green-Blue (RGB) video input with our own designed module.
We build a novel dataset with a similar background and discriminative actions for both human keypoint prediction and behavior recognition.
Experimental results demonstrate the effectiveness of the proposed model on our own human behavior recognition dataset and some public datasets.
arXiv Detail & Related papers (2021-08-12T10:02:20Z)
- The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition [27.795763107984286]
We present a novel automatic data augmentation model, which approximates the distribution of the input data and samples new data from this distribution.
Our results show that the augmentation strategy is fast to train and can improve classification accuracy for both neural networks and state-of-the-art methods.
arXiv Detail & Related papers (2021-05-27T11:07:09Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)