Learning Discriminative Representations for Skeleton Based Action Recognition
- URL: http://arxiv.org/abs/2303.03729v3
- Date: Tue, 28 Mar 2023 03:17:05 GMT
- Title: Learning Discriminative Representations for Skeleton Based Action Recognition
- Authors: Huanyu Zhou, Qingjie Liu, Yunhong Wang
- Abstract summary: We propose an auxiliary feature refinement head (FR Head) to obtain discriminative representations of skeletons.
Our proposed models obtain results competitive with state-of-the-art methods and can help discriminate ambiguous samples.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human action recognition aims to classify the category of a human action
from a video segment. Recently, researchers have focused on designing GCN-based
models that extract features from skeletons for this task, because skeleton
representations are much more efficient and robust than other modalities such
as RGB frames. However, when skeleton data are used, some important clues, such
as related items, are discarded. As a result, some actions become ambiguous:
they are hard to distinguish and tend to be misclassified. To alleviate this
problem, we propose an auxiliary feature refinement head (FR Head), which
consists of spatial-temporal decoupling and contrastive feature refinement, to
obtain discriminative representations of skeletons. Ambiguous samples are
dynamically discovered and calibrated in the feature space. Furthermore, FR Head
can be applied to different stages of GCNs to build a multi-level refinement
for stronger supervision. Extensive experiments are conducted on the NTU RGB+D,
NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain results
competitive with state-of-the-art methods and can help discriminate those
ambiguous samples. Code is available at
https://github.com/zhysora/FR-Head.
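The abstract names three ingredients: spatial-temporal decoupling, contrastive refinement of ambiguous samples, and multi-level supervision across GCN stages. The following is a minimal sketch of one plausible reading of those ideas, not the authors' implementation (that lives in the linked repository). It assumes: features are nested lists shaped [T][V][C] (frames, joints, channels); "decoupling" means averaging over joints or over frames; "ambiguous" means misclassified; class prototypes are per-class mean features; and the refinement loss is an InfoNCE-style prototype contrast that we swapped in for illustration. All function names, the temperature `tau`, and the stage weights are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def decouple(x: Sequence[Sequence[Sequence[float]]]) -> Tuple[Vector, Vector]:
    """Assumed decoupling of a [T][V][C] feature map into two flattened views:
    a spatial view averaged over frames (V*C values) and a temporal view
    averaged over joints (T*C values)."""
    T, V, C = len(x), len(x[0]), len(x[0][0])
    spatial = [sum(x[t][v][c] for t in range(T)) / T for v in range(V) for c in range(C)]
    temporal = [sum(x[t][v][c] for v in range(V)) / V for t in range(T) for c in range(C)]
    return spatial, temporal


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def class_prototypes(features: Sequence[Vector], labels: Sequence[int],
                     num_classes: int) -> List[Optional[Vector]]:
    """Mean feature of each class in a non-empty batch; None for absent classes."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        counts[y] += 1
        for i, v in enumerate(f):
            sums[y][i] += v
    return [[s / n for s in row] if n else None for row, n in zip(sums, counts)]


def refinement_loss(features: Sequence[Vector], labels: Sequence[int],
                    predictions: Sequence[int], num_classes: int,
                    tau: float = 0.1) -> float:
    """Hypothetical InfoNCE-style loss over ambiguous (here: misclassified) samples.

    Each ambiguous sample is pulled toward its true-class prototype and pushed
    away from the other prototypes present in the batch. Returns 0.0 when no
    sample is ambiguous."""
    protos = class_prototypes(features, labels, num_classes)
    present = [(c, p) for c, p in enumerate(protos) if p is not None]
    classes = [c for c, _ in present]
    ambiguous = [i for i, (y, p) in enumerate(zip(labels, predictions)) if y != p]
    if not ambiguous:
        return 0.0
    total = 0.0
    for i in ambiguous:
        logits = [cosine(features[i], p) / tau for _, p in present]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[classes.index(labels[i])]
    return total / len(ambiguous)


def multi_level_loss(stage_features: Sequence[Sequence[Vector]], labels: Sequence[int],
                     predictions: Sequence[int], num_classes: int,
                     weights: Sequence[float]) -> float:
    """Hypothetical multi-level refinement: weighted sum of the auxiliary loss
    computed on features taken from several GCN stages."""
    return sum(w * refinement_loss(f, labels, predictions, num_classes)
               for w, f in zip(weights, stage_features))
```

In a training loop, one might add `refinement_loss` on both the spatial and temporal views returned by `decouple`, at several stages, to the usual classification loss. The abstract does not say how FR Head actually selects ambiguous samples or weights these terms, so treat every choice here as an assumption.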
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition (2024-06-02)
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
- Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition (2023-12-23)
STD-CL is a framework for obtaining discriminative and semantically distinct representations from skeleton sequences.
STD-CL achieves solid improvements on the NTU60, NTU120, and NW-UCLA benchmarks.
- Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions (2023-09-15)
Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation.
We propose a novel approach termed the detail reinforcement diffusion model (DRDM).
It leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR).
- Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition (2022-10-10)
Graph convolutional networks (GCNs) can model human body skeletons as spatial and temporal graphs.
In this work, we propose pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition.
The core idea is to use a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which gives the network a more robust feature representation ability.
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition (2022-07-17)
During training, action labels are available only for the source dataset, not for the target dataset.
We use a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised classification tasks.
- Skeleton-based Action Recognition via Adaptive Cross-Form Learning (2022-06-30)
Skeleton-based action recognition aims to project skeleton sequences to action categories, where the sequences are derived from multiple forms of pre-detected points.
Existing methods tend to improve GCNs by leveraging multi-form skeletons because of their complementary cues.
We present Adaptive Cross-Form Learning (ACFL), which empowers well-designed GCNs to generate complementary representations from single-form skeletons.
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data (2021-06-09)
A central challenge in training classification models in real-world federated systems is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10.
- Skeleton Focused Human Activity Recognition in RGB Video (2020-04-29)
We propose a multimodal feature fusion model that uses both skeleton and RGB modalities to infer human activity.
The model can be trained either individually or uniformly with the back-propagation algorithm in an end-to-end manner.
- Adversarial Feature Hallucination Networks for Few-Shot Learning (2020-03-30)
Adversarial Feature Hallucination Networks (AFHN) are based on conditional Wasserstein Generative Adversarial Networks (cWGAN).
Two novel regularizers are incorporated into AFHN to encourage the discriminability and diversity of the synthesized features.