Unsupervised Spatial-Temporal Feature Enrichment and Fidelity
Preservation Network for Skeleton based Action Recognition
- URL: http://arxiv.org/abs/2401.14034v1
- Date: Thu, 25 Jan 2024 09:24:07 GMT
- Title: Unsupervised Spatial-Temporal Feature Enrichment and Fidelity
Preservation Network for Skeleton based Action Recognition
- Authors: Chuankun Li, Shuai Li, Yanbo Gao, Ping Chen, Jian Li, Wanqing Li
- Abstract summary: Unsupervised skeleton based action recognition has achieved remarkable progress recently.
Existing unsupervised learning methods suffer from a severe overfitting problem.
This paper presents an Unsupervised spatial-temporal Feature Enrichment and Fidelity Preservation framework to generate rich distributed features.
- Score: 20.07820929037547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised skeleton based action recognition has achieved remarkable
progress recently. Existing unsupervised learning methods suffer from a severe
overfitting problem, and thus small networks are used, which significantly
reduces their representation capability. To address this problem, the
overfitting mechanism behind unsupervised learning for skeleton based action
recognition is first investigated. It is observed that the skeleton is already
a relatively high-level, low-dimensional feature, but it does not lie in the
same manifold as the features for action recognition. Simply applying existing
unsupervised learning methods may therefore produce features that discriminate
between individual samples instead of action classes, resulting in the
overfitting problem. To solve this problem, this paper presents an Unsupervised
spatial-temporal Feature Enrichment and Fidelity Preservation framework
(U-FEFP) to generate rich distributed features that contain all the information
of the skeleton sequence. A spatial-temporal feature transformation subnetwork
is developed using a spatial-temporal graph convolutional network and a graph
convolutional gated recurrent unit network as the basic feature extraction
network. Unsupervised learning based on Bootstrap Your Own Latent (BYOL) is
used to generate rich distributed features, and unsupervised pretext task based
learning is used to preserve the information of the skeleton sequence. The two
unsupervised learning approaches are combined in U-FEFP to produce robust and
discriminative representations. Experimental results on three widely used
benchmarks, namely the NTU-RGB+D-60, NTU-RGB+D-120 and PKU-MMD datasets,
demonstrate that the proposed U-FEFP achieves the best performance compared
with state-of-the-art unsupervised learning methods. t-SNE illustrations further
validate that U-FEFP can learn more discriminative features for unsupervised
skeleton based action recognition.
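The abstract describes two unsupervised objectives trained together: a BYOL-style objective that enriches the features and a pretext task that preserves the information of the skeleton sequence. The sketch below, in plain Python, shows the standard BYOL regression loss (squared distance between L2-normalised vectors) and its exponential-moving-average target update, added to a reconstruction loss. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the pretext task, so mean-squared reconstruction is assumed, and the weight `lam` and all function names are hypothetical. Plain lists stand in for the outputs of the ST-GCN / GC-GRU encoder, projector and predictor.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    """Guard against zip() silently truncating mismatched vectors."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit L2 norm; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def byol_loss(online_pred: Sequence[float], target_proj: Sequence[float]) -> float:
    """BYOL regression loss: squared distance between the L2-normalised online
    prediction and target projection, i.e. 2 - 2 * cosine_similarity.
    Full BYOL symmetrises this by swapping the two augmented views."""
    _check_same_length(online_pred, target_proj)
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def ema_update(target: Sequence[float], online: Sequence[float],
               tau: float = 0.99) -> Vector:
    """BYOL target-network update: theta_t <- tau * theta_t + (1 - tau) * theta_o."""
    _check_same_length(target, online)
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def reconstruction_loss(decoded: Sequence[float], skeleton: Sequence[float]) -> float:
    """ASSUMED pretext task: mean squared error between a decoded sequence and
    the flattened input skeleton coordinates (not specified in the abstract)."""
    _check_same_length(decoded, skeleton)
    return sum((a - b) ** 2 for a, b in zip(decoded, skeleton)) / len(skeleton)


def combined_loss(online_pred: Sequence[float], target_proj: Sequence[float],
                  decoded: Sequence[float], skeleton: Sequence[float],
                  lam: float = 1.0) -> float:
    """Hypothetical weighted sum of the feature-enrichment (BYOL) term and the
    fidelity-preservation (pretext) term; lam is an assumed weight."""
    return byol_loss(online_pred, target_proj) + lam * reconstruction_loss(decoded, skeleton)
```

For example, `byol_loss([1.0, 0.0], [0.0, 2.0])` returns `2.0` because the two normalised vectors are orthogonal, while parallel vectors give a loss of (numerically) zero. Because the BYOL term only compares normalised directions, it cannot by itself force the features to retain the raw skeleton content; the added reconstruction term is one plausible way to express the "fidelity preservation" idea.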
Related papers
- Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition [13.593511876719367]
We propose a novel skeleton-based idempotent generative model (IGM) for unsupervised representation learning.
Our experiments on benchmark datasets, NTU RGB+D and PKUMMD, demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-10-27T06:29:04Z)
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
arXiv Detail & Related papers (2024-09-03T07:32:46Z)
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Neuro-mimetic Task-free Unsupervised Online Learning with Continual Self-Organizing Maps [56.827895559823126]
Self-organizing map (SOM) is a neural model often used in clustering and dimensionality reduction.
We propose a generalization of the SOM, the continual SOM, which is capable of online unsupervised learning under a low memory budget.
Our results, on benchmarks including MNIST, Kuzushiji-MNIST, and Fashion-MNIST, show an almost two-fold increase in accuracy.
arXiv Detail & Related papers (2024-02-19T19:11:22Z)
- Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- HAN: An Efficient Hierarchical Self-Attention Network for Skeleton-Based Gesture Recognition [73.64451471862613]
We propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition.
A joint self-attention module is used to capture spatial features of the fingers, and a finger self-attention module is designed to aggregate features of the whole hand.
Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
arXiv Detail & Related papers (2021-06-25T02:15:53Z)
- Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
- Improving Skeleton-based Action Recognition with Robust Spatial and Temporal Features [6.548580592686076]
We propose a novel mechanism to learn more robust discriminative features in space and time.
We show that action recognition accuracy can be improved when these robust features are learned and used.
arXiv Detail & Related papers (2020-08-01T19:29:53Z)
- SEKD: Self-Evolving Keypoint Detection and Description [42.114065439674036]
We propose a self-supervised framework to learn an advanced local feature model from unlabeled natural images.
We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks.
We will release our code along with the trained model publicly.
arXiv Detail & Related papers (2020-06-09T06:56:50Z)