Self-Supervised Human Activity Recognition by Augmenting Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2008.11755v2
- Date: Mon, 28 Dec 2020 18:45:50 GMT
- Title: Self-Supervised Human Activity Recognition by Augmenting Generative
Adversarial Networks
- Authors: Mohammad Zaki Zadeh, Ashwin Ramesh Babu, Ashish Jaiswal, Fillia
Makedon
- Abstract summary: This article proposes a novel approach for augmenting a generative adversarial network (GAN) with a self-supervised task.
In the proposed method, input video frames are randomly transformed by different spatial transformations.
The discriminator is encouraged to predict the applied transformation via an auxiliary loss.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article proposes a novel approach for augmenting a generative
adversarial network (GAN) with a self-supervised task in order to improve its
ability to encode video representations that are useful in downstream tasks
such as human activity recognition. In the proposed method, input video frames
are randomly transformed by different spatial transformations, such as
rotation, translation, and shearing, or by temporal transformations, such as
shuffling the temporal order of frames. The discriminator is then encouraged to
predict the applied transformation through an auxiliary loss. The results
demonstrate the superiority of the proposed method over baseline methods in
providing a useful representation of videos for human activity recognition,
evaluated on the KTH, UCF101, and Ball-Drop datasets. Ball-Drop is a dataset
specifically designed for measuring executive functions in children through
physically and cognitively demanding tasks. Using features from the proposed
method instead of the baselines increased top-1 classification accuracy by more
than 4%. Moreover, an ablation study was performed to investigate the
contribution of the different transformations to the downstream task.
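The mechanism lends itself to a short sketch. Below is a minimal PyTorch illustration of the idea in the abstract: a clip is randomly transformed, and the discriminator carries an auxiliary head trained to recognize which transformation was applied. The transformation set, architecture, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the abstract's idea: randomly transform a clip, then train
# an auxiliary discriminator head to predict which transformation was applied.
# Architecture and settings are illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

TRANSFORMS = ["identity", "rotation", "translation", "shear", "shuffle"]

def apply_transform(frames, t_idx):
    """frames: (B, T, C, H, W) clip; t_idx indexes TRANSFORMS."""
    name = TRANSFORMS[t_idx]
    if name == "rotation":
        return torch.rot90(frames, k=1, dims=(-2, -1))       # spatial
    if name == "translation":
        return torch.roll(frames, shifts=8, dims=-1)         # spatial
    if name == "shear":
        # A faithful version would warp with an affine grid; a row shift
        # stands in for shearing to keep this sketch short.
        return torch.roll(frames, shifts=4, dims=-2)
    if name == "shuffle":
        return frames[:, torch.randperm(frames.size(1))]     # temporal order
    return frames

class Discriminator(nn.Module):
    """Shared trunk with a real/fake head and an auxiliary transform head."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Flatten(start_dim=1),
                                   nn.LazyLinear(feat_dim), nn.ReLU())
        self.adv_head = nn.Linear(feat_dim, 1)                # real vs. fake
        self.aux_head = nn.Linear(feat_dim, len(TRANSFORMS))  # which transform

    def forward(self, clip):
        h = self.trunk(clip)
        return self.adv_head(h), self.aux_head(h)

def discriminator_loss(disc, real, fake, aux_weight=1.0):
    t_idx = torch.randint(len(TRANSFORMS), (1,)).item()
    adv_real, aux_real = disc(apply_transform(real, t_idx))
    adv_fake, _ = disc(fake)
    adv = F.binary_cross_entropy_with_logits(adv_real, torch.ones_like(adv_real)) \
        + F.binary_cross_entropy_with_logits(adv_fake, torch.zeros_like(adv_fake))
    target = torch.full((aux_real.size(0),), t_idx, dtype=torch.long)
    return adv + aux_weight * F.cross_entropy(aux_real, target)

# Example: one discriminator step on random stand-in clips.
d = Discriminator()
loss = discriminator_loss(d, torch.randn(4, 16, 3, 64, 64),
                          torch.randn(4, 16, 3, 64, 64))
```

In a full training loop this loss would stand in for the usual discriminator objective, with the generator trained adversarially as usual; after training, the trunk's features would feed the downstream activity classifier.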
Related papers
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Graph Convolution Based Efficient Re-Ranking for Visual Retrieval [29.804582207550478]
We present an efficient re-ranking method which refines initial retrieval results by updating features.
Specifically, we reformulate re-ranking based on Graph Convolution Networks (GCN) and propose a novel Graph Convolution based Re-ranking (GCR) for visual retrieval tasks via feature propagation.
In particular, the plain GCR is extended for cross-camera retrieval and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras.
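A minimal sketch of this style of feature-propagation re-ranking, under assumed settings (plain NumPy, top-k affinity, one propagation step with mixing weight alpha); this is not the paper's GCR formulation:

```python
# Sketch: re-rank retrieval results by propagating features over a kNN
# affinity graph. k and alpha are illustrative, not the paper's settings.
import numpy as np

def propagate_and_rerank(query, gallery, k=5, alpha=0.5):
    """query: (Q, D), gallery: (N, D) L2-normalized feature matrices."""
    feats = np.vstack([query, gallery])
    sims = feats @ feats.T                           # cosine similarities
    # Keep each row's top-k neighbours to form a sparse affinity matrix.
    affinity = np.zeros_like(sims)
    topk = np.argsort(-sims, axis=1)[:, :k]
    rows = np.arange(sims.shape[0])[:, None]
    affinity[rows, topk] = sims[rows, topk]
    affinity /= affinity.sum(axis=1, keepdims=True)  # row-normalize
    # One graph-convolution step: mix each feature with its neighbours'.
    smoothed = alpha * feats + (1 - alpha) * affinity @ feats
    smoothed /= np.linalg.norm(smoothed, axis=1, keepdims=True)
    q, g = smoothed[: len(query)], smoothed[len(query):]
    return np.argsort(-(q @ g.T), axis=1)            # refined ranking per query

# Example with random features standing in for a retrieval model's output.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 64)); q /= np.linalg.norm(q, axis=1, keepdims=True)
g = rng.normal(size=(10, 64)); g /= np.linalg.norm(g, axis=1, keepdims=True)
print(propagate_and_rerank(q, g)[:, :3])             # top-3 gallery ids
```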
arXiv Detail & Related papers (2023-06-15T00:28:08Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
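As a rough illustration of cluster-level pseudo-labelling on unlabeled target features (k-means and the voting rule below are assumptions, not the paper's method):

```python
# Sketch: cluster-level pseudo-labels for unlabeled target-domain features.
# The clustering choice (k-means) and majority-vote rule are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(target_feats, classifier_probs, n_classes):
    """target_feats: (N, D) self-supervised features of unlabeled target data;
    classifier_probs: (N, C) a source classifier's soft predictions."""
    clusters = KMeans(n_clusters=n_classes, n_init=10,
                      random_state=0).fit_predict(target_feats)
    labels = np.empty(len(target_feats), dtype=int)
    for c in range(n_classes):
        mask = clusters == c
        # Average the soft predictions inside the cluster and take the argmax,
        # so every member of the cluster shares one pseudo-label.
        labels[mask] = classifier_probs[mask].mean(axis=0).argmax()
    return labels

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 32))
probs = rng.dirichlet(np.ones(7), size=100)
print(cluster_pseudo_labels(feats, probs, n_classes=7)[:10])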
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive Regularizer Learning [13.08779945306727]
Current state-of-the-art methods ignore category information of objects, which is crucial for grasp pattern recognition.
This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition.
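A minimal sketch of a dual-branch design of this kind, with a shared backbone and two classification heads; layer sizes and the joint loss are illustrative, not DcnnGrasp's architecture or its adaptive regularizer:

```python
# Sketch: shared backbone with two heads jointly predicting object category
# and grasp pattern. Sizes are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self, n_categories=10, n_grasps=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.category_head = nn.Linear(32, n_categories)
        self.grasp_head = nn.Linear(32, n_grasps)

    def forward(self, x):
        h = self.backbone(x)
        return self.category_head(h), self.grasp_head(h)

net = DualBranchNet()
cat_logits, grasp_logits = net(torch.randn(4, 3, 64, 64))
# Joint training would sum the two cross-entropy losses, optionally with an
# adaptive weighting term as the summary suggests.
```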
arXiv Detail & Related papers (2022-05-11T00:34:27Z)
- ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
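A minimal sketch of this pipeline step, treating a skeleton clip as a C x T x J image, patch-embedding it, and encoding it with a transformer; all dimensions are assumptions, not ProFormer's configuration:

```python
# Sketch: skeleton sequence as an image-like array -> patch embeddings ->
# transformer encoder. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

T, J, C = 64, 25, 3                  # frames, joints, (x, y, z) channels
pose_seq = torch.randn(1, C, T, J)   # one skeleton clip as a C x T x J "image"

patchify = nn.Conv2d(C, 128, kernel_size=(8, 5), stride=(8, 5))  # 8x5 patches
tokens = patchify(pose_seq).flatten(2).transpose(1, 2)  # (1, n_patches, 128)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2)
embedding = encoder(tokens).mean(dim=1)  # clip embedding for metric learning
print(embedding.shape)                   # torch.Size([1, 128])
```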
arXiv Detail & Related papers (2022-02-23T11:11:54Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations in multi-modal data for recognizing gestures.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1-score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs).
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map transformation based approach outperforms state-of-the-art continual GAN methods.
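A hedged sketch of the global-plus-task-specific idea, using per-task channel-wise scale and shift of feature maps as the task-specific transformation; the paper's exact parameterization may differ:

```python
# Sketch: global convolution weights shared across tasks, plus a small
# task-specific feature-map transformation (per-channel scale and shift).
import torch
import torch.nn as nn

class TaskConditionedConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # global, shared
        # One (scale, shift) pair per task: cheap to store for each new task.
        self.scale = nn.Parameter(torch.ones(n_tasks, out_ch, 1, 1))
        self.shift = nn.Parameter(torch.zeros(n_tasks, out_ch, 1, 1))

    def forward(self, x, task_id):
        h = self.conv(x)
        return h * self.scale[task_id] + self.shift[task_id]

layer = TaskConditionedConv(3, 16, n_tasks=4)
out = layer(torch.randn(2, 3, 32, 32), task_id=1)  # adapt features to task 1
```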
arXiv Detail & Related papers (2021-03-06T05:09:37Z)
- TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition.
We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space.
Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
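As a rough illustration, confident neighbours might be discovered via mutual k-nearest-neighbour agreement in the latent space; the criterion below is an assumption, not necessarily TraND's:

```python
# Sketch: find "confident" neighbours of unlabeled samples in a latent space
# via mutual k-nearest-neighbour agreement. k is an illustrative choice.
import numpy as np

def confident_neighbors(feats, k=3):
    """feats: (N, D) latent features; returns mutual-kNN index pairs."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)             # exclude self-matches
    knn = np.argsort(-sims, axis=1)[:, :k]      # top-k neighbours per sample
    pairs = []
    for i in range(len(feats)):
        for j in knn[i]:
            if i in knn[j] and i < j:           # mutual agreement => confident
                pairs.append((i, int(j)))
    return pairs

feats = np.random.default_rng(0).normal(size=(8, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(confident_neighbors(feats))
```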
arXiv Detail & Related papers (2021-02-09T03:07:07Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
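For reference, the instance-recognition objective family the summary mentions can be sketched in its common InfoNCE form; the temperature and batch construction below are assumptions:

```python
# Sketch: instance recognition as a contrastive (InfoNCE-style) loss.
# Temperature and the two-view batch setup are illustrative assumptions.
import torch
import torch.nn.functional as F

def instance_recognition_loss(anchor, positive, temperature=0.1):
    """Treat each clip as its own class: pull two augmented views together,
    push all other clips in the batch away.

    anchor, positive: (B, D) embeddings of two views of the same clips.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.T / temperature          # (B, B) similarity matrix
    labels = torch.arange(len(a))           # diagonal entries are positives
    return F.cross_entropy(logits, labels)

loss = instance_recognition_loss(torch.randn(8, 128), torch.randn(8, 128))
```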
arXiv Detail & Related papers (2020-06-28T22:23:03Z)