Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition
- URL: http://arxiv.org/abs/2307.07791v1
- Date: Sat, 15 Jul 2023 12:37:18 GMT
- Title: Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition
- Authors: Tianyu Guo, Mengyuan Liu, Hong Liu, Wenhao Li, Jingwen Guo, Tao Wang, Yidi Li
- Abstract summary: We present a joint Adversarial and Collaborative Learning (ACL) framework, which combines Cross-Model Adversarial Learning (CMAL) and Cross-Stream Collaborative Learning (CSCL).
CMAL learns single-stream representations via a cross-model adversarial loss to obtain more discriminative features.
To aggregate and interact with multi-stream information, CSCL generates similarity pseudo-labels from ensemble learning as supervision.
- Score: 25.25370509635083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considering the instance-level discriminative ability, contrastive learning
methods, including MoCo and SimCLR, have been adapted from the original image
representation learning task to solve the self-supervised skeleton-based action
recognition task. These methods usually use multiple data streams (i.e., joint,
motion, and bone) for ensemble learning; meanwhile, how to construct a
discriminative feature space within a single stream and how to effectively
aggregate information from multiple streams remain open problems. To this end, we
first apply a new contrastive learning method called BYOL to learn from
skeleton data and formulate SkeletonBYOL as a simple yet effective baseline for
self-supervised skeleton-based action recognition. Inspired by SkeletonBYOL, we
further present a joint Adversarial and Collaborative Learning (ACL) framework,
which combines Cross-Model Adversarial Learning (CMAL) and Cross-Stream
Collaborative Learning (CSCL). Specifically, CMAL learns single-stream
representations via a cross-model adversarial loss to obtain more discriminative
features. To aggregate and interact with multi-stream information, CSCL is
designed to generate similarity pseudo-labels from ensemble learning as
supervision and to guide feature generation for individual streams. Exhaustive
experiments on three datasets verify the complementary properties between CMAL
and CSCL and also verify that our method can perform favorably against
state-of-the-art methods using various evaluation protocols. Our code and
models are publicly available at \url{https://github.com/Levigty/ACL}.
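The cross-stream supervision described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's actual implementation: the function name `cscl_pseudo_label_loss`, the use of a small anchor set as a stand-in for a memory bank, and the temperature value are all assumptions for demonstration. Each stream's similarity distribution over the anchors is averaged into an ensemble pseudo-label, which then supervises every individual stream via cross-entropy.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cscl_pseudo_label_loss(stream_embs, anchors, temperature=0.1):
    """Sketch of cross-stream collaborative supervision (hypothetical API).

    stream_embs: list of (D,) embeddings of one sample from different
                 streams (e.g. joint / motion / bone).
    anchors:     (K, D) reference embeddings, standing in for a memory
                 bank or batch of negatives.
    """
    # Each stream's similarity distribution over the anchors.
    dists = [softmax(anchors @ e / temperature) for e in stream_embs]
    # Ensemble pseudo-label: the average of the stream distributions.
    pseudo = np.mean(dists, axis=0)
    # Cross-entropy pulls every stream toward the ensemble pseudo-label.
    losses = [-(pseudo * np.log(d + 1e-12)).sum() for d in dists]
    return float(np.mean(losses))
```

When all streams agree, the loss reduces to the entropy of the shared distribution; disagreement adds a KL-divergence term per stream, so minimizing it drives the streams toward a consistent feature space.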
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Anchor-aware Deep Metric Learning for Audio-visual Retrieval [11.675472891647255]
Metric learning aims at capturing the underlying data structure and enhancing the performance of tasks like audio-visual cross-modal retrieval (AV-CMR).
Recent works employ sampling methods to select impactful data points from the embedding space during training.
However, the model training fails to fully explore the space due to the scarcity of training data points.
We propose an innovative Anchor-aware Deep Metric Learning (AADML) method to address this challenge.
arXiv Detail & Related papers (2024-04-21T22:44:44Z)
- Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition [22.067143671631303]
Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR).
Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs, but introduces inter-stream contrast pairs as hard samples to formulate a better representation learning.
arXiv Detail & Related papers (2023-05-03T10:31:35Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- COCOA: Cross Modality Contrastive Learning for Sensor Data [9.440900386313213]
COCOA (Cross mOdality COntrastive leArning) is a self-supervised model that employs a novel objective function to learn quality representations from multisensor data.
We show that COCOA achieves superior classification performance to all other approaches.
arXiv Detail & Related papers (2022-07-31T16:36:13Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- 3D Human Action Representation Learning via Cross-View Consistency Pursuit [52.19199260960558]
We propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR).
CrosSCLR consists of both single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM) modules, integrated in a collaborative learning manner.
arXiv Detail & Related papers (2021-04-29T16:29:41Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.