Related papers: Hypergraph Transformer for Skeleton-based Action Recognition

Hypergraph Transformer for Skeleton-based Action Recognition

URL: http://arxiv.org/abs/2211.09590v5
Date: Tue, 21 Mar 2023 17:34:34 GMT
Title: Hypergraph Transformer for Skeleton-based Action Recognition
Authors: Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper
Abstract summary: Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections. Previous works successfully adopted Graph Convolutional networks (GCNs) to model joint co-occurrences. We propose a new self-attention (SA) mechanism on hypergraph, termed Hypergraph Self-Attention (HyperSA), to incorporate intrinsic higher-order relations into the model.
Score: 21.763844802116857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections. By defining a graph with joints as vertices and their natural connections as edges, previous works successfully adopted Graph Convolutional networks (GCNs) to model joint co-occurrences and achieved superior performance. More recently, a limitation of GCNs is identified, i.e., the topology is fixed after training. To relax such a restriction, Self-Attention (SA) mechanism has been adopted to make the topology of GCNs adaptive to the input, resulting in the state-of-the-art hybrid models. Concurrently, attempts with plain Transformers have also been made, but they still lag behind state-of-the-art GCN-based methods due to the lack of structural prior. Unlike hybrid models, we propose a more elegant solution to incorporate the bone connectivity into Transformer via a graph distance embedding. Our embedding retains the information of skeletal structure during training, whereas GCNs merely use it for initialization. More importantly, we reveal an underlying issue of graph models in general, i.e., pairwise aggregation essentially ignores the high-order kinematic dependencies between body joints. To fill this gap, we propose a new self-attention (SA) mechanism on hypergraph, termed Hypergraph Self-Attention (HyperSA), to incorporate intrinsic higher-order relations into the model. We name the resulting model Hyperformer, and it beats state-of-the-art graph models w.r.t. accuracy and efficiency on NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets.

Related papers

Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition [3.700463358780727]
We propose an Autoregressive Adaptive HyperGraph Transformer (AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph generation. The vector quantized in-phase hypergraph equipped with powerful autoregressive learned priors produces a more robust and informative representation suitable for hyperedge formation. The hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores the action-dependent feature along spatial, temporal, and channel dimensions.
arXiv Detail & Related papers (2024-11-08T16:45:52Z)
Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes. Poorly features can lead to noisy self-attention scores and inferior convergence. We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
arXiv Detail & Related papers (2024-02-20T12:01:30Z)
Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness [24.83836008577395]
Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition. They tend to optimize the adjacency matrix jointly with the model weights. This process causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map. We propose an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
arXiv Detail & Related papers (2023-05-19T06:40:12Z)
Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision. A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive. We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
Dynamic Hypergraph Convolutional Networks for Skeleton-Based Action Recognition [22.188135882864287]
We propose a novel dynamic hypergraph convolutional networks (DHGCN) for skeleton-based action recognition. DHGCN uses hypergraph to represent the skeleton structure to effectively exploit the motion information contained in human joints.
arXiv Detail & Related papers (2021-12-20T14:46:14Z)
A Deep Latent Space Model for Graph Representation Learning [10.914558012458425]
We propose a Deep Latent Space Model (DLSM) for directed graphs to incorporate the traditional latent variable based generative model into deep learning frameworks. Our proposed model consists of a graph convolutional network (GCN) encoder and a decoder, which are layer-wise connected by a hierarchical variational auto-encoder architecture. Experiments on real-world datasets show that the proposed model achieves the state-of-the-art performances on both link prediction and community detection tasks.
arXiv Detail & Related papers (2021-06-22T12:41:19Z)
Multi Scale Temporal Graph Networks For Skeleton-based Action Recognition [5.970574258839858]
Graph convolutional networks (GCNs) can effectively capture the features of related nodes and improve the performance of the model. Existing methods based on GCNs have two problems. First, the consistency of temporal and spatial features is ignored for extracting features node by node and frame by frame. We propose a novel model called Temporal Graph Networks (TGN) for action recognition.
arXiv Detail & Related papers (2020-12-05T08:08:25Z)
Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition. Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control [65.77164390203396]
We present a series of ablations on existing methods that show that morphological information encoded in the graph does not improve their performance. Motivated by the hypothesis that any benefits GNNs extract from the graph structure are outweighed by difficulties they create for message passing, we also propose Amorpheus.
arXiv Detail & Related papers (2020-10-05T08:37:11Z)
Mix Dimension in Poincar\'{e} Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model the irregular data. We present a novel spatial-temporal GCN architecture which is defined via the Poincar'e geometry. We evaluate our method on two current largest scale 3D datasets.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach [55.44107800525776]
Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models. In this paper, we revisit GCN based Collaborative Filtering (CF) based Recommender Systems (RS) We show that removing non-linearities would enhance recommendation performance, consistent with the theories in simple graph convolutional networks. We propose a residual network structure that is specifically designed for CF with user-item interaction modeling.
arXiv Detail & Related papers (2020-01-28T04:41:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.