Related papers: MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

URL: http://arxiv.org/abs/2404.10210v3
Date: Fri, 18 Oct 2024 05:17:29 GMT
Title: MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition
Authors: Naichuan Zheng, Hailun Xia, Zeyu Liang, Yuanyuan Chai,
Abstract summary: We propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN) to address this issue. By merging the energy efficiency of Spiking Neural Network (SNN) with the graph representation capability of GCN, the proposed MK-SGN reduces energy consumption while maintaining recognition accuracy.
Score: 0.6442618560991484
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, skeleton-based action recognition, leveraging multimodal Graph Convolutional Networks (GCN), has achieved remarkable results. However, due to their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. We propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN) to address this issue. By merging the energy efficiency of Spiking Neural Network (SNN) with the graph representation capability of GCN, the proposed MK-SGN reduces energy consumption while maintaining recognition accuracy. Firstly, we convert Graph Convolutional Networks (GCN) into Spiking Graph Convolutional Networks (SGN) establishing a new benchmark and paving the way for future research exploration. During this process, we introduce a spiking attention mechanism and design a Spiking-Spatio Graph Convolution module with a Spatial Global Spiking Attention mechanism (SA-SGC), enhancing feature learning capability. Secondly, we propose a Spiking Multimodal Fusion module (SMF), leveraging mutual information to process multimodal data more efficiently. Lastly, we delve into knowledge distillation methods from multimodal GCN to SGN and propose a novel, integrated method that simultaneously focuses on both intermediate layer distillation and soft label distillation to improve the performance of SGN. MK-SGN outperforms the state-of-the-art GCN-like frameworks on three challenging datasets for skeleton-based action recognition in reducing energy consumption. It also outperforms the state-of-the-art SNN frameworks in accuracy. Specifically, our method reduces energy consumption by more than 98% compared to typical GCN-based methods, while maintaining competitive accuracy on the NTU-RGB+D 60 cross-subject split using 4-time steps.

Related papers

ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions [64.17845687013434]
We propose ReDiSC, a structured diffusion model for structured node classification.<n>We show that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines.<n> Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints.
arXiv Detail & Related papers (2025-07-19T04:46:53Z)
Channel Fingerprint Construction for Massive MIMO: A Deep Conditional Generative Approach [65.47969413708344]
We introduce the concept of CF twins and design a conditional generative diffusion model (CGDM)<n>We employ a variational inference technique to derive the evidence lower bound (ELBO) for the log-marginal distribution of the observed fine-grained CF conditioned on the coarse-grained CF.<n>We show that the proposed approach exhibits significant improvement in reconstruction performance compared to the baselines.
arXiv Detail & Related papers (2025-05-12T01:36:06Z)
Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks [57.17129753411926]
Spiking neural networks (SNNs) are emerging as a promising alternative to traditional artificial neural networks (ANNs) We propose SpikeSR, which achieves state-of-the-art performance across various remote sensing benchmarks such as AID, DOTA, and DIOR.
arXiv Detail & Related papers (2025-03-06T09:06:06Z)
SNN-Driven Multimodal Human Action Recognition via Event Camera and Skeleton Data Fusion [0.7910116766220068]
We propose a novel Spiking Neural Network (SNN)-driven framework for multimodal human action recognition. Our framework is centered on two key innovations: (1) a novel multimodal SNN architecture that employs distinct backbone networks for each modality, and (2) a pioneering SNN-based discretized information bottleneck mechanism.
arXiv Detail & Related papers (2025-02-19T02:50:51Z)
Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics [2.9578022754506605]
In skeletal-based action recognition, Graph Convolutional Networks (GCNs) face limitations due to their complexity and high energy consumption. We propose a Signal-SGN(Spiking Graph Convolutional Network), which leverages the temporal dimension of skeletal sequences as the spiking timestep. Our experiments show that the proposed models not only surpass existing SNN-based methods in accuracy but also reduce computational storage costs during training.
arXiv Detail & Related papers (2024-08-03T07:47:16Z)
Continuous Spiking Graph Neural Networks [43.28609498855841]
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) We introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. We provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes.
arXiv Detail & Related papers (2024-04-02T12:36:40Z)
Enhancing Energy Efficiency and Reliability in Autonomous Systems Estimation using Neuromorphic Approach [0.0]
This study focuses on introducing an estimation framework based on spike coding theories and spiking neural networks (SNN) We propose an SNN-based Kalman filter (KF), a fundamental and widely adopted optimal strategy for well-defined linear systems. Based on the modified sliding innovation filter (MSIF) we present a robust strategy called SNN-MSIF.
arXiv Detail & Related papers (2023-07-16T06:47:54Z)
Spiking Variational Graph Auto-Encoders for Efficient Graph Representation Learning [10.65760757021534]
We propose an SNN-based deep generative method, namely the Spiking Variational Graph Auto-Encoders (S-VGAE) for efficient graph representation learning. We conduct link prediction experiments on multiple benchmark graph datasets, and the results demonstrate that our model consumes significantly lower energy with the performances superior or comparable to other ANN- and SNN-based methods for graph representation learning.
arXiv Detail & Related papers (2022-10-24T12:54:41Z)
MGNNI: Multiscale Graph Neural Networks with Implicit Layers [53.75421430520501]
implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs. We introduce and justify two weaknesses of implicit GNNs: the constrained expressiveness due to their limited effective range for capturing long-range dependencies, and their lack of ability to capture multiscale information on graphs at multiple resolutions. We propose a multiscale graph neural network with implicit layers (MGNNI) which is able to model multiscale structures on graphs and has an expanded effective range for capturing long-range dependencies.
arXiv Detail & Related papers (2022-10-15T18:18:55Z)
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition [77.87404524458809]
We propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN) It consists of two modules, DG-GCN and DG-TCN, respectively, for spatial and temporal modeling. DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
arXiv Detail & Related papers (2022-10-12T03:17:37Z)
Spiking Graph Convolutional Networks [19.36064180392385]
SpikingGCN is an end-to-end framework that aims to integrate the embedding of GCNs with the biofidelity characteristics of SNNs. We show that SpikingGCN on a neuromorphic chip can bring a clear advantage of energy efficiency into graph data analysis.
arXiv Detail & Related papers (2022-05-05T16:44:36Z)
SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition. We argue that the performance of recent proposed skeleton-based action recognition methods is limited by the following factors. Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
Multi-scale Graph Convolutional Networks with Self-Attention [2.66512000865131]
Graph convolutional networks (GCNs) have achieved remarkable learning ability for dealing with various graph structural data. Over-smoothing phenomenon as a crucial issue of GCNs remains to be solved and investigated. We propose two novel multi-scale GCN frameworks by incorporating self-attention mechanism and multi-scale information into the design of GCNs.
arXiv Detail & Related papers (2021-12-04T04:41:24Z)
Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition. MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
Fusion-GCN: Multimodal Action Recognition using Graph Convolutional Networks [0.5801044612920815]
Fusion-GCN is an approach for multimodal action recognition using Graph Convolutional Networks (GCNs) We integrate various sensor data modalities into a graph that is trained using a GCN model for multi-modal action recognition.
arXiv Detail & Related papers (2021-09-27T10:52:33Z)
Towards Efficient Graph Convolutional Networks for Point Cloud Handling [181.59146413326056]
We aim at improving the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds. A series of experiments show that optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed.
arXiv Detail & Related papers (2021-04-12T17:59:16Z)
Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition. Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition [97.14064057840089]
Graphal networks (GCNs) promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph. Most of the recently proposed G-temporal-based methods improve the performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z)
Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition [97.14064057840089]
Graphal networks (GCNs) have been very successful in modeling non-Euclidean data structures. Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action. We propose a temporal attention module (TAM) for increasing the efficiency in skeleton-based action recognition.
arXiv Detail & Related papers (2020-10-23T08:01:55Z)
Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation [56.73834525802723]
Lightweight Dynamic Graph Convolutional Networks (LDGCNs) are proposed. LDGCNs capture richer non-local interactions by synthesizing higher order information from the input graphs. We develop two novel parameter saving strategies based on the group graph convolutions and weight tied convolutions to reduce memory usage and model complexity.
arXiv Detail & Related papers (2020-10-09T06:03:46Z)
Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters. Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches. Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.