MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2404.10210v4
- Date: Fri, 13 Dec 2024 03:11:30 GMT
- Title: MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition
- Authors: Naichuan Zheng, Hailun Xia, Zeyu Liang, Yuchen Du,
- Abstract summary: MK-SGN is proposed to leverage the energy efficiency of Spiking Neural Networks (SNNs) for skeleton-based action recognition.
By integrating the energy-saving properties of SNNs with the graph representation capabilities of GCNs, MK-SGN achieves significant reductions in energy consumption.
The proposed method achieves a remarkable reduction in energy consumption, exceeding 98% compared to conventional GCN-based approaches.
- Score: 0.8942525984879532
- Abstract: In recent years, multimodal Graph Convolutional Networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. The reliance on high-energy-consuming continuous floating-point operations inherent in GCN-based methods poses significant challenges for deployment in energy-constrained, battery-powered edge devices. To address these limitations, MK-SGN, a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation, is proposed to leverage the energy efficiency of Spiking Neural Networks (SNNs) for skeleton-based action recognition for the first time. By integrating the energy-saving properties of SNNs with the graph representation capabilities of GCNs, MK-SGN achieves significant reductions in energy consumption while maintaining competitive recognition accuracy. Firstly, we formulate a Spiking Multimodal Fusion (SMF) module to effectively fuse multimodal skeleton data represented as spike-form features. Secondly, we propose the Self-Attention Spiking Graph Convolution (SA-SGC) module and the Spiking Temporal Convolution (STC) module, to capture spatial relationships and temporal dynamics of spike-form features. Finally, we propose an integrated knowledge distillation strategy to transfer information from the multimodal GCN to the SGN, incorporating both intermediate-layer distillation and soft-label distillation to enhance the performance of the SGN. MK-SGN exhibits substantial advantages, surpassing state-of-the-art GCN frameworks in energy efficiency and outperforming state-of-the-art SNN frameworks in recognition accuracy. The proposed method achieves a remarkable reduction in energy consumption, exceeding 98% compared to conventional GCN-based approaches. This research establishes a robust baseline for developing high-performance, energy-efficient SNN-based models for skeleton-based action recognition.
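The abstract describes a distillation strategy that combines soft-label distillation with intermediate-layer distillation. The following is a minimal sketch of how two such terms are commonly combined, not the paper's formulation: it assumes a temperature-softened KL divergence for the soft-label term and a mean squared error for the intermediate-layer term. The function names, the weights `alpha` and `beta`, and the default `temperature` are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures (an assumption, not from the source).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def feature_loss(student_feat, teacher_feat):
    """Mean squared error between same-length intermediate features.

    For a spiking student, student_feat could be firing rates averaged
    over time steps (assumed here; the source does not specify this).
    """
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                      alpha=0.5, beta=0.5, temperature=4.0):
    """Weighted sum of the soft-label and intermediate-layer terms."""
    soft = soft_label_loss(student_logits, teacher_logits, temperature)
    feat = feature_loss(student_feat, teacher_feat)
    return alpha * soft + beta * feat


if __name__ == "__main__":
    # Toy values only: a 3-class teacher/student pair and 2-dim features.
    loss = distillation_loss([0.2, 1.1, 2.5], [0.1, 1.0, 3.0],
                             [0.4, 0.1], [0.5, 0.2])
    print(f"combined distillation loss: {loss:.4f}")
```

In practice a term like this would be added to the ordinary classification loss on ground-truth labels, with the teacher (here, the multimodal GCN) frozen while the student (the SGN) is trained.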
Related papers
- SNN-Driven Multimodal Human Action Recognition via Event Camera and Skeleton Data Fusion [0.7910116766220068]
We propose a novel Spiking Neural Network (SNN)-driven framework for multimodal human action recognition.
Our framework is centered on two key innovations: (1) a novel multimodal SNN architecture that employs distinct backbone networks for each modality, and (2) a pioneering SNN-based discretized information bottleneck mechanism.
arXiv Detail & Related papers (2025-02-19T02:50:51Z)
- Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics [2.707548544084083]
Spiking Neural Networks (SNNs) struggle to model skeleton dynamics, leading to suboptimal solutions.
We propose Signal-SGN (Spiking Graph Convolutional Network), which utilizes the temporal dimension of skeleton sequences as the spike time steps.
Experiments across three large-scale datasets reveal Signal-SGN exceeding state-of-the-art SNN-based methods in accuracy and computational efficiency.
arXiv Detail & Related papers (2024-08-03T07:47:16Z)
- Continuous Spiking Graph Neural Networks [43.28609498855841]
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs).
We introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation.
We provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes.
arXiv Detail & Related papers (2024-04-02T12:36:40Z)
- Enhancing Energy Efficiency and Reliability in Autonomous Systems Estimation using Neuromorphic Approach [0.0]
This study introduces an estimation framework based on spike coding theories and spiking neural networks (SNNs).
We propose an SNN-based Kalman filter (KF), a fundamental and widely adopted optimal strategy for well-defined linear systems.
Based on the modified sliding innovation filter (MSIF) we present a robust strategy called SNN-MSIF.
arXiv Detail & Related papers (2023-07-16T06:47:54Z)
- Evaluating Distribution System Reliability with Hyperstructures Graph Convolutional Nets [74.51865676466056]
We show how graph convolutional networks and a hyperstructures representation learning framework can be employed for accurate, reliable, and computationally efficient distribution grid planning.
Our numerical experiments show that the proposed Hyper-GCNNs approach yields substantial gains in computational efficiency.
arXiv Detail & Related papers (2022-11-14T01:29:09Z)
- MGNNI: Multiscale Graph Neural Networks with Implicit Layers [53.75421430520501]
Implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs.
We introduce and justify two weaknesses of implicit GNNs: the constrained expressiveness due to their limited effective range for capturing long-range dependencies, and their lack of ability to capture multiscale information on graphs at multiple resolutions.
We propose a multiscale graph neural network with implicit layers (MGNNI) which is able to model multiscale structures on graphs and has an expanded effective range for capturing long-range dependencies.
arXiv Detail & Related papers (2022-10-15T18:18:55Z)
- DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition [77.87404524458809]
We propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN).
It consists of two modules, DG-GCN and DG-TCN, respectively, for spatial and temporal modeling.
DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
arXiv Detail & Related papers (2022-10-12T03:17:37Z)
- Spiking Graph Convolutional Networks [19.36064180392385]
SpikingGCN is an end-to-end framework that aims to integrate the embedding of GCNs with the biofidelity characteristics of SNNs.
We show that SpikingGCN on a neuromorphic chip can bring a clear advantage of energy efficiency into graph data analysis.
arXiv Detail & Related papers (2022-05-05T16:44:36Z)
- SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recently proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture action-associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition [97.14064057840089]
Graph convolutional networks (GCNs) achieve promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed GCN-based methods improve the performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.