Related papers: Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production

URL: http://arxiv.org/abs/2112.05277v1
Date: Mon, 6 Dec 2021 10:12:11 GMT
Title: Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production
Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
Abstract summary: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. We propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton bias into the SLP model.
Score: 37.679114155300084
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. To operate on this graphical structure, we propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton inductive bias into the SLP model. Retaining the skeletal feature representation throughout, we directly apply a spatio-temporal adjacency matrix into the self-attention formulation. This provides structure and context to each skeletal joint that is not possible when using a non-graphical abstract representation, enabling fluid and expressive sign language production. We evaluate our Skeletal Graph Self-Attention architecture on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving state-of-the-art back translation performance with an 8% and 7% improvement over competing methods for the dev and test sets.
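
Illustrative sketch (not the authors' implementation): the abstract describes joints as nodes, spatial and temporal connections as edges, and an adjacency matrix applied inside self-attention. The snippet below shows one simple way such a structure could be wired up, under several assumptions that are not taken from the paper: the adjacency matrix is used as a hard mask on the attention scores, there are no learned query/key/value projections, and the function names `build_adjacency` and `graph_self_attention`, the toy three-joint chain, and the feature values are all hypothetical.

```python
import math


def build_adjacency(num_frames, num_joints, bones):
    """Return a (T*J) x (T*J) 0/1 adjacency matrix over (frame, joint) nodes.

    Assumed edge types: self-connections, spatial edges along `bones`
    within a frame, and temporal edges linking a joint to itself in the
    next frame.
    """
    n = num_frames * num_joints
    adj = [[0] * n for _ in range(n)]

    def node(t, j):
        return t * num_joints + j

    for t in range(num_frames):
        for j in range(num_joints):
            adj[node(t, j)][node(t, j)] = 1  # self-connection
            if t + 1 < num_frames:
                a, b = node(t, j), node(t + 1, j)
                adj[a][b] = adj[b][a] = 1  # temporal edge
        for i, j in bones:
            a, b = node(t, i), node(t, j)
            adj[a][b] = adj[b][a] = 1  # spatial edge
    return adj


def graph_self_attention(features, adj):
    """Scaled dot-product self-attention restricted to graph neighbours.

    Assumption: the adjacency matrix acts as a hard mask, so each node
    attends only to nodes it is connected to (including itself).
    """
    n, d = len(features), len(features[0])
    out = []
    for i in range(n):
        neighbours = [k for k in range(n) if adj[i][k]]
        scores = [
            sum(a * b for a, b in zip(features[i], features[k])) / math.sqrt(d)
            for k in neighbours
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append(
            [sum(w * features[k][c] for w, k in zip(weights, neighbours))
             for c in range(d)]
        )
    return out


# Toy example: 2 frames of a hypothetical 3-joint chain (0-1-2).
bones = [(0, 1), (1, 2)]
adj = build_adjacency(num_frames=2, num_joints=3, bones=bones)
features = [[float(i), 1.0] for i in range(6)]  # one 2-d feature per node
out = graph_self_attention(features, adj)
```

Because every node keeps a self-connection, each row of the mask has at least one admissible neighbour, so the softmax is always well defined. A soft variant (adding or multiplying a weighted adjacency into the scores instead of masking) would be an equally plausible reading of the abstract.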

Related papers

Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification [60.939250172443586]
Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos). MoCos exploits structure-specific and gait-related body relations as well as features of skeleton graphs to learn effective skeleton representations for person re-ID.
arXiv Detail & Related papers (2024-12-12T08:13:29Z)
LAC: Latent Action Composition for Skeleton-based Action Segmentation [21.797658771678066]
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions. We propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
arXiv Detail & Related papers (2023-08-28T11:20:48Z)
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE. Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeleton data is an emerging topic with prominent advantages. Existing methods usually design skeleton descriptors with raw body joints or perform skeleton sequence representation learning. We propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction.
arXiv Detail & Related papers (2023-03-13T02:27:45Z)
Graph Contrastive Learning for Skeleton-based Action Recognition [85.86820157810213]
We propose a graph contrastive learning framework for skeleton-based action recognition. SkeletonGCL associates graph learning across sequences by enforcing graphs to be class-discriminative. SkeletonGCL establishes a new training paradigm, and it can be seamlessly incorporated into current graph convolutional networks.
arXiv Detail & Related papers (2023-01-26T02:09:16Z)
Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeletons is an important emerging topic with many merits. Existing solutions rarely explore valuable body-component relations in skeletal structure or motion. This paper proposes a generic unsupervised Prototype Contrastive learning paradigm with Multi-level Graph Relation learning.
arXiv Detail & Related papers (2022-08-25T00:59:32Z)
SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID. Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme. Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network [6.623802929157273]
Sign language translation (SLT) generates text in a spoken language from visual content in a sign language. In this paper, these unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations. A novel deep learning architecture, namely hierarchical spatio-temporal graph neural network (HSTG-NN), is proposed.
arXiv Detail & Related papers (2021-11-14T07:02:28Z)
Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.