Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into
Sign Language Production
- URL: http://arxiv.org/abs/2112.05277v1
- Date: Mon, 6 Dec 2021 10:12:11 GMT
- Title: Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into
Sign Language Production
- Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
- Abstract summary: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications.
In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges.
We propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton bias into the SLP model.
- Score: 37.679114155300084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken
language Neural Machine Translation (NMT) architectures, applied without
sign-specific modifications. In addition, these works represent sign language
as a sequence of skeleton pose vectors, projected to an abstract representation
with no inherent skeletal structure. In this paper, we represent sign language
sequences as a skeletal graph structure, with joints as nodes and both spatial
and temporal connections as edges. To operate on this graphical structure, we
propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer
that embeds a skeleton inductive bias into the SLP model. Retaining the
skeletal feature representation throughout, we directly apply a spatio-temporal
adjacency matrix into the self-attention formulation. This provides structure
and context to each skeletal joint that is not possible when using a
non-graphical abstract representation, enabling fluid and expressive sign
language production. We evaluate our Skeletal Graph Self-Attention architecture
on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving
state-of-the-art back translation performance with 8% and 7% improvements
over competing methods on the dev and test sets, respectively.
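The core idea of masking self-attention with a spatio-temporal skeletal adjacency matrix can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: joint names, the toy 3-joint skeleton, and the use of identity projections in place of learned query/key/value weights are all hypothetical simplifications. It builds an adjacency matrix over joints unrolled across time (bone edges within each frame, same-joint edges between adjacent frames) and uses it to mask scaled dot-product attention scores before the softmax.

```python
import numpy as np

def build_spatiotemporal_adjacency(edges, num_joints, num_frames):
    """Adjacency over joints unrolled across time: spatial (bone) edges
    within each frame, temporal edges linking the same joint in
    consecutive frames, plus self-connections."""
    n = num_joints * num_frames
    adj = np.eye(n, dtype=bool)  # self-connections
    for t in range(num_frames):
        base = t * num_joints
        for i, j in edges:  # spatial (skeletal bone) connections
            adj[base + i, base + j] = adj[base + j, base + i] = True
        if t + 1 < num_frames:  # temporal connections to the next frame
            for j in range(num_joints):
                adj[base + j, base + num_joints + j] = True
                adj[base + num_joints + j, base + j] = True
    return adj

def masked_self_attention(x, adj):
    """Scaled dot-product self-attention restricted to graph neighbours.
    Identity Q/K/V projections are used here purely for illustration."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(adj, scores, -np.inf)  # mask non-adjacent joint pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x

# Hypothetical toy skeleton: 3 joints in a chain, over 2 frames.
adj = build_spatiotemporal_adjacency(edges=[(0, 1), (1, 2)],
                                     num_joints=3, num_frames=2)
x = np.random.default_rng(0).normal(size=(6, 4))  # 6 nodes, 4-dim features
out = masked_self_attention(x, adj)
```

Because every node keeps a self-connection, each row of the masked score matrix has at least one finite entry, so the softmax is always well defined. In this sketch each joint aggregates features only from its skeletal and temporal neighbours, which is the structural constraint the abstract describes.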
Related papers
- Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification [60.939250172443586]
Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios.
Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning.
This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos).
MoCos exploits structure-specific and gait-related body relations as well as features of skeleton graphs to learn effective skeleton representations for person re-ID.
arXiv Detail & Related papers (2024-12-12T08:13:29Z)
- LAC: Latent Action Composition for Skeleton-based Action Segmentation [21.797658771678066]
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos.
Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions.
We propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
arXiv Detail & Related papers (2023-08-28T11:20:48Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeleton data is an emerging topic with prominent advantages.
Existing methods usually design skeleton descriptors with raw body joints or perform skeleton sequence representation learning.
We propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction.
arXiv Detail & Related papers (2023-03-13T02:27:45Z)
- Graph Contrastive Learning for Skeleton-based Action Recognition [85.86820157810213]
We propose a graph contrastive learning framework for skeleton-based action recognition.
SkeletonGCL associates graph learning across sequences by enforcing graphs to be class-discriminative.
SkeletonGCL establishes a new training paradigm, and it can be seamlessly incorporated into current graph convolutional networks.
arXiv Detail & Related papers (2023-01-26T02:09:16Z)
- Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeletons is an important emerging topic with many merits.
Existing solutions rarely explore valuable body-component relations in skeletal structure or motion.
This paper proposes a generic unsupervised Prototype Contrastive learning paradigm with Multi-level Graph Relation learning.
arXiv Detail & Related papers (2022-08-25T00:59:32Z)
- Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network [6.623802929157273]
Sign language translation (SLT) generates text in a spoken language from visual content in a sign language.
In this paper, these unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations.
A novel deep learning architecture, namely a hierarchical spatio-temporal graph neural network (HSTG-NN), is proposed.
arXiv Detail & Related papers (2021-11-14T07:02:28Z)
- Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.