Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into
Sign Language Production
- URL: http://arxiv.org/abs/2112.05277v1
- Date: Mon, 6 Dec 2021 10:12:11 GMT
- Title: Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into
Sign Language Production
- Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
- Abstract summary: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications.
In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges.
We propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton bias into the SLP model.
- Score: 37.679114155300084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken
language Neural Machine Translation (NMT) architectures, applied without
sign-specific modifications. In addition, these works represent sign language
as a sequence of skeleton pose vectors, projected to an abstract representation
with no inherent skeletal structure. In this paper, we represent sign language
sequences as a skeletal graph structure, with joints as nodes and both spatial
and temporal connections as edges. To operate on this graphical structure, we
propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer
that embeds a skeleton inductive bias into the SLP model. Retaining the
skeletal feature representation throughout, we directly apply a spatio-temporal
adjacency matrix into the self-attention formulation. This provides structure
and context to each skeletal joint that is not possible when using a
non-graphical abstract representation, enabling fluid and expressive sign
language production. We evaluate our Skeletal Graph Self-Attention architecture
on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving
state-of-the-art back translation performance with an 8% and 7% improvement
over competing methods for the dev and test sets.
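The abstract describes the skeletal graph only in words, so a small illustration may help. The following is a minimal sketch, not the authors' implementation: it assumes the spatio-temporal adjacency matrix acts as a hard mask on scaled dot-product attention (the abstract does not say how the matrix enters the formulation), it omits learned query/key/value projections, and the names `build_adjacency`, `graph_self_attention`, and the three-joint toy skeleton are hypothetical.

```python
import math


def build_adjacency(num_frames, num_joints, bones):
    """Build a 0/1 spatio-temporal adjacency matrix.

    Node index is t * num_joints + j. Edges (all undirected):
    self-loops, spatial edges along `bones` within each frame, and
    temporal edges joining the same joint in consecutive frames.
    """
    n = num_frames * num_joints
    adj = [[0] * n for _ in range(n)]
    for t in range(num_frames):
        base = t * num_joints
        for j in range(num_joints):
            adj[base + j][base + j] = 1  # self-loop
            if t + 1 < num_frames:
                nxt = base + num_joints + j
                adj[base + j][nxt] = adj[nxt][base + j] = 1  # temporal edge
        for a, b in bones:
            adj[base + a][base + b] = adj[base + b][base + a] = 1  # spatial edge
    return adj


def graph_self_attention(features, adj):
    """Scaled dot-product self-attention restricted to graph neighbours.

    Assumption: the adjacency matrix is used as a hard mask, and the
    raw features serve as queries, keys, and values.
    """
    d = len(features[0])
    out = []
    for i, q in enumerate(features):
        neighbours = [k for k in range(len(features)) if adj[i][k]]
        scores = [
            sum(a * b for a, b in zip(q, features[k])) / math.sqrt(d)
            for k in neighbours
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * features[k][c] for w, k in zip(weights, neighbours))
            for c in range(d)
        ])
    return out


# Toy skeleton: shoulder(0) - elbow(1) - wrist(2), observed over two frames.
adjacency = build_adjacency(num_frames=2, num_joints=3, bones=[(0, 1), (1, 2)])
print(adjacency[0])  # [1, 1, 0, 1, 0, 0]
```

In this toy setup the shoulder in frame 0 attends to itself, to the elbow in the same frame, and to the shoulder in frame 1, but not to the wrist. This is one way to read "structure and context to each skeletal joint"; a soft weighting or additive bias would be an equally plausible reading of the abstract.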
Related papers
- LAC: Latent Action Composition for Skeleton-based Action Segmentation [21.797658771678066]
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos.
Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them by a temporal model to classify frame-wise actions.
We propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation.
arXiv Detail & Related papers (2023-08-28T11:20:48Z)
- SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
- TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeleton data is an emerging topic with prominent advantages.
Existing methods usually design skeleton descriptors with raw body joints or perform skeleton sequence representation learning.
We propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction.
arXiv Detail & Related papers (2023-03-13T02:27:45Z)
- Graph Contrastive Learning for Skeleton-based Action Recognition [85.86820157810213]
We propose a graph contrastive learning framework for skeleton-based action recognition.
SkeletonGCL associates graph learning across sequences by enforcing graphs to be class-discriminative.
SkeletonGCL establishes a new training paradigm, and it can be seamlessly incorporated into current graph convolutional networks.
arXiv Detail & Related papers (2023-01-26T02:09:16Z)
- Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeletons is an important emerging topic with many merits.
Existing solutions rarely explore valuable body-component relations in skeletal structure or motion.
This paper proposes a generic unsupervised Prototype Contrastive learning paradigm with Multi-level Graph Relation learning.
arXiv Detail & Related papers (2022-08-25T00:59:32Z)
- SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification [63.903237777588316]
We present a generic Simple Masked Contrastive learning (SimMC) framework to learn effective representations from unlabeled 3D skeletons for person re-ID.
Specifically, to fully exploit skeleton features within each skeleton sequence, we first devise a masked prototype contrastive learning (MPC) scheme.
Then, we propose the masked intra-sequence contrastive learning (MIC) to capture intra-sequence pattern consistency between subsequences.
arXiv Detail & Related papers (2022-04-21T00:19:38Z)
- Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network [6.623802929157273]
Sign language translation (SLT) generates text in a spoken language from visual content in a sign language.
In this paper, the unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations.
A novel deep learning architecture, namely a hierarchical spatio-temporal graph neural network (HSTG-NN), is proposed.
arXiv Detail & Related papers (2021-11-14T07:02:28Z)
- Skeleton-Contrastive 3D Action Representation Learning [35.06361753065124]
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets.
arXiv Detail & Related papers (2021-08-08T14:44:59Z)