Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural
Network
- URL: http://arxiv.org/abs/2111.07258v1
- Date: Sun, 14 Nov 2021 07:02:28 GMT
- Title: Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural
Network
- Authors: Jichao Kan, Kun Hu, Markus Hagenbuchner, Ah Chung Tsoi, Mohammed
Bennamoun, Zhiyong Wang
- Abstract summary: Sign language translation (SLT) generates text in a spoken language from visual content in a sign language.
In this paper, these unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations.
A novel deep learning architecture, namely hierarchical spatio-temporal graph neural network (HST-GNN), is proposed.
- Score: 6.623802929157273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign language translation (SLT), which generates text in a spoken language
from visual content in a sign language, is important for assisting the
hard-of-hearing community in their communication. Inspired by neural machine
translation (NMT), most existing SLT studies adopted a general sequence to
sequence learning strategy. However, SLT is significantly different from
general NMT tasks since sign languages convey messages through multiple
visual-manual aspects. Therefore, in this paper, these unique characteristics
of sign languages are formulated as hierarchical spatio-temporal graph
representations, including high-level and fine-level graphs of which a vertex
characterizes a specified body part and an edge represents their interactions.
Particularly, high-level graphs represent the patterns in the regions such as
hands and face, and fine-level graphs consider the joints of hands and
landmarks of facial regions. To learn these graph patterns, a novel deep
learning architecture, namely hierarchical spatio-temporal graph neural network
(HST-GNN), is proposed. Graph convolutions and graph self-attentions with
neighborhood context are proposed to characterize both the local and the global
graph properties. Experimental results on benchmark datasets demonstrated the
effectiveness of the proposed method.
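The two building blocks named in the abstract, graph convolutions and graph self-attention restricted to a neighborhood context, can be sketched in general form as follows. This is a minimal NumPy illustration of those standard techniques, not the authors' HST-GNN implementation; the function names and weight matrices are placeholders for illustration only.

```python
import numpy as np

def graph_convolution(X, A, W):
    """One generic graph-convolution layer: aggregate neighbor features
    through the symmetrically normalized adjacency (with self-loops),
    then apply a learned linear map followed by ReLU.
    X: (N, d_in) vertex features; A: (N, N) adjacency; W: (d_in, d_out)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def graph_self_attention(X, A, Wq, Wk, Wv):
    """Self-attention with neighborhood context: attention scores between
    vertices that are not adjacent (self-loops included) are masked out,
    so each vertex attends only to its local graph neighborhood."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # scaled dot-product scores
    mask = (A + np.eye(A.shape[0])) > 0           # neighborhood mask
    scores = np.where(mask, scores, -np.inf)      # drop non-neighbors
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ V
```

In the hierarchical setting described above, layers of this kind would be applied separately to the high-level graph (regions such as hands and face) and to the fine-level graphs (hand joints and facial landmarks).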
Related papers
- HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment [41.75926736949724]
We propose a novel strategy called HIerarchical GrapH Tokenization (HIGHT) to improve the graph perception of large language models (LLMs)
HIGHT employs a hierarchical graph tokenizer that extracts and encodes the hierarchy of node, motif, and graph levels of informative tokens to improve the graph perception of LLMs.
Experiments on 7 molecule-centric benchmarks confirm the effectiveness of HIGHT in reducing hallucination by 40%, as well as significant improvements in various molecule-language downstream tasks.
arXiv Detail & Related papers (2024-06-20T06:37:35Z) - Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z) - Message Detouring: A Simple Yet Effective Cycle Representation for
Expressive Graph Learning [4.085624738017079]
We introduce the concept of message detouring to hierarchically characterize cycle representation throughout the entire graph.
Message detouring can significantly outperform current counterpart approaches on various benchmark datasets.
arXiv Detail & Related papers (2024-02-12T22:06:37Z) - ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG)
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
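The batch-wise contrastive objective described for ConGraT can be sketched as a symmetric InfoNCE loss over paired text and node embeddings, in the style of CLIP. This is an illustrative NumPy sketch of the general technique, not the authors' released code; the function name and temperature value are assumptions.

```python
import numpy as np

def clip_style_contrastive_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (text, node) embeddings:
    each text should match its own node and vice versa. Embeddings are
    L2-normalized so the logits are scaled cosine similarities."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = t @ n.T / temperature            # (B, B) similarity matrix

    def xent(l):
        # cross-entropy where the diagonal entries are the positives
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(logp).mean()

    # average the text->node and node->text directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

A perfectly aligned batch (matching text/node pairs on the diagonal) yields a low loss, while mismatched pairs drive it up, which is what pushes the language model and GNN representations into a common latent space.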
arXiv Detail & Related papers (2023-05-23T17:53:30Z) - Hyperbolic Graph Neural Networks: A Review of Methods and Applications [55.5502008501764]
Graph neural networks generalize conventional neural networks to graph-structured data.
The performance of Euclidean models in graph-related learning is still bounded and limited by the representation ability of Euclidean geometry.
Recently, hyperbolic space has gained increasing popularity in processing graph data with tree-like structure and power-law distribution.
arXiv Detail & Related papers (2022-02-28T15:08:48Z) - GraphFormers: GNN-nested Transformers for Representation Learning on
Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information from the graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z) - Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z) - GINet: Graph Interaction Network for Scene Parsing [58.394591509215005]
We propose a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss) to promote context reasoning over image regions.
The proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.
arXiv Detail & Related papers (2020-09-14T02:52:45Z) - Tensor Graph Convolutional Networks for Multi-relational and Robust
Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs that are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.