Leveraging Graph-based Cross-modal Information Fusion for Neural Sign
Language Translation
- URL: http://arxiv.org/abs/2211.00526v1
- Date: Tue, 1 Nov 2022 15:26:22 GMT
- Title: Leveraging Graph-based Cross-modal Information Fusion for Neural Sign
Language Translation
- Authors: Jiangbin Zheng, Siyuan Li, Cheng Tan, Chong Wu, Yidong Chen, Stan Z.
Li
- Abstract summary: Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand.
We propose a novel neural SLT model with multi-modal feature fusion based on the dynamic graph.
We are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models.
- Score: 46.825957917649795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign Language (SL), as the mother tongue of the deaf community, is a special
visual language that most hearing people cannot understand. In recent years,
neural Sign Language Translation (SLT), as a possible way for bridging
communication gap between the deaf and the hearing people, has attracted
widespread academic attention. We found that the current mainstream end-to-end
neural SLT models, which tries to learning language knowledge in a weakly
supervised manner, could not mine enough semantic information under the
condition of low data resources. Therefore, we propose to introduce additional
word-level semantic knowledge of sign language linguistics to assist in
improving current end-to-end neural SLT models. Concretely, we propose a novel
neural SLT model with multi-modal feature fusion based on the dynamic graph, in
which the cross-modal information, i.e. text and video, is first assembled as a
dynamic graph according to their correlation, and then the graph is processed
by a multi-modal graph encoder to generate the multi-modal embeddings for
further usage in the subsequent neural translation models. To the best of our
knowledge, we are the first to introduce graph neural networks, for fusing
multi-modal information, into neural sign language translation models.
Moreover, we conducted experiments on a publicly available popular SLT dataset
RWTH-PHOENIX-Weather-2014T. and the quantitative experiments show that our
method can improve the model.
Related papers
- A Lesion-aware Edge-based Graph Neural Network for Predicting Language Ability in Patients with Post-stroke Aphasia [12.129896943547912]
We propose a lesion-aware graph neural network (LEGNet) to predict language ability from resting-state fMRI (rs-fMRI) connectivity in patients with post-stroke aphasia.
Our model integrates three components: an edge-based learning module that encodes functional connectivity between brain regions, a lesion encoding module, and a subgraph learning module.
arXiv Detail & Related papers (2024-09-03T21:28:48Z) - Neural Language of Thought Models [18.930227757853313]
We introduce the Neural Language of Thought Model (NLoTM), a novel approach for unsupervised learning of LoTH-inspired representation and generation.
NLoTM comprises two key components: (1) the Semantic Vector-Quantized Variational Autoencoder, which learns hierarchical, composable discrete representations aligned with objects and their properties, and (2) the Autoregressive LoT Prior, an autoregressive transformer that learns to generate semantic concept tokens compositionally.
We evaluate NLoTM on several 2D and 3D image datasets, demonstrating superior performance in downstream tasks, out-of-distribution generalization, and image generation
arXiv Detail & Related papers (2024-02-02T08:13:18Z) - Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities.
This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z) - Language Generation from Brain Recordings [68.97414452707103]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder.
The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli.
Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z) - Mitigating Data Scarcity for Large Language Models [7.259279261659759]
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm.
Data scarcity are commonly found in specialized domains, such as medical, or in low-resource languages that are underexplored by AI research.
In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques.
arXiv Detail & Related papers (2023-02-03T15:17:53Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - Neuro-Symbolic Representations for Video Captioning: A Case for
Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z) - On the Effectiveness of Neural Text Generation based Data Augmentation
for Recognition of Morphologically Rich Speech [0.0]
We have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a RNNLM to the single pass BNLM with text generation based data augmentation.
We show that using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be even significantly improved.
arXiv Detail & Related papers (2020-06-09T09:01:04Z) - Data Augmentation for Spoken Language Understanding via Pretrained
Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.