Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production
- URL: http://arxiv.org/abs/2407.02854v1
- Date: Wed, 3 Jul 2024 07:12:36 GMT
- Title: Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production
- Authors: Eui Jun Hwang, Sukmin Cho, Huije Lee, Youngwoo Yoon, Jong C. Park
- Abstract summary: Universal Gloss-level Representation (UniGloR) is a unified and self-supervised solution for both Sign Language Translation and Sign Language Production.
Our results demonstrate UniGloR's effectiveness in the translation and production tasks.
Our study suggests that self-supervised learning can be performed in a unified manner, paving the way for innovative and practical applications.
- Score: 9.065171626657818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language, essential for the deaf and hard-of-hearing, presents unique challenges in translation and production due to its multimodal nature and the inherent ambiguity in mapping sign language motion to spoken language words. Previous methods often rely on gloss annotations, requiring time-intensive labor and specialized expertise in sign language. Gloss-free methods have emerged to address these limitations, but they often depend on external sign language data or dictionaries, failing to completely eliminate the need for gloss annotations. There is a clear demand for a comprehensive approach that can supplant gloss annotations and be utilized for both Sign Language Translation (SLT) and Sign Language Production (SLP). We introduce Universal Gloss-level Representation (UniGloR), a unified and self-supervised solution for both SLT and SLP, trained on multiple datasets including PHOENIX14T, How2Sign, and NIASL2021. Our results demonstrate UniGloR's effectiveness in the translation and production tasks. We further report an encouraging result for Sign Language Recognition (SLR) on previously unseen data. Our study suggests that self-supervised learning can be performed in a unified manner, paving the way for innovative and practical applications in future research.
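The abstract does not detail the training objective, but a common way to realize a unified self-supervised representation of this kind is masked reconstruction over pose-keypoint sequences, with the learned latents serving as gloss-level features for both SLT (encoding) and SLP (decoding). A minimal PyTorch sketch under that assumption; all module names and dimensions below are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

class MaskedSignAutoencoder(nn.Module):
    """Hypothetical sketch: learn gloss-level latents by reconstructing
    masked frames of a pose-keypoint sequence (not the paper's exact model)."""

    def __init__(self, pose_dim=150, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.decoder = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask_ratio=0.5):
        # poses: (batch, frames, pose_dim)
        x = self.embed(poses)
        masked = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x = torch.where(masked.unsqueeze(-1), self.mask_token.expand_as(x), x)
        latent = self.encoder(x)        # gloss-level representation
        recon = self.decoder(latent)
        loss = ((recon - poses) ** 2)[masked].mean()  # loss on masked frames only
        return loss, latent

model = MaskedSignAutoencoder()
loss, latent = model(torch.randn(2, 64, 150))  # e.g. 64 frames of 75 2-D keypoints
loss.backward()
```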
Related papers
- Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z)
- Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation [30.008980708977095]
We introduce Sign2GPT, a novel framework for sign language translation.
We propose a novel pretraining strategy that directs our encoder to learn sign representations from automatically extracted pseudo-glosses.
We evaluate our approach on two public benchmark sign language translation datasets.
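The summary describes pretraining the encoder against automatically extracted pseudo-glosses. One simple way to picture such an objective is clip-level multi-label prediction over a pseudo-gloss vocabulary, sketched below; the encoder, vocabulary size, and loss here are illustrative assumptions, not Sign2GPT's actual design:

```python
import torch
import torch.nn as nn

# Illustrative only: supervise a sign encoder with pseudo-gloss labels
# mined from the spoken-language translation (e.g. lemmatized content words).
PSEUDO_GLOSS_VOCAB = 2000          # hypothetical vocabulary size
encoder = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
gloss_head = nn.Linear(256, PSEUDO_GLOSS_VOCAB)

features = torch.randn(8, 120, 512)           # (batch, frames, visual features)
targets = torch.zeros(8, PSEUDO_GLOSS_VOCAB)  # multi-hot pseudo-gloss sets
targets[:, :5] = 1.0                          # dummy labels for the sketch

hidden, _ = encoder(features)
logits = gloss_head(hidden.mean(dim=1))       # pool frames -> clip-level logits
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
loss.backward()
```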
arXiv Detail & Related papers (2024-05-07T10:00:38Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
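Stage (i) names a CLIP-style contrastive objective; that component is commonly implemented as a symmetric InfoNCE loss over matched video/text embeddings, as in this generic sketch (not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def clip_style_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (video, text) pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(v.size(0), device=v.device)  # diagonal = positives
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

loss = clip_style_loss(torch.randn(16, 512), torch.randn(16, 512))
```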
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Gloss Attention for Gloss-free Sign Language Translation [60.633146518820325]
We show how gloss annotations make sign language translation easier.
We then propose gloss attention, which enables the model to keep its attention within video segments that have the same semantics locally.
Experimental results on multiple large-scale sign language datasets show that our proposed GASLT model significantly outperforms existing methods.
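Keeping attention within locally coherent video segments can be approximated with a banded attention mask; the sketch below is a generic local-attention construction, with the window size and masking scheme as my assumptions rather than the GASLT implementation:

```python
import torch

def local_attention_mask(seq_len, window=8):
    """Boolean mask that lets each frame attend only to frames within
    +/- window positions, mimicking locally confined (gloss-like) attention."""
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs()
    return dist > window  # True = masked out, as expected by PyTorch attention

mask = local_attention_mask(100, window=8)
attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
x = torch.randn(2, 100, 256)                 # (batch, frames, features)
out, _ = attn(x, x, x, attn_mask=mask)       # self-attention within the band
```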
arXiv Detail & Related papers (2023-07-14T14:07:55Z)
- Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
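Structured prompting of this kind casts sequence labeling as text generation with in-context demonstrations. A minimal illustration for POS tagging follows; the template and tag format are generic assumptions, not the paper's exact prompt:

```python
# Generic structured-prompting template for POS tagging: demonstrations pair
# each token with its tag inline, and the model continues the pattern.
demonstrations = [
    ("The cat sat .", "The_DET cat_NOUN sat_VERB ._PUNCT"),
]
query = "Dogs bark ."

prompt = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in demonstrations)
prompt += f"Input: {query}\nOutput:"
print(prompt)

# A completion such as "Dogs_NOUN bark_VERB ._PUNCT" is then parsed back
# into (token, tag) pairs:
completion = "Dogs_NOUN bark_VERB ._PUNCT"
pairs = [tok.rsplit("_", 1) for tok in completion.split()]
```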
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
- Changing the Representation: Examining Language Representation for Neural Sign Language Production [43.45785951443149]
We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline.
We use language models such as BERT and Word2Vec to create better sentence-level embeddings.
We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
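Sentence-level embeddings from a pretrained model such as BERT are typically obtained by pooling token states; a common mean-pooling recipe with Hugging Face transformers is sketched below (the pooling choice is an assumption; the paper may pool differently):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer(["the weather is cold"], return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (1, tokens, 768)

# Mean-pool over real tokens only, using the attention mask.
mask = batch["attention_mask"].unsqueeze(-1)
sentence_emb = (hidden * mask).sum(1) / mask.sum(1)  # (1, 768)
```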
arXiv Detail & Related papers (2022-09-16T12:45:29Z)
- All You Need In Sign Language Production [50.3955314892191]
Sign language recognition and production must cope with several critical challenges.
We present an introduction to Deaf culture, Deaf centers, and the psychological perspective of sign language.
The backbone architectures and methods in SLP are also briefly introduced, and the proposed taxonomy on SLP is presented.
arXiv Detail & Related papers (2022-01-05T13:45:09Z)
- Including Signed Languages in Natural Language Processing [48.62744923724317]
Signed languages are the primary means of communication for many deaf and hard-of-hearing individuals.
This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact.
arXiv Detail & Related papers (2021-05-11T17:37:55Z)
- Adversarial Training for Multi-Channel Sign Language Production [43.45785951443149]
We propose an Adversarial Multi-Channel approach to Sign Language Production.
We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator.
Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output.
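This minimax framing corresponds to a standard conditional GAN objective in which the discriminator scores pose sequences conditioned on the source text. A schematic loss computation follows, with all module names hypothetical and a toy discriminator standing in for the paper's conditional discriminator:

```python
import torch
import torch.nn.functional as F

class TinyDiscriminator(torch.nn.Module):
    """Toy stand-in: scores a (text embedding, pose sequence) pair."""
    def __init__(self, text_dim=256, pose_dim=150):
        super().__init__()
        self.score = torch.nn.Linear(text_dim + pose_dim, 1)

    def forward(self, text_emb, poses):
        # Condition on text by concatenating it to the pooled pose sequence.
        return self.score(torch.cat([text_emb, poses.mean(dim=1)], dim=-1))

def adversarial_losses(discriminator, text_emb, real_poses, fake_poses):
    """Conditional GAN losses: D separates real from generated sequences,
    G tries to make generated sequences score as real."""
    real_score = discriminator(text_emb, real_poses)
    fake_score = discriminator(text_emb, fake_poses.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score)) +
              F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    g_score = discriminator(text_emb, fake_poses)
    g_loss = F.binary_cross_entropy_with_logits(g_score, torch.ones_like(g_score))
    return d_loss, g_loss

d = TinyDiscriminator()
d_loss, g_loss = adversarial_losses(
    d, torch.randn(4, 256), torch.randn(4, 60, 150), torch.randn(4, 60, 150))
```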
arXiv Detail & Related papers (2020-08-27T23:05:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.