SignLLM: Sign Languages Production Large Language Models
- URL: http://arxiv.org/abs/2405.10718v1
- Date: Fri, 17 May 2024 12:01:43 GMT
- Title: SignLLM: Sign Languages Production Large Language Models
- Authors: Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen
- Abstract summary: We introduce the first comprehensive multilingual sign language dataset named Prompt2Sign.
Our dataset transforms a vast array of videos into a streamlined, model-friendly format.
We propose SignLLM, the first multilingual Sign Language Production model.
- Score: 33.438444361552854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which is built from public data covering American Sign Language (ASL) and seven other sign languages. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow sign language gestures to be generated from input text or prompts. Both modes can use a new loss and a reinforcement-learning-based module, which accelerates training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.
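As a rough illustration of the text-to-pose seq2seq setup the abstract describes, the sketch below encodes a token prompt and autoregressively decodes a sequence of skeletal pose frames. The class name, dimensions (e.g. a 150-dimensional pose frame), and the greedy decoding loop are illustrative assumptions, not SignLLM's actual architecture or the Prompt2Sign format.

```python
# Minimal sketch of a seq2seq text-to-pose model (illustrative assumptions only).
import torch
import torch.nn as nn


class TextToPoseSeq2Seq(nn.Module):
    """Encode a token sequence and autoregressively decode skeletal pose frames."""

    def __init__(self, vocab_size=8000, d_model=256, pose_dim=150, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, num_layers, batch_first=True)
        self.decoder = nn.GRU(pose_dim, d_model, num_layers, batch_first=True)
        self.to_pose = nn.Linear(d_model, pose_dim)  # one frame of joint coordinates

    def forward(self, tokens, max_frames=64):
        # Encode the prompt; the final hidden state initializes the decoder.
        _, h = self.encoder(self.embed(tokens))
        frame = torch.zeros(tokens.size(0), 1, self.to_pose.out_features)
        frames = []
        for _ in range(max_frames):
            out, h = self.decoder(frame, h)
            frame = self.to_pose(out)     # predict the next pose frame
            frames.append(frame)
        return torch.cat(frames, dim=1)   # (batch, max_frames, pose_dim)


model = TextToPoseSeq2Seq()
poses = model(torch.randint(0, 8000, (1, 12)))  # dummy prompt of 12 token ids
print(poses.shape)  # torch.Size([1, 64, 150])
```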
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z) - A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that ingests a signing sequence and maps it into a joint embedding space shared between signed language and spoken language text.
New dataset annotations provide continuous sign-level annotations for six hours of test videos, and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z) - SignDiff: Diffusion Models for American Sign Language Production [23.82668888574089]
We propose a dual-condition diffusion pre-training model named SignDiff that can generate videos of human signers conditioned on skeleton poses.
We also propose a new method for American Sign Language Production (ASLP), which can generate ASL skeletal pose videos from text input.
arXiv Detail & Related papers (2023-08-30T15:14:56Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Generalizing Multimodal Pre-training into Multilingual via Language Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z) - SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)