Semi-Supervised Spoken Language Glossification
- URL: http://arxiv.org/abs/2406.08173v1
- Date: Wed, 12 Jun 2024 13:05:27 GMT
- Title: Semi-Supervised Spoken Language Glossification
- Authors: Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li,
- Abstract summary: Spoken language glossification (SLG) aims to translate the spoken language text into the sign language gloss.
We present a framework named $S$emi-$S$upervised $S$poken $L$anguage $G$lossification ($S3$LG) for SLG.
- Score: 101.31035869691462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language glossification (SLG) aims to translate the spoken language text into the sign language gloss, i.e., a written record of sign language. In this work, we present a framework named $S$emi-$S$upervised $S$poken $L$anguage $G$lossification ($S^3$LG) for SLG. To tackle the bottleneck of limited parallel data in SLG, our $S^3$LG incorporates large-scale monolingual spoken language text into SLG training. The proposed framework follows the self-training structure that iteratively annotates and learns from pseudo labels. Considering the lexical similarity and syntactic difference between sign language and spoken language, our $S^3$LG adopts both the rule-based heuristic and model-based approach for auto-annotation. During training, we randomly mix these complementary synthetic datasets and mark their differences with a special token. As the synthetic data may be less quality, the $S^3$LG further leverages consistency regularization to reduce the negative impact of noise in the synthetic data. Extensive experiments are conducted on public benchmarks to demonstrate the effectiveness of the $S^3$LG. Our code is available at \url{https://github.com/yaohj11/S3LG}.
Related papers
- A First Context-Free Grammar Applied to Nawatl Corpora Augmentation [0.21498988090998952]
We introduce a context-free grammar (CFG) for the Nawatl language.<n>Nawatl is an Amerindian language with few digital resources.<n>We show that a grammar enables us significantly to expand a corpus in Nawatl.
arXiv Detail & Related papers (2025-10-06T15:46:54Z) - LLaVA-SLT: Visual Language Tuning for Sign Language Translation [42.20090162339927]
Recent advancements in Sign Language Translation (SLT) have shown promise, yet they often largely lag behind gloss-based approaches in terms of accuracy.
We introduce LLaVA-SLT, a pioneering Large Multimodal Model (LMM) framework designed to leverage the power of Large Language Models (LLMs) through effectively learned visual language embeddings.
Our comprehensive experiments demonstrate that LLaVA-SLT outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-12-21T08:01:08Z) - Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens [138.36729703589512]
We show that $n$-gram language models are still relevant in this era of neural large language models (LLMs)
This was done by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens.
Second, existing $n$-gram LMs use small $n$ which hinders their performance; we instead allow $n$ to be arbitrarily large, by introducing a new $infty$-gram LM with backoff.
arXiv Detail & Related papers (2024-01-30T19:03:49Z) - CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z) - Scaling up sign spotting through sign language dictionaries [99.50956498009094]
The focus of this work is $textitsign spotting$ - given a video of an isolated sign, our task is to identify $textitwhether$ and $textitwhere$ it has been signed in a continuous, co-articulated sign language video.
We train a model using multiple types of available supervision by: (1) $textitwatching$ existing footage which is sparsely labelled using mouthing cues; (2) $textitreading$ associated subtitles which provide additional translations of the signed content.
We validate the effectiveness of our approach on low
arXiv Detail & Related papers (2022-05-09T10:00:03Z) - Multi-level Contrastive Learning for Cross-lingual Spoken Language
Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z) - A Token-level Contrastive Framework for Sign Language Translation [9.185037439012952]
Sign Language Translation is a promising technology to bridge the communication gap between the deaf and the hearing people.
We propose ConSLT, a novel token-level.
textbfContrastive learning framework for textbfSign textbfLanguage.
textbfTranslation.
arXiv Detail & Related papers (2022-04-11T07:33:26Z) - Call Larisa Ivanovna: Code-Switching Fools Multilingual NLU Models [1.827510863075184]
Novel benchmarks for multilingual natural language understanding (NLU) include monolingual sentences in several languages, annotated with intents and slots.
Existing benchmarks lack of code-switched utterances, which are difficult to gather and label due to complexity in the grammatical structure.
Our work adopts recognized methods to generate plausible and naturally-sounding code-switched utterances and uses them to create a synthetic code-switched test set.
arXiv Detail & Related papers (2021-09-29T11:15:00Z) - SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language
Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose three kinds of tokenizers: SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z) - Data Augmentation for Sign Language Gloss Translation [115.13684506803529]
Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-totext translation.
We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem.
By pre-training on the thus obtained synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
arXiv Detail & Related papers (2021-05-16T16:37:36Z) - Synchronous Bidirectional Learning for Multilingual Lip Reading [99.14744013265594]
Lip movements of all languages share similar patterns due to the common structures of human organs.
Phonemes are more closely related with the lip movements than the alphabet letters.
A novel SBL block is proposed to learn the rules for each language in a fill-in-the-blank way.
arXiv Detail & Related papers (2020-05-08T04:19:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.