Transcribing Natural Languages for The Deaf via Neural Editing Programs
- URL: http://arxiv.org/abs/2112.09600v1
- Date: Fri, 17 Dec 2021 16:21:49 GMT
- Title: Transcribing Natural Languages for The Deaf via Neural Editing Programs
- Authors: Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars
  Petersson, Hongdong Li
- Abstract summary: We study the task of glossification, which aims to transcribe natural spoken language sentences for the Deaf (hard-of-hearing) community into ordered sign language glosses.
Previous sequence-to-sequence language models often fail to capture the rich connections between the two distinct languages, leading to unsatisfactory transcriptions.
We observe that despite different grammars, glosses effectively simplify sentences for the ease of deaf communication, while sharing a large portion of vocabulary with sentences.
- Score: 84.0592111546958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   This work studies the task of glossification, which aims to
transcribe natural spoken language sentences for the Deaf (hard-of-hearing)
community into ordered sign language glosses. Previous sequence-to-sequence
language models trained with paired sentence-gloss data often fail to capture
the rich connections between the two distinct languages, leading to
unsatisfactory transcriptions. We observe that despite different grammars,
glosses effectively simplify sentences for the ease of deaf communication,
while sharing a large portion of vocabulary with sentences. This has motivated
us to implement glossification by executing a collection of editing actions,
e.g. word addition, deletion, and copying, called editing programs, on their
natural spoken language counterparts. Specifically, we design a new neural
agent that learns to synthesize and execute editing programs, conditioned on
sentence contexts and partial editing results. The agent is trained to imitate
minimal editing programs, while exploring more widely the program space via
policy gradients to optimize sequence-wise transcription quality. Results show
that our approach outperforms previous glossification models by a large margin.
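
The idea of an editing program from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the action names (`COPY`, `DEL`, `ADD`), the left-to-right cursor semantics, the function `execute_program`, and the toy sentence are all assumptions made for illustration, and the output is not a verified sign language gloss. The sketch only shows how a sequence of copy, delete, and add actions applied to a source sentence yields a shorter target sequence that shares vocabulary with it.

```python
from typing import List, Tuple

# Hypothetical action format: (operation, argument).
# The argument is only used by "ADD"; the paper's actual program syntax may differ.
Action = Tuple[str, str]


def execute_program(sentence: List[str], program: List[Action]) -> List[str]:
    """Apply a left-to-right editing program to a tokenized sentence."""
    output: List[str] = []
    cursor = 0  # index of the next unread source token
    for op, arg in program:
        if op == "COPY":
            # Keep the current source token and move on.
            output.append(sentence[cursor])
            cursor += 1
        elif op == "DEL":
            # Skip the current source token.
            cursor += 1
        elif op == "ADD":
            # Emit a new token without consuming any source token.
            output.append(arg)
        else:
            raise ValueError(f"unknown action: {op}")
    return output


# Toy example (assumed, not taken from the paper's data).
sentence = ["TOMORROW", "I", "WILL", "GO", "TO", "THE", "STORE"]
program = [
    ("COPY", ""), ("COPY", ""), ("DEL", ""), ("COPY", ""),
    ("DEL", ""), ("DEL", ""), ("ADD", "SHOP"), ("DEL", ""),
]
result = execute_program(sentence, program)
print(" ".join(result))
```

In this sketch most of the output is produced by copying, which reflects the abstract's observation that glosses share a large portion of their vocabulary with the source sentences. How the agent chooses the actions (imitation of minimal programs plus policy gradients) is outside the scope of this example.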
 
      
        Related papers
- ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models [70.56468982313834]
 We propose ProsodyLM, which introduces a simple tokenization scheme amenable to learning prosody.
We find that ProsodyLM can learn surprisingly diverse emerging prosody processing capabilities through pre-training alone.
 arXiv  Detail & Related papers  (2025-07-27T00:59:01Z)
- Hierarchical Feature Alignment for Gloss-Free Sign Language Translation [29.544715933336715]
 Sign Language Translation attempts to convert sign language videos into spoken sentences.
Existing methods struggle with disparity between visual and textual representations during end-to-end learning.
We introduce a novel hierarchical pre-training strategy inspired by the structure of sign language, incorporating pseudo-glosses and contrastive video-language alignment.
 arXiv  Detail & Related papers  (2025-07-09T10:45:50Z)
- Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation [48.20483623444857]
 Sign Language Translation aims to map sign language videos to spoken language text.
A common approach relies on gloss annotations as an intermediate representation.
We propose a gloss-free pseudo gloss generation framework that eliminates the need for human-annotated glosses.
 arXiv  Detail & Related papers  (2025-05-21T12:19:55Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language
  Pretraining [56.26550923909137]
 Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-)
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual and Text Decoder from
 arXiv  Detail & Related papers  (2023-07-27T10:59:18Z)
- How Generative Spoken Language Modeling Encodes Noisy Speech:
  Investigation from Phonetics to Syntactics [33.070158866023]
 Generative spoken language modeling (GSLM) involves using learned symbols derived from data rather than phonemes for speech analysis and synthesis.
This paper presents the findings of GSLM's encoding and decoding effectiveness at the spoken-language and speech levels.
 arXiv  Detail & Related papers  (2023-06-01T14:07:19Z)
- Towards Automatic Speech to Sign Language Generation [35.22004819666906]
 We propose a multi-language transformer network trained to generate signer's poses from speech segments.
Our model learns to generate continuous sign pose sequences in an end-to-end manner.
 arXiv  Detail & Related papers  (2021-06-24T06:44:19Z)
- Correcting Automated and Manual Speech Transcription Errors using Warped
  Language Models [2.8614709576106874]
 We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
 arXiv  Detail & Related papers  (2021-03-26T16:43:23Z)
- Context-Aware Prosody Correction for Text-Based Speech Editing [28.459695630420832]
 A major drawback of current systems is that edited recordings often sound unnatural because of prosody mismatches around edited regions.
We propose a new context-aware method for more natural sounding text-based editing of speech.
 arXiv  Detail & Related papers  (2021-02-16T18:16:30Z)
- Verb Knowledge Injection for Multilingual Event Processing [50.27826310460763]
 We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers.
We first demonstrate that injecting verb knowledge leads to performance gains in English event extraction.
We then explore the utility of verb adapters for event extraction in other languages.
 arXiv  Detail & Related papers  (2020-12-31T03:24:34Z)
- SLM: Learning a Discourse Language Representation with Sentence
  Unshuffling [53.42814722621715]
 We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
 arXiv  Detail & Related papers  (2020-10-30T13:33:41Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
 Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
 arXiv  Detail & Related papers  (2020-05-05T09:02:25Z)
- On the Importance of Word Order Information in Cross-lingual Sequence
  Labeling [80.65425412067464]
 Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
 arXiv  Detail & Related papers  (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     