Improvement in Sign Language Translation Using Text CTC Alignment
- URL: http://arxiv.org/abs/2412.09014v4
- Date: Tue, 24 Dec 2024 05:41:32 GMT
- Title: Improvement in Sign Language Translation Using Text CTC Alignment
- Authors: Sihan Tan, Taro Miyazaki, Nabeela Khan, Kazuhiro Nakadai
- Abstract summary: We propose a novel method combining joint CTC/Attention and transfer learning.
Joint CTC/Attention introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding.
Our method achieves results comparable to state-of-the-art and outperforms the pure-attention baseline.
- Score: 3.513798449090645
- Abstract: Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), limiting their ability to handle non-monotonic alignments between sign language video and spoken text. In this work, we propose a novel method combining joint CTC/Attention and transfer learning. The joint CTC/Attention introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, effectively managing both monotonic and non-monotonic alignments. Meanwhile, transfer learning helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014 T and CSL-Daily, show that our method achieves results comparable to state-of-the-art and outperforms the pure-attention baseline. Additionally, this work opens a new door for future research into gloss-free SLT using text-based CTC alignment.
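As a rough illustration (not the authors' code), joint CTC/attention decoding is commonly implemented as a log-linear interpolation of the two model scores during beam search. The sketch below shows that combination step for a set of candidate next tokens; `ctc_weight` plays the role of the interpolation weight λ, and the token log-probability dicts are hypothetical stand-ins for the real decoder and CTC prefix scorers.

```python
def joint_score(att_logprobs, ctc_logprobs, ctc_weight=0.3):
    """Combine attention-decoder and CTC prefix scores for beam search.

    att_logprobs / ctc_logprobs: dicts mapping candidate next tokens to
    their log-probabilities under each model. ctc_weight is the
    interpolation weight (lambda); values around 0.2-0.3 are a common
    choice in joint CTC/attention systems.
    """
    return {
        tok: ctc_weight * ctc_logprobs[tok]
             + (1.0 - ctc_weight) * att_logprobs[tok]
        for tok in att_logprobs
    }
```

With `ctc_weight=0` this reduces to pure attention decoding; a nonzero weight penalizes hypotheses whose prefix is implausible under CTC's monotonic alignment, which is the effect the paper exploits.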
Related papers
- Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition [46.41096278421193]
BiL-CTC+ bridges the gap between audio and text as well as between source and target languages.
Our method also yields significant improvements in speech recognition performance.
arXiv Detail & Related papers (2023-09-21T16:28:42Z)
- Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation [53.97155730116369]
We put forward a novel framework of language-oriented semantic communication (LSC)
In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency.
We introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with their lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning the listener's
arXiv Detail & Related papers (2023-09-20T08:19:05Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
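For intuition about how CTC predicts texts from encoder frames, its collapsing function B (merge consecutive repeated labels, then delete blanks) maps a frame-level alignment to an output sequence. A toy sketch, not the paper's implementation:

```python
def ctc_collapse(frame_labels, blank="-"):
    """Apply CTC's collapsing function B: merge consecutive repeats,
    then drop blank symbols, turning a per-frame alignment into the
    final label sequence."""
    out = []
    prev = None
    for lab in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

Because every frame's label can be predicted in parallel and collapsed afterwards, this is what makes CTC attractive for non-autoregressive decoding.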
arXiv Detail & Related papers (2023-05-27T03:54:09Z) - TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z) - BERT Meets CTC: New Formulation of End-to-End Speech Recognition with
Pre-trained Masked Language Model [40.16332045057132]
BERT-CTC is a novel formulation of end-to-end speech recognition.
It incorporates linguistic knowledge through the explicit output dependency obtained by BERT contextual embedding.
BERT-CTC improves over conventional approaches across variations in speaking styles and languages.
arXiv Detail & Related papers (2022-10-29T18:19:44Z) - CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
arXiv Detail & Related papers (2022-10-11T07:13:50Z) - Improving Sign Language Translation with Monolingual Data by Sign
Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
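The splicing step described above can be sketched as follows. This is only an illustrative outline: the `sign_bank` structure (each gloss mapped to candidate feature clips) and the function name are hypothetical, standing in for the paper's estimated gloss-to-sign bank at the feature level.

```python
import random

def splice_sign_features(gloss_seq, sign_bank, rng=random):
    """Assemble a pseudo sign-feature sequence for a back-translated
    gloss sequence by concatenating feature clips drawn from a
    gloss-to-sign bank.

    sign_bank: dict mapping each gloss to a list of candidate clips,
    where a clip is a list of per-frame feature vectors (hypothetical
    layout for illustration).
    """
    features = []
    for gloss in gloss_seq:
        # Pick one candidate clip for this gloss and splice it in.
        clip = rng.choice(sign_bank[gloss])
        features.extend(clip)
    return features
```

The spliced sequence can then be paired with the original monolingual text as synthetic training data, which is the core of the back-translation idea.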
arXiv Detail & Related papers (2021-05-26T08:49:30Z) - CTC-synchronous Training for Monotonic Attention Model [43.0382262234792]
The backward probabilities cannot be leveraged in the alignment process during training due to the left-to-right dependency in the decoder.
We propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments.
The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments.
arXiv Detail & Related papers (2020-05-10T16:48:23Z) - GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text
Recognition [27.38969404322089]
We propose guided training of the CTC model, in which the CTC model learns better alignments and feature representations from a more powerful attentional guidance.
With the benefit of guided training, the CTC model achieves robust and accurate predictions for both regular and irregular scene text.
To further leverage the potential of CTC decoder, a graph convolutional network (GCN) is proposed to learn the local correlations of extracted features.
arXiv Detail & Related papers (2020-02-04T13:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.