Sign Language Translation with Iterative Prototype
- URL: http://arxiv.org/abs/2308.12191v1
- Date: Wed, 23 Aug 2023 15:27:50 GMT
- Title: Sign Language Translation with Iterative Prototype
- Authors: Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li
- Abstract summary: IP-SLT is a simple yet effective framework for sign language translation (SLT).
Our idea mimics the behavior of human reading, where a sentence can be digested repeatedly until an accurate understanding is reached.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents IP-SLT, a simple yet effective framework for sign
language translation (SLT). Our IP-SLT adopts a recurrent structure and
enhances the semantic representation (prototype) of the input sign language
video through iterative refinement. Our idea mimics the behavior of human
reading, where a sentence can be digested repeatedly until an accurate
understanding is reached. Technically, IP-SLT consists of feature extraction, prototype
initialization, and iterative prototype refinement. The initialization module
generates the initial prototype based on the visual feature extracted by the
feature extraction module. Then, the iterative refinement module leverages the
cross-attention mechanism to polish the previous prototype by aggregating it
with the original video feature. Through repeated refinement, the prototype
finally converges to a more stable and accurate state, leading to a fluent and
appropriate translation. In addition, to leverage the sequential dependence of
prototypes, we further propose an iterative distillation loss to compress the
knowledge of the final iteration into previous ones. Since the autoregressive
decoding process is executed only once at inference, our IP-SLT can improve
various SLT systems with acceptable overhead. Extensive experiments on public
benchmarks demonstrate the effectiveness of IP-SLT.
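The refinement loop described above is straightforward to express in code. Below is a minimal PyTorch-style sketch of the idea, not the authors' implementation: the module names, dimensions, prototype length, and iteration count are illustrative assumptions, and the distillation term simply follows the abstract's description of compressing the final iteration's knowledge into earlier ones.

```python
# Minimal sketch of iterative prototype refinement (illustrative assumptions,
# not the authors' code): a learned query set initializes the prototype from
# video features, and cross-attention repeatedly polishes it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeRefiner(nn.Module):
    def __init__(self, dim=512, num_heads=8, proto_len=32):
        super().__init__()
        # Learned queries produce the initial prototype from video features.
        self.init_queries = nn.Parameter(torch.randn(proto_len, dim))
        self.init_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention polishes the previous prototype against the video.
        self.refine_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_feat, num_iters=3):
        # video_feat: (B, T, dim) features from the visual backbone.
        B = video_feat.size(0)
        queries = self.init_queries.unsqueeze(0).expand(B, -1, -1)
        proto, _ = self.init_attn(queries, video_feat, video_feat)
        protos = [proto]
        for _ in range(num_iters):
            # Aggregate the previous prototype with the original video feature.
            delta, _ = self.refine_attn(proto, video_feat, video_feat)
            proto = self.norm(proto + delta)
            protos.append(proto)
        return protos  # prototypes from every iteration

def iterative_distillation_loss(protos):
    # Push each earlier prototype toward the (detached) final one,
    # compressing the last iteration's knowledge into previous ones.
    target = protos[-1].detach()
    return sum(F.mse_loss(p, target) for p in protos[:-1]) / (len(protos) - 1)
```

In such a setup, only the final prototype would be fed to the autoregressive decoder, so decoding runs once regardless of the number of refinement iterations, which is consistent with the abstract's "acceptable overhead" claim.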
Related papers
- Advancing Interpretability in Text Classification through Prototype Learning
ProtoLens is a prototype-based model that provides fine-grained, sub-sentence level interpretability for text classification.
ProtoLens uses a Prototype-aware Span Extraction module to identify relevant text spans.
ProtoLens provides interpretable predictions while maintaining competitive accuracy.
arXiv Detail & Related papers (2024-10-23T03:53:46Z)
- Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations
Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language.
This paper presents a novel approach to SLP that leverages Vector Quantization to derive discrete representations from sign pose sequences.
arXiv Detail & Related papers (2023-09-21T15:46:01Z)
- Evolving Semantic Prototype Improves Generative Zero-Shot Learning
In zero-shot learning (ZSL), generative methods synthesize class-related sample features based on predefined semantic prototypes.
We observe that each class's predefined semantic prototype does not accurately match its real semantic prototype.
We propose a dynamic semantic prototype evolving (DSP) method to align the empirically predefined semantic prototypes and the real prototypes for class-related feature synthesis.
arXiv Detail & Related papers (2023-06-12T08:11:06Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks
Sign Language Production (SLP) must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community.
We propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D sign pose sequences.
We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a Mixture Density Network (MDN) formulation to produce realistic and expressive sign pose sequences (a minimal MDN sketch follows this list).
arXiv Detail & Related papers (2021-03-11T22:11:17Z)
- Learning Sparse Prototypes for Text Generation
Prototype-driven text generation is inefficient at test time because it must store and index the entire training corpus.
We propose a novel generative model that automatically learns a sparse prototype support set that achieves strong language modeling performance.
In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction.
arXiv Detail & Related papers (2020-06-29T19:41:26Z)
- Exploring Software Naturalness through Neural Language Models
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Progressive Transformers for End-to-End Sign Language Production
The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video.
Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences.
We propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language.
arXiv Detail & Related papers (2020-04-30T15:20:25Z)
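The Mixture Density Network formulation mentioned in the Continuous 3D Multi-Channel SLP entry above models each pose frame as a mixture of Gaussians rather than a single point estimate. Below is a minimal, hypothetical PyTorch sketch of a generic MDN head and its negative log-likelihood; the class name, feature and pose dimensions, and mixture count are illustrative assumptions, not the paper's implementation.

```python
# Illustrative Mixture Density Network head for continuous pose outputs
# (a sketch of the general MDN idea, not the paper's code).
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, dim=512, pose_dim=150, num_mixtures=5):
        super().__init__()
        self.num_mixtures = num_mixtures
        self.pose_dim = pose_dim
        # Predict mixture weights, means, and scales per time step.
        self.pi = nn.Linear(dim, num_mixtures)
        self.mu = nn.Linear(dim, num_mixtures * pose_dim)
        self.log_sigma = nn.Linear(dim, num_mixtures * pose_dim)

    def loss(self, h, target):
        # h: (B, T, dim) decoder states; target: (B, T, pose_dim) ground-truth poses.
        B, T, _ = h.shape
        log_pi = torch.log_softmax(self.pi(h), dim=-1)              # (B, T, K)
        mu = self.mu(h).view(B, T, self.num_mixtures, self.pose_dim)
        sigma = self.log_sigma(h).view(B, T, self.num_mixtures, self.pose_dim).exp()
        dist = torch.distributions.Normal(mu, sigma)
        # Log-likelihood of the target under each diagonal Gaussian component.
        log_prob = dist.log_prob(target.unsqueeze(2)).sum(-1)       # (B, T, K)
        # Negative log-likelihood of the full mixture, averaged over batch and time.
        return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()
```

Predicting a full distribution rather than a single pose lets the model express the natural variability of signing instead of regressing to a blurred mean pose.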