Stochastic Transformer Networks with Linear Competing Units: Application
to end-to-end SL Translation
- URL: http://arxiv.org/abs/2109.13318v1
- Date: Wed, 1 Sep 2021 15:00:52 GMT
- Title: Stochastic Transformer Networks with Linear Competing Units: Application
to end-to-end SL Translation
- Authors: Andreas Voskou, Konstantinos P. Panousis, Dimitrios Kosmopoulos,
Dimitris N. Metaxas and Sotirios Chatzis
- Abstract summary: We introduce an end-to-end SLT model that does not entail explicit use of glosses.
This is in stark contrast to existing end-to-end models that use gloss sequence groundtruth.
We demonstrate that our approach can reach the currently best reported BLEU-4 score on the PHOENIX 2014T benchmark.
- Score: 46.733644368276764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating sign language translation (SLT) is a challenging real world
application. Despite its societal importance, though, research progress in the
field remains rather poor. Crucially, existing methods that yield viable
performance necessitate the availability of laborious to obtain gloss sequence
groundtruth. In this paper, we attenuate this need, by introducing an
end-to-end SLT model that does not entail explicit use of glosses; the model
only needs text groundtruth. This is in stark contrast to existing end-to-end
models that use gloss sequence groundtruth, either in the form of a modality
that is recognized at an intermediate model stage, or in the form of a parallel
output process, jointly trained with the SLT model. Our approach constitutes a
Transformer network with a novel type of layers that combines: (i) local
winner-takes-all (LWTA) layers with stochastic winner sampling, instead of
conventional ReLU layers, (ii) stochastic weights with posterior distributions
estimated via variational inference, and (iii) a weight compression technique
at inference time that exploits estimated posterior variance to perform
massive, almost lossless compression. We demonstrate that our approach can
reach the currently best reported BLEU-4 score on the PHOENIX 2014T benchmark,
but without making use of glosses for model training, and with a memory
footprint reduced by more than 70%.
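As a rough illustration of the abstract's first and third ingredients, the sketch below implements a stochastic LWTA activation (blocks of competing units, with the winner sampled via the Gumbel-max trick) and posterior signal-to-noise-ratio weight pruning in plain numpy; the block size, temperature, and pruning threshold are illustrative assumptions, not the authors' exact implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_lwta(x, block_size=2, temperature=0.67):
    """Stochastic local winner-takes-all (LWTA) activation.

    Linear responses are grouped into blocks of `block_size` competing units;
    within each block one 'winner' is sampled from a softmax over the block
    (Gumbel-max trick) and the losing units are zeroed out, replacing ReLU.
    """
    batch, dim = x.shape
    assert dim % block_size == 0, "feature dim must be divisible by block size"
    blocks = x.reshape(batch, dim // block_size, block_size)
    gumbel = -np.log(-np.log(rng.uniform(size=blocks.shape) + 1e-20) + 1e-20)
    winners = np.argmax(blocks / temperature + gumbel, axis=-1)   # sampled winner per block
    mask = np.eye(block_size)[winners]                            # one-hot winner mask
    return (blocks * mask).reshape(batch, dim)

def prune_by_posterior_snr(w_mean, w_std, snr_threshold=1.0):
    """Zero out weights whose posterior signal-to-noise ratio |mu|/sigma is low,
    i.e. weights the variational posterior is uncertain about; the surviving
    means can be stored sparsely, which is where the memory savings come from."""
    keep = np.abs(w_mean) / (w_std + 1e-12) > snr_threshold
    return w_mean * keep, keep.mean()

# Toy usage: activations of a feed-forward sub-layer and a stochastic weight posterior.
h = rng.normal(size=(4, 16))
print(stochastic_lwta(h).shape)                      # (4, 16), one survivor per block
w_mu, w_sigma = rng.normal(size=(16, 16)), rng.uniform(0.5, 2.0, size=(16, 16))
_, kept = prune_by_posterior_snr(w_mu, w_sigma)
print(f"fraction of weights kept: {kept:.2f}")
```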
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards
General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z)
- Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically justified technique termed Low-Rank Induced Training (LoRITa).
LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation.
We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
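A rough numpy sketch of the idea described above (not the paper's training code): a weight matrix is over-parameterized as a composition of plain linear layers during training, then collapsed and truncated via SVD afterwards; the sizes and target rank below are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(0)

# During training the effective weight is a composition of linear layers,
# W_eff = W2 @ W1, which implicitly encourages W_eff to be low rank.
d_in, d_hidden, d_out = 64, 64, 32
W1 = rng.normal(scale=0.1, size=(d_hidden, d_in))
W2 = rng.normal(scale=0.1, size=(d_out, d_hidden))

W_eff = W2 @ W1                        # collapse the composition into one matrix

# Post-training compression: keep only the top-k singular values.
U, S, Vt = np.linalg.svd(W_eff, full_matrices=False)
k = 8                                  # illustrative target rank
W_compressed = (U[:, :k] * S[:k]) @ Vt[:k, :]

x = rng.normal(size=(d_in,))
err = np.linalg.norm(W_eff @ x - W_compressed @ x) / np.linalg.norm(W_eff @ x)
print(f"relative output error at rank {k}: {err:.3f}")
```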
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
- LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation [0.0]
We introduce LRP-QViT, an explainability-based method for assigning mixed-precision bit allocations to different layers based on their importance during classification.
Our experimental findings demonstrate that both our fixed-bit and mixed-bit post-training quantization methods surpass existing models in the context of 4-bit and 6-bit quantization.
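A toy sketch of the mixed-precision allocation idea, under loud assumptions: made-up relevance scores stand in for the layer-wise relevance propagation scores, the allocation rule simply gives the top half of layers 6 bits and the rest 4 bits, and the quantizer is plain symmetric uniform post-training quantization.
```python
import numpy as np

rng = np.random.default_rng(0)

def assign_bits(relevance, high_bits=6, low_bits=4, high_fraction=0.5):
    """Give the most 'relevant' layers more bits (toy allocation rule;
    the paper's LRP-based scoring and allocation differ)."""
    order = np.argsort(relevance)[::-1]
    bits = np.full(len(relevance), low_bits)
    bits[order[: int(len(relevance) * high_fraction)]] = high_bits
    return bits

def quantize(w, n_bits):
    """Symmetric uniform post-training quantization of one weight tensor."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

# Toy model: 6 layers with made-up per-layer relevance scores (assumed inputs).
layers = [rng.normal(size=(32, 32)) for _ in range(6)]
relevance = rng.uniform(size=6)
bits = assign_bits(relevance)
quantized = [quantize(w, b) for w, b in zip(layers, bits)]
print(bits)
```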
arXiv Detail & Related papers (2024-01-20T14:53:19Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
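A miniature sketch of the random-projection recipe, with assumptions stated up front: random vectors stand in for real pre-trained backbone features, and a nearest-prototype head stands in for the paper's exact classifier; the point is that new tasks only accumulate class statistics, so nothing learned earlier is overwritten.
```python
import numpy as np

rng = np.random.default_rng(0)

feat_dim, proj_dim, n_classes = 128, 512, 10
W_rand = rng.normal(size=(feat_dim, proj_dim))   # frozen random projection

def embed(features):
    return np.maximum(features @ W_rand, 0.0)    # ReLU'd random features

proto_sum = np.zeros((n_classes, proj_dim))
proto_cnt = np.zeros(n_classes)

def update(features, labels):
    """Accumulate class prototypes from a new task's data (no backprop)."""
    z = embed(features)
    for c in np.unique(labels):
        proto_sum[c] += z[labels == c].sum(axis=0)
        proto_cnt[c] += (labels == c).sum()

def predict(features):
    protos = proto_sum / np.maximum(proto_cnt[:, None], 1)
    z = embed(features)
    return np.argmax(z @ protos.T, axis=1)       # nearest prototype by dot product

# Toy usage with random features standing in for a real pre-trained backbone.
x, y = rng.normal(size=(100, feat_dim)), rng.integers(0, n_classes, 100)
update(x, y)
print(predict(x[:5]))
```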
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers [20.23085795744602]
We propose Adaptive Sparsity Level (PALS) to automatically seek a decent balance between loss and sparsity.
PALS draws inspiration from sparse training and during-training methods.
It introduces the novel "expand" mechanism in training sparse neural networks, allowing the model to dynamically shrink, expand, or remain stable to find a proper sparsity level.
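A toy controller in the spirit of that shrink/expand/stay mechanism: the sparsity level is raised while the loss keeps improving, lowered when it degrades, and held otherwise. The trigger thresholds and step size are invented for illustration, and regrowth of pruned weights on "expand" is omitted for brevity.
```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_mask(w, sparsity):
    """Keep the largest-magnitude weights, zeroing a `sparsity` fraction."""
    k = int(w.size * sparsity)
    if k == 0:
        return np.ones_like(w, dtype=bool)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.abs(w) > thresh

def adjust_sparsity(sparsity, loss, prev_loss, step=0.05):
    """Toy PALS-like controller (invented thresholds, not the paper's criterion)."""
    if loss < prev_loss * 0.999:
        return min(sparsity + step, 0.95)   # shrink: prune more
    if loss > prev_loss * 1.01:
        return max(sparsity - step, 0.0)    # expand: allow more weights
    return sparsity                          # remain stable

# Toy loop over a synthetic loss trace (assumed; no real training here).
w = rng.normal(size=(64, 64))
sparsity, prev_loss = 0.5, np.inf
for loss in [1.0, 0.8, 0.7, 0.72, 0.69, 0.75]:
    sparsity = adjust_sparsity(sparsity, loss, prev_loss)
    w = w * magnitude_mask(w, sparsity)
    prev_loss = loss
    print(f"loss={loss:.2f} -> sparsity={sparsity:.2f}")
```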
arXiv Detail & Related papers (2023-05-28T06:57:27Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP on downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
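For the shallow fusion setup above, a minimal sketch of an MWER-style objective over an N-best list: hypothesis scores (optionally fused with assumed external LM scores) are renormalized into a posterior, and the loss is the expected number of word errors relative to the list average. The fusion weight and normalization follow common practice, not necessarily the paper's exact recipe.
```python
import numpy as np

def mwer_loss(scores, word_errors, lm_scores=None, lm_weight=0.3):
    """MWER-style loss over an N-best list with optional shallow LM fusion."""
    s = np.asarray(scores, dtype=float)
    if lm_scores is not None:
        s = s + lm_weight * np.asarray(lm_scores, dtype=float)  # shallow fusion
    p = np.exp(s - s.max())
    p /= p.sum()                                    # hypothesis posterior
    errs = np.asarray(word_errors, dtype=float)
    return float(np.sum(p * (errs - errs.mean())))  # expected relative word errors

# Toy 4-best list: model log-scores, (assumed) LM log-scores, word-error counts.
loss = mwer_loss(scores=[-3.1, -3.4, -4.0, -5.2],
                 lm_scores=[-8.0, -7.2, -9.1, -8.5],
                 word_errors=[2, 0, 3, 4])
print(f"MWER loss: {loss:.3f}")
```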
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
- Regularization via Adaptive Pairwise Label Smoothing [19.252319300590653]
This paper introduces a novel label smoothing technique called Pairwise Label Smoothing (PLS).
Unlike current LS methods, which typically require to find a global smoothing distribution mass through cross-validation search, PLS automatically learns the distribution mass for each input pair during training.
We empirically show that PLS significantly outperforms LS and the baseline models, achieving up to 30% of relative classification error reduction.
arXiv Detail & Related papers (2020-12-02T22:08:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.