Transformer based Multilingual document Embedding model
- URL: http://arxiv.org/abs/2008.08567v2
- Date: Thu, 20 Aug 2020 16:37:29 GMT
- Title: Transformer based Multilingual document Embedding model
- Authors: Wei Li and Brian Mak
- Abstract summary: This paper presents a transformer-based sentence/document embedding model, T-LASER, which makes three significant improvements.
Firstly, the BiLSTM layers are replaced by attention-based transformer layers, which are more capable of learning sequential patterns in longer texts.
Secondly, due to the absence of recurrence, T-LASER enables faster parallel computations in the encoder to generate the text embedding.
- Score: 22.346360611417648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the current state-of-the-art multilingual document embedding models,
LASER, is based on the bidirectional LSTM neural machine translation model. This
paper presents a transformer-based sentence/document embedding model, T-LASER,
which makes three significant improvements. Firstly, the BiLSTM layers are
replaced by attention-based transformer layers, which are more capable of
learning sequential patterns in longer texts. Secondly, due to the absence of
recurrence, T-LASER enables faster parallel computation in the encoder to
generate the text embedding. Thirdly, we augment the NMT translation loss
function with an additional novel distance constraint loss. This distance
constraint loss further brings the embeddings of parallel sentences closer
together in the vector space; we call the T-LASER model trained with the
distance constraint cT-LASER. Our cT-LASER model significantly outperforms both
the BiLSTM-based LASER and the simpler transformer-based T-LASER.
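For illustration only, the sketch below shows one way the transformer encoder and the distance-constrained objective described above could be realized in PyTorch, i.e. a total loss of the form L = L_NMT + lambda * ||e_src - e_tgt||^2. The max-pooling over token states, the squared L2 distance, the weight `lambda_dc`, and all class and parameter names are assumptions made for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): a transformer encoder that pools token
# states into a fixed-size sentence embedding, plus a combined objective that adds
# a distance-constraint term to the NMT translation loss. Pooling strategy, the
# squared L2 distance, and the weight `lambda_dc` are illustrative assumptions.
import torch
import torch.nn as nn


class TransformerSentenceEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Positional encodings are omitted here for brevity.

    def forward(self, tokens, pad_mask):
        # tokens: (batch, seq_len) token ids; pad_mask: True at padding positions.
        states = self.encoder(self.embed(tokens), src_key_padding_mask=pad_mask)
        # Max-pool over non-padded positions to get one vector per sentence.
        states = states.masked_fill(pad_mask.unsqueeze(-1), float("-inf"))
        return states.max(dim=1).values


def combined_loss(nmt_loss, src_emb, tgt_emb, lambda_dc=1.0):
    # Translation loss plus a distance constraint that pulls the embeddings of
    # parallel source/target sentences together in the shared vector space.
    distance = (src_emb - tgt_emb).pow(2).sum(dim=-1).mean()
    return nmt_loss + lambda_dc * distance
```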
Related papers
- Multilingual Controllable Transformer-Based Lexical Simplification [4.718531520078843]
This paper proposes mTLS, a controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model.
The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words.
arXiv Detail & Related papers (2023-07-05T08:48:19Z)
- Efficient GPT Model Pre-training using Tensor Train Matrix Representation [65.96485282393361]
Large-scale transformer models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters, showing perplexity comparable to the original model.
arXiv Detail & Related papers (2023-06-05T08:38:25Z)
- TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z)
- Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
arXiv Detail & Related papers (2023-02-03T18:57:17Z)
- Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM models.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning [37.55706646713447]
We propose a hybrid Transformer-LSTM based architecture to improve low-resource end-to-end ASR.
We conduct experiments on our in-house Malay corpus which contains limited labeled data and a large amount of extra text.
Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER.
arXiv Detail & Related papers (2020-05-21T00:56:42Z)
- TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding [18.526060699574142]
Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks.
We propose a new architecture, denoted Transformer with BLSTM (TRANS-BLSTM), which has a BLSTM layer integrated into each transformer block.
We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT baselines in GLUE and SQuAD 1.1 experiments.
arXiv Detail & Related papers (2020-03-16T03:38:51Z)
- Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)