An empirical analysis of phrase-based and neural machine translation
- URL: http://arxiv.org/abs/2103.03108v1
- Date: Thu, 4 Mar 2021 15:28:28 GMT
- Title: An empirical analysis of phrase-based and neural machine translation
- Authors: Hamidreza Ghader
- Abstract summary: Two popular types of machine translation (MT) are phrase-based and neural machine translation systems.
We study the behavior of important models in both phrase-based and neural MT systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Two popular types of machine translation (MT) are phrase-based and neural
machine translation systems. Both of these types of systems are composed of
multiple complex models or layers. Each of these models and layers learns
different linguistic aspects of the source language. However, for some of these
models and layers, it is not clear which linguistic phenomena are learned or
how this information is learned. For phrase-based MT systems, it is often clear
what information is learned by each model, and the question is rather how this
information is learned, especially for the phrase reordering model. For neural
machine translation systems, the situation is even more complex, since in many
cases it is not exactly clear what information is learned or how it is
learned.
To shed light on what linguistic phenomena are captured by MT systems, we
analyze the behavior of important models in both phrase-based and neural MT
systems. We consider phrase reordering models from phrase-based MT systems to
investigate which words inside a phrase have the greatest impact on defining
the phrase reordering behavior. Additionally, to contribute to the
interpretability of neural MT systems, we study the behavior of the attention
model, which is a key component of neural MT systems and the closest model in
functionality to the phrase reordering models of phrase-based systems. The
attention model and the encoder hidden state representations together form the
main components that encode source-side linguistic information in neural MT. To
this end, we also analyze the information captured in the encoder hidden state
representations of a neural MT system. We investigate the extent to which
syntactic and lexical-semantic information from the source side is captured by
the hidden state representations of different neural MT architectures.
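As a rough illustration of the two kinds of analysis the abstract describes, the sketch below is hypothetical code, not taken from the thesis: it computes dot-product attention weights of one decoder state over encoder states (the soft-alignment role that is loosely analogous to phrase reordering decisions in phrase-based MT), and fits a linear probing classifier that predicts a linguistic label such as a POS tag from per-token hidden states, whose held-out accuracy is commonly read as a proxy for how much of that information the representations capture. All names, shapes, and the toy data are assumptions made for illustration.

```python
# Minimal sketch (hypothetical, not from the thesis): dot-product attention
# weights plus a diagnostic (probing) classifier over encoder hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression


def attention_weights(decoder_state, encoder_states):
    """Soft alignment of one decoder state over all source positions.

    decoder_state:  (d,)   current target-side query vector
    encoder_states: (n, d) one hidden vector per source token
    Returns a length-n probability distribution over source positions.
    """
    scores = encoder_states @ decoder_state      # (n,) dot-product scores
    scores = scores - scores.max()               # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def train_probe(hidden_states, labels):
    """Fit a linear probe that predicts a linguistic label (e.g. a POS tag)
    from per-token hidden states; its held-out accuracy is then used as a
    rough measure of how much of that information the states encode."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states, labels)
    return probe


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enc = rng.normal(size=(7, 16))       # 7 source tokens, 16-dim states (toy)
    dec = rng.normal(size=16)            # one decoder query vector (toy)
    print("attention:", np.round(attention_weights(dec, enc), 3))

    X = rng.normal(size=(200, 16))       # toy "hidden states"
    y = rng.integers(0, 5, size=200)     # toy labels standing in for POS tags
    probe = train_probe(X, y)
    print("probe accuracy on toy data:", probe.score(X, y))
```

In practice such probes are evaluated on held-out tokens and compared across architectures and layers; the toy random data above only demonstrates the mechanics.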
Related papers
- Mapping of attention mechanisms to a generalized Potts model [50.91742043564049]
We show that training a neural network is exactly equivalent to solving the inverse Potts problem by the so-called pseudo-likelihood method.
We also compute the generalization error of self-attention in a model scenario analytically using the replica method.
arXiv Detail & Related papers (2023-04-14T16:32:56Z) - Is neural language acquisition similar to natural? A chronological
probing study [0.0515648410037406]
We present the chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about the language learned by the models in the process of training on corpora.
The results show that 1) linguistic information is acquired in the early stages of training, and 2) both language models demonstrate the ability to capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - Language Modeling, Lexical Translation, Reordering: The Training Process
of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z) - Lexicon Learning for Few-Shot Neural Sequence Modeling [32.49689188570872]
We present a lexical translation mechanism that generalizes existing copy mechanisms to incorporate learned, decontextualized, token-level translation rules.
It improves systematic generalization on a diverse set of sequence modeling tasks drawn from cognitive science, formal semantics, and machine translation.
arXiv Detail & Related papers (2021-06-07T22:35:04Z) - Implicit Representations of Meaning in Neural Language Models [31.71898809435222]
We identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse.
Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state.
arXiv Detail & Related papers (2021-06-01T19:23:20Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Universal Vector Neural Machine Translation With Effective Attention [0.0]
We propose a singular model for Neural Machine Translation based on encoder-decoder models.
We introduce a neutral/universal model representation that can be used to predict more than one language.
arXiv Detail & Related papers (2020-06-09T01:13:57Z) - DiscreTalk: Text-to-Speech as a Machine Translation Problem [52.33785857500754]
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT).
The proposed model consists of two components; a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model.
arXiv Detail & Related papers (2020-05-12T02:45:09Z) - Assessing the Bilingual Knowledge Learned by Neural Machine Translation
Models [72.56058378313963]
We bridge the gap by assessing the bilingual knowledge learned by NMT models with phrase tables.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z)