On the Integration of Linguistic Features into Statistical and Neural
Machine Translation
- URL: http://arxiv.org/abs/2003.14324v1
- Date: Tue, 31 Mar 2020 16:03:38 GMT
- Title: On the Integration of Linguistic Features into Statistical and Neural
Machine Translation
- Authors: Eva Vanmassenhove
- Abstract summary: We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
- Score: 2.132096006921048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New machine translation (MT) technologies are emerging rapidly and with
them, bold claims of achieving human parity such as: (i) the results produced
approach "accuracy achieved by average bilingual human translators" (Wu et al.,
2017b) or (ii) the "translation quality is at human parity when compared to
professional human translators" (Hassan et al., 2018) have seen the light of
day (Laubli et al., 2018). Aside from the fact that many of these papers craft
their own definition of human parity, these sensational claims are often not
supported by a complete analysis of all aspects involved in translation.
Establishing the discrepancies between the strengths of statistical approaches
to MT and the way humans translate has been the starting point of our research.
By looking at MT output and linguistic theory, we were able to identify some
remaining issues. The problems range from simple number and gender agreement
errors to more complex phenomena such as the correct translation of aspectual
values and tenses. Our experiments confirm, along with other studies
(Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many
aspects. However, some problems remain and others have emerged. We cover a
series of problems related to the integration of specific linguistic features
into statistical and neural MT, aiming to analyse and provide a solution to
some of them. Our work focuses on addressing three main research questions that
revolve around the complex relationship between linguistics and MT in general.
We identify linguistic information that is lacking in order for automatic
translation systems to produce more accurate translations and integrate
additional features into the existing pipelines. We identify overgeneralization
or 'algorithmic bias' as a potential drawback of neural MT and link it to many
of the remaining linguistic issues.
Related papers
- Thesis proposal: Are We Losing Textual Diversity to Natural Language Processing? [3.8073142980733]
We ask whether the algorithms used in Neural Machine Translation have inherent inductive biases that are beneficial for most types of inputs but might harm the processing of untypical texts.
We conduct a series of experiments to investigate whether NMT systems struggle with maintaining the diversity of such texts.
Our ultimate goal is to develop alternatives that do not enforce uniformity in the distribution of statistical properties in the output.
arXiv Detail & Related papers (2024-09-15T01:06:07Z) - Predicting Human Translation Difficulty with Neural Machine Translation [30.036747251603668]
We evaluate the extent to which surprisal and attentional features derived from a Neural Machine Translation (NMT) model account for reading and production times of human translators.
We find that surprisal and attention are complementary predictors of translation difficulty, and that surprisal derived from a NMT model is the single most successful predictor of production duration.
arXiv Detail & Related papers (2023-12-19T04:42:56Z) - Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z) - Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z) - Mitigating Data Imbalance and Representation Degeneration in
Multilingual Machine Translation [103.90963418039473]
Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
arXiv Detail & Related papers (2023-05-22T07:31:08Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z) - Discourse Centric Evaluation of Machine Translation with a Densely
Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - Machine Translationese: Effects of Algorithmic Bias on Linguistic
Complexity in Machine Translation [2.0625936401496237]
We go beyond the study of gender in Machine Translation and investigate how bias amplification might affect language in a broader sense.
We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms.
arXiv Detail & Related papers (2021-01-30T18:49:11Z) - Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z) - Towards Multimodal Simultaneous Neural Machine Translation [28.536262015508722]
Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding.
This task is significantly more challenging than the general full sentence translation because of the shortage of input information during decoding.
We propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality.
arXiv Detail & Related papers (2020-04-07T08:02:21Z) - Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We query if machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all information) and is not responsible for any consequences.