Do GPTs Produce Less Literal Translations?
- URL: http://arxiv.org/abs/2305.16806v4
- Date: Tue, 6 Jun 2023 03:15:43 GMT
- Title: Do GPTs Produce Less Literal Translations?
- Authors: Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla
- Abstract summary: Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks.
We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
- Score: 20.095646048167612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose
language models capable of addressing many natural language generation or
understanding tasks. On the task of Machine Translation (MT), multiple works
have investigated few-shot prompting mechanisms to elicit better translations
from LLMs. However, there has been relatively little investigation on how such
translations differ qualitatively from the translations generated by standard
Neural Machine Translation (NMT) models. In this work, we investigate these
differences in terms of the literalness of translations produced by the two
systems. Using literalness measures involving word alignment and monotonicity,
we find that translations out of English (E-X) from GPTs tend to be less
literal, while exhibiting similar or better scores on MT quality metrics. We
demonstrate that this finding is borne out in human evaluations as well. We
then show that these differences are especially pronounced when translating
sentences that contain idiomatic expressions.
Related papers
- IntGrad MT: Eliciting LLMs' Machine Translation Capabilities with Sentence Interpolation and Gradual MT [5.323504404265276]
Large Language Models (LLMs) have demonstrated strong performance in translation without needing to be finetuned on additional parallel corpora.
Previous works have focused on mitigating this issue by leveraging relevant few-shot examples or external resources such as dictionaries or grammar books.
We propose a novel method named IntGrad MT that focuses on fully exploiting an LLM's inherent translation capability.
arXiv Detail & Related papers (2024-10-15T15:26:28Z) - The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities [18.175795328685986]
Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality.
We perform an extensive translation evaluation on the LLaMA and Falcon family of models with model size ranging from 7 billion up to 65 billion parameters.
We observe a decline in the ability to perform formality steering, to produce technical translations through few-shot examples, and to perform document-level translation.
arXiv Detail & Related papers (2024-05-30T14:25:56Z) - Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z) - Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z) - Towards Effective Disambiguation for Machine Translation with Large
Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences"
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z) - Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z) - When Does Translation Require Context? A Data-driven, Multilingual
Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT)
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z) - Translating the Unseen? Yor\`ub\'a $\rightarrow$ English MT in
Low-Resource, Morphologically-Unmarked Settings [8.006185289499049]
Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation.
In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems when translating bare nouns in Yorub'a into English.
arXiv Detail & Related papers (2021-03-07T01:24:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.