Related papers: How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

URL: http://arxiv.org/abs/2302.09210v1
Date: Sat, 18 Feb 2023 02:11:36 GMT
Title: How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation
Authors: Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla
Abstract summary: We show that GPT models achieve very competitive translation quality for high resource languages. We also show that hybrid approaches, which combine GPT models with other translation systems, can further enhance the translation quality.
Score: 16.90012234231392
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance for machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art research and commercial systems, effect of prompting strategies, robustness towards domain shifts and document-level translation. We experiment with eighteen different translation directions involving high and low resource languages, as well as non English-centric translations, and evaluate the performance of three GPT models: ChatGPT, GPT3.5 (text-davinci-003), and text-davinci-002. Our results show that GPT models achieve very competitive translation quality for high resource languages, while having limited capabilities for low resource languages. We also show that hybrid approaches, which combine GPT models with other translation systems, can further enhance the translation quality. We perform comprehensive analysis and human evaluation to further understand the characteristics of GPT translations. We hope that our paper provides valuable insights for researchers and practitioners in the field and helps to better understand the potential and limitations of GPT models for translation.

Related papers

Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data [64.4458540273004]
We propose a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of Large Language Models (LLMs) Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions.
arXiv Detail & Related papers (2025-04-20T16:20:30Z)
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels [20.05501751993599]
GPT-4 achieves performance comparable to junior-level translators in terms of total errors. Unlike traditional Neural Machine Translation systems, GPT-4 maintains consistent translation quality across all evaluated language pairs.
arXiv Detail & Related papers (2024-11-21T01:12:46Z)
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [56.7988577327046]
We introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task. We find that reference information significantly enhances the evaluation accuracy, while surprisingly, source information sometimes is counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z)
Do GPTs Produce Less Literal Translations? [20.095646048167612]
Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
arXiv Detail & Related papers (2023-05-26T10:38:31Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
How to Design Translation Prompts for ChatGPT: An Empirical Study [18.678893287863033]
ChatGPT has demonstrated surprising abilities in natural language understanding and natural language generation. We adopt several translation prompts on a wide range of translations. Our work provides empirical evidence that ChatGPT still has great potential in translations.
arXiv Detail & Related papers (2023-04-05T01:17:59Z)
Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction [28.58384091374763]
GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks. We perform experiments testing the capabilities of a GPT-3.5 model (text-davinci-003) and a GPT-4 model (gpt-4-0314) on major GEC benchmarks. We report the performance of our best prompt on the BEA-2019 and JFLEG datasets, finding that the GPT models can perform well in a sentence-level revision setting.
arXiv Detail & Related papers (2023-03-25T03:08:49Z)
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities. We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models. We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets. Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
Large Language Models Are State-of-the-Art Evaluators of Translation Quality [7.818228526742237]
GEMBA is a GPT-based metric for assessment of translation quality. We investigate nine versions of GPT models, including ChatGPT and GPT-4. Our method achieves state-of-the-art accuracy in both modes when compared to MQM-based human labels.
arXiv Detail & Related papers (2023-02-28T12:23:48Z)
News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge. We finetune smaller language models to generate useful intermediate context, referred to here as elaborations. Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.