Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine
- URL: http://arxiv.org/abs/2301.08745v4
- Date: Thu, 2 Nov 2023 07:19:37 GMT
- Title: Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine
- Authors: Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi, Zhaopeng Tu
- Abstract summary: We evaluate ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness.
ChatGPT performs competitively with commercial translation products but lags behind significantly on low-resource or distant languages.
With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted.
- Score: 97.8609714773255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report provides a preliminary evaluation of ChatGPT for machine
translation, including translation prompt, multilingual translation, and
translation robustness. We adopt the prompts advised by ChatGPT to trigger its
translation ability and find that the candidate prompts generally work well
with minor performance differences. By evaluating on a number of benchmark test
sets, we find that ChatGPT performs competitively with commercial translation
products (e.g., Google Translate) on high-resource European languages but lags
behind significantly on low-resource or distant languages. As for the
translation robustness, ChatGPT does not perform as well as the commercial
systems on biomedical abstracts or Reddit comments but exhibits good results on
spoken language. Further, we explore an interesting strategy named
pivot prompting for distant languages, which asks ChatGPT to
translate the source sentence into a high-resource pivot language before
translating it into the target language; this noticeably improves translation performance. With the
launch of the GPT-4 engine, the translation performance of ChatGPT is
significantly boosted, becoming comparable to commercial translation products,
even for distant languages. Human analysis on Google Translate and ChatGPT
suggests that ChatGPT with GPT-3.5 tends to generate more hallucinations and
mistranslation errors, while ChatGPT with GPT-4 makes the fewest errors. In other
words, ChatGPT has already become a good translator. Please refer to our GitHub
project for more details:
https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator
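The pivot prompting strategy described in the abstract can be illustrated as prompt construction plus reply parsing. The following is a minimal Python sketch under stated assumptions, not the report's implementation: `chat` is a hypothetical stand-in for any function that sends one prompt to a chat model and returns its text reply (no client library is assumed), the prompt wording and the `PIVOT:`/`TARGET:` output format are illustrative inventions rather than the report's exact prompt, and English is assumed as the default pivot language.

```python
from typing import Callable

# Hypothetical: any function that sends one prompt to a chat model and
# returns the model's text reply. No specific client library is assumed.
ChatFn = Callable[[str], str]


def build_pivot_prompt(text: str, src: str, tgt: str, pivot: str = "English") -> str:
    """Build one prompt asking for the pivot translation first, then the target.

    The wording and the 'PIVOT:'/'TARGET:' output format are illustrative
    assumptions, not the exact prompt used in the report.
    """
    return (
        f"Translate the following {src} sentence into {pivot} first, "
        f"and then translate your {pivot} version into {tgt}.\n"
        f"Answer in exactly two lines, formatted as\n"
        f"PIVOT: <{pivot} translation>\n"
        f"TARGET: <{tgt} translation>\n\n"
        f"{src} sentence: {text}"
    )


def parse_target(reply: str) -> str:
    """Return the text after 'TARGET:', or the whole reply if the tag is missing."""
    for line in reply.splitlines():
        if line.strip().startswith("TARGET:"):
            return line.split("TARGET:", 1)[1].strip()
    return reply.strip()


def pivot_translate(chat: ChatFn, text: str, src: str, tgt: str, pivot: str = "English") -> str:
    """Pivot prompting: route src -> pivot -> tgt within a single request."""
    reply = chat(build_pivot_prompt(text, src, tgt, pivot))
    return parse_target(reply)


if __name__ == "__main__":
    # Hypothetical stand-in for a real model so the sketch runs offline.
    def fake_chat(prompt: str) -> str:
        return "PIVOT: Good morning.\nTARGET: Guten Morgen."

    print(pivot_translate(fake_chat, "早上好。", "Chinese", "German"))  # prints "Guten Morgen."
```

Keeping both steps inside one request follows the abstract's description of asking ChatGPT to produce the pivot translation before the target one. An alternative design is two chained requests (source to pivot, then pivot to target), which avoids parsing a tagged reply at the cost of an extra model call.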
Related papers
- Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability [15.274404016420737]
We study ChatGPT's (both GPT-3.5 and GPT-4) ability to identify language names and language codes.
When compared with smaller fine-tuned language identification (LID) tools, we find that ChatGPT lags behind.
We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.
(2023-11-16T09:12:20Z)
- Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
(2023-10-20T00:37:28Z)
- GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP [21.6253870440136]
This study conducts a large-scale automated and human evaluation of ChatGPT, encompassing 44 distinct language understanding and generation tasks.
Our findings indicate that, despite its remarkable performance in English, ChatGPT is consistently surpassed by smaller models that have undergone finetuning on Arabic.
(2023-05-24T10:12:39Z)
- Phoenix: Democratizing ChatGPT across Languages [68.75163236421352]
We release a large language model "Phoenix", achieving competitive performance among open-source English and Chinese models.
We believe this work will help make ChatGPT more accessible, especially in countries where people cannot use ChatGPT due to restrictions from OpenAI or local governments.
(2023-04-20T16:50:04Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results show that ChatGPT performs worse than previous models across different NLP tasks and languages.
(2023-04-12T05:08:52Z)
- How to Design Translation Prompts for ChatGPT: An Empirical Study [18.678893287863033]
ChatGPT has demonstrated surprising abilities in natural language understanding and natural language generation.
We apply several translation prompts to a wide range of translation tasks.
Our work provides empirical evidence that ChatGPT still has great potential in translations.
(2023-04-05T01:17:59Z)
- Towards Making the Most of ChatGPT for Machine Translation [75.576405098545]
ChatGPT shows remarkable capabilities for machine translation (MT).
Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages.
(2023-03-24T03:35:21Z)
- ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark [11.36853733574956]
ChatGPT is a cutting-edge artificial intelligence language model developed by OpenAI.
We compare it with a commercial grammatical error correction (GEC) product (e.g., Grammarly) and state-of-the-art models (e.g., GECToR).
We find that ChatGPT does not perform as well as these baselines in terms of automatic evaluation metrics.
(2023-03-15T00:35:50Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
(2023-02-19T12:29:33Z)