ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning
- URL: http://arxiv.org/abs/2304.05613v1
- Date: Wed, 12 Apr 2023 05:08:52 GMT
- Title: ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning
- Authors: Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man,
Franck Dernoncourt, Trung Bui, Thien Huu Nguyen
- Abstract summary: Large language models (LLMs) have emerged as some of the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
- Score: 70.57126720079971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last few years, large language models (LLMs) have emerged
as some of the most important breakthroughs in natural language processing
(NLP), fundamentally transforming research and development in the field.
ChatGPT is one of the most exciting recently developed LLM systems, showcasing
impressive language generation skills and attracting strong public attention.
Beyond its many exciting applications in English, the model can process and
generate text in multiple languages thanks to its multilingual training data.
Given the broad adoption of ChatGPT for English across different problems and
areas, a natural question is whether ChatGPT can also be applied effectively
to other languages, or whether it is necessary to develop more
language-specific technologies. Answering this question requires a thorough
evaluation of ChatGPT over multiple tasks with diverse languages and large
datasets (i.e., beyond reported anecdotes), which is still missing or limited
in current research. Our work aims to fill this gap in the evaluation of
ChatGPT and similar LLMs to provide more comprehensive information for
multilingual NLP applications. While this work is an ongoing effort that will
include additional experiments in the future, our current paper evaluates
ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium,
low, and extremely low resources. We also focus on the zero-shot learning
setting for ChatGPT to improve reproducibility and better simulate the
interactions of general users. Our extensive experimental results demonstrate
that ChatGPT performs worse than previous models across different NLP tasks
and languages, calling for further research to develop better models and a
deeper understanding of multilingual learning.
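To make the zero-shot setting concrete, here is a minimal sketch of how a single evaluation query might look. It assumes the OpenAI Python client; the model name, task, and prompt wording are illustrative placeholders, not the paper's actual prompt templates.

```python
# Minimal sketch of one zero-shot multilingual evaluation query.
# Assumptions: the `openai` package (v1+) is installed and OPENAI_API_KEY
# is set; the model name and prompt wording are illustrative placeholders,
# not the paper's exact setup.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # hypothetical stand-in for "ChatGPT"

def zero_shot_label(text: str, language: str, labels: list[str]) -> str:
    """Classify `text` with no in-context examples (zero-shot)."""
    prompt = (
        f"Classify the sentiment of the following {language} text as one of "
        f"{', '.join(labels)}. Answer with the label only.\n\nText: {text}"
    )
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding aids reproducibility
    )
    return response.choices[0].message.content.strip()

# Example: a Vietnamese review, labeled zero-shot.
print(zero_shot_label("Bộ phim này rất hay!", "Vietnamese",
                      ["positive", "negative", "neutral"]))
```

Fixing the temperature and withholding in-context examples mirror the zero-shot, reproducibility-oriented setting described above.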
Related papers
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
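As a rough illustration of this translation-based construction, the sketch below expands English instruction items into several languages. The `translate` callable, field names, and language codes are hypothetical placeholders rather than details from the paper.

```python
# Toy sketch of building a multilingual instruction dataset by
# machine-translating English items. `translate` is a hypothetical
# stand-in for any MT system; fields and languages are illustrative.
from typing import Callable

def build_multilingual_set(
    english_items: list[dict],
    languages: list[str],
    translate: Callable[[str, str], str],
) -> list[dict]:
    out = []
    for item in english_items:
        for lang in languages:
            out.append({
                "lang": lang,
                "question": translate(item["question"], lang),
                "answer": item["answer"],  # numeric answers stay as-is
            })
    return out

# Example with an identity "translator" standing in for a real MT system.
items = [{"question": "What is 2 + 3?", "answer": "5"}]
print(build_multilingual_set(items, ["de", "sw"], lambda text, lang: text))
```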
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model [23.60677380868016]
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
Here, we conduct the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages.
We find that ChatGPT massively underperforms purpose-built systems, particularly in English.
arXiv Detail & Related papers (2023-10-23T17:21:03Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
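One common way to realize such alignment prompts is soft prompt tuning: a small set of trainable vectors is prepended to the input embeddings while the backbone model stays frozen. The PyTorch sketch below illustrates only this general idea; the module, toy backbone, and dimensions are assumptions, not the paper's actual method or the XSGD training setup.

```python
# General soft-prompt-tuning sketch in PyTorch: only the prompt
# vectors are trainable; the embedding and backbone stay frozen.
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    def __init__(self, backbone: nn.Module, embed: nn.Embedding,
                 prompt_len: int = 8):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        for p in list(backbone.parameters()) + list(embed.parameters()):
            p.requires_grad = False  # freeze everything but the prompt
        # Trainable soft-prompt vectors prepended to every input.
        self.prompt = nn.Parameter(
            torch.randn(prompt_len, embed.embedding_dim) * 0.02)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(input_ids)                         # (B, T, D)
        pre = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.backbone(torch.cat([pre, tok], dim=1))  # (B, P+T, D)

# Toy usage: a tiny frozen Transformer encoder as the "backbone".
embed = nn.Embedding(1000, 64)
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = SoftPromptModel(nn.TransformerEncoder(layer, num_layers=2), embed)
out = model(torch.randint(0, 1000, (2, 10)))  # shape: (2, 18, 64)
```

Training then updates only `model.prompt`, which keeps per-language adaptation cheap.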
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing [4.13365552362244]
ChatGPT has been successfully applied in numerous areas, including chatbots, content generation, language translation, personalized recommendations, and even medical diagnosis and treatment.
Its success in these applications can be attributed to its ability to generate human-like responses, understand natural language, and adapt to different contexts.
This article provides a comprehensive overview of ChatGPT, its applications, advantages, and limitations.
arXiv Detail & Related papers (2023-03-27T21:27:58Z)
- Towards Making the Most of ChatGPT for Machine Translation [75.576405098545]
ChatGPT shows remarkable capabilities for machine translation (MT).
Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages.
arXiv Detail & Related papers (2023-03-24T03:35:21Z)
- ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification [0.0]
ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end.
We compare ChatGPT with a multilingual XLM-RoBERTa language model fine-tuned on datasets manually annotated with genres.
Results show that ChatGPT outperforms the fine-tuned model when applied to a dataset that neither model had seen before.
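For reference, a fine-tuned baseline of this kind can be run with the Hugging Face transformers pipeline, as sketched below. The checkpoint name is a hypothetical placeholder; the study's actual genre schema and fine-tuned weights are not reproduced here.

```python
# Sketch of querying a fine-tuned XLM-RoBERTa genre classifier via the
# Hugging Face `transformers` pipeline. The checkpoint name is a
# hypothetical placeholder, not the model used in the paper.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="my-org/xlm-roberta-genre",  # hypothetical fine-tuned checkpoint
)

print(classifier("Breaking: markets fell sharply this morning."))
# The same text could be sent to ChatGPT with a zero-shot prompt
# (see the earlier sketch) to reproduce the comparison.
```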
arXiv Detail & Related papers (2023-03-07T14:59:33Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT achieves 63.41% average accuracy across 10 different reasoning categories spanning logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)