Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into
the Morphological Capabilities of a Large Language Model
- URL: http://arxiv.org/abs/2310.15113v2
- Date: Thu, 26 Oct 2023 11:45:28 GMT
- Title: Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into
the Morphological Capabilities of a Large Language Model
- Authors: Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai,
Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek
Vijayakumar, Haofei Yu, Hinrich Sch\"utze, Kemal Oflazer, David R. Mortensen
- Abstract summary: Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
Here, we conduct the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages.
We find that ChatGPT massively underperforms purpose-built systems, particularly in English.
- Score: 23.60677380868016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently reached an impressive level of
linguistic capability, prompting comparisons with human language skills.
However, there have been relatively few systematic inquiries into the
linguistic capabilities of the latest generation of LLMs, and those studies
that do exist (i) ignore the remarkable ability of humans to generalize, (ii)
focus only on English, and (iii) investigate syntax or semantics and overlook
other capabilities that lie at the heart of human language, like morphology.
Here, we close these gaps by conducting the first rigorous analysis of the
morphological capabilities of ChatGPT in four typologically varied languages
(specifically, English, German, Tamil, and Turkish). We apply a version of
Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for
the four examined languages. We find that ChatGPT massively underperforms
purpose-built systems, particularly in English. Overall, our results -- through
the lens of morphology -- cast a new light on the linguistic capabilities of
ChatGPT, suggesting that claims of human-like language skills are premature and
misleading.
Related papers
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions, for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z) - A Linguistic Comparison between Human and ChatGPT-Generated Conversations [9.022590646680095]
The research employs Linguistic Inquiry and Word Count analysis, comparing ChatGPT-generated conversations with human conversations.
Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone.
arXiv Detail & Related papers (2024-01-29T21:43:27Z) - Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability [15.274404016420737]
We study ChatGPT's (both GPT-3.5 and GPT-4) ability to identify language names and language codes.
When compared to smaller finetuned LID tools, we find that ChatGPT lags behind.
We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.
arXiv Detail & Related papers (2023-11-16T09:12:20Z) - GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP [21.6253870440136]
This study conducts a large-scale automated and human evaluation of ChatGPT, encompassing 44 distinct language understanding and generation tasks.
Our findings indicate that, despite its remarkable performance in English, ChatGPT is consistently surpassed by smaller models that have undergone finetuning on Arabic.
arXiv Detail & Related papers (2023-05-24T10:12:39Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP)
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.