Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople
- URL: http://arxiv.org/abs/2406.11116v1
- Date: Mon, 17 Jun 2024 00:23:16 GMT
- Title: Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople
- Authors: Zhuang Qiu, Xufeng Duan, Zhenguang G. Cai
- Abstract summary: This study builds upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena.
Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions.
Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schütze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
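To make the task formats concrete, here is a minimal sketch of an Experiment-2-style elicitation: asking a chat model for a 7-point grammaticality rating and checking agreement with a dichotomous linguist judgment. The prompt wording, model name, example items, and the midpoint thresholding rule below are illustrative assumptions, not the paper's materials.

```python
# Sketch of a 7-point grammaticality rating task via the OpenAI Python SDK.
# Prompt, model choice, items, and the convergence rule are all assumptions.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rate_sentence(sentence: str, model: str = "gpt-3.5-turbo") -> int:
    """Ask the model for a 1-7 grammaticality rating and parse out the digit."""
    prompt = (
        "On a scale from 1 (completely unacceptable) to 7 (completely "
        f"acceptable), how acceptable is this sentence?\n\n{sentence}\n\n"
        "Answer with a single number."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    match = re.search(r"[1-7]", reply.choices[0].message.content)
    return int(match.group()) if match else 4  # fall back to the scale midpoint

# Hypothetical items paired with a linguist's dichotomous judgment.
items = [
    ("The key to the cabinets is on the table.", "grammatical"),
    ("The key to the cabinets are rusty from the rain.", "ungrammatical"),
]

# One simple convergence rule: ratings above the midpoint count as grammatical.
hits = sum((rate_sentence(s) > 4) == (lab == "grammatical") for s, lab in items)
print(f"convergence: {hits}/{len(items)}")
```

Experiment 3's forced-choice format would instead present both sentences in one prompt and parse which of the two the model names as more grammatical.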
Related papers
- Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans [1.8434042562191815]
This work investigates the role of model scaling, asking whether differences between humans and models shrink as model size grows.
We test three Large Language Models (LLMs) on a grammaticality judgment task featuring anaphora, center embedding, comparatives, and negative polarity.
We find that humans are overall less accurate than ChatGPT-4 (76% vs. 80% accuracy, respectively), but that this is due to ChatGPT-4 outperforming humans only in one task condition, namely on grammatical sentences.
arXiv Detail & Related papers (2024-04-23T10:09:46Z)
- Towards a Psychology of Machines: Large Language Models Predict Human Memory [0.0]
Large language models (LLMs) are excelling across various tasks despite not being based on human cognition.
This study examines ChatGPT's ability to predict human performance in a language-based memory task.
arXiv Detail & Related papers (2024-03-08T08:41:14Z)
- A Linguistic Comparison between Human and ChatGPT-Generated Conversations [9.022590646680095]
The research employs Linguistic Inquiry and Word Count analysis, comparing ChatGPT-generated conversations with human conversations.
Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone.
arXiv Detail & Related papers (2024-01-29T21:43:27Z)
- Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model [23.60677380868016]
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
Here, we conduct the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages.
We find that ChatGPT massively underperforms purpose-built systems, particularly in English.
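As a concrete illustration of a wug-style morphological probe (the nonce items and expected inflections here are invented for illustration, not the paper's stimuli), one could score nonce-word pluralization like this, with `ask_model` standing in for whichever LLM call is under test:

```python
# Wug-test-style sketch: prompt a model to inflect nonce words and score its
# replies against the expected regular plurals. All items are hypothetical.
NONCE_PLURALS = {"wug": "wugs", "blick": "blicks", "tass": "tasses"}

def make_prompt(singular: str) -> str:
    return (
        f"This is a {singular}. Now there are two of them. "
        "There are two ___. Fill in the blank with one word."
    )

def score(ask_model) -> float:
    """ask_model: a callable mapping a prompt string to the model's reply."""
    correct = sum(
        expected in ask_model(make_prompt(word)).lower()
        for word, expected in NONCE_PLURALS.items()
    )
    return correct / len(NONCE_PLURALS)
```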
arXiv Detail & Related papers (2023-10-23T17:21:03Z)
- Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
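One way a primacy effect like this can be probed (a hedged sketch, not necessarily the paper's protocol) is to rotate the order of candidate labels and count how often the model's answer is whichever label happened to come first:

```python
# Probe for positional bias: ask the same question under every rotation of the
# label list. A position-insensitive model should give the same answer each
# time, so the first-position rate should average about 1/n, not near 1.
def first_position_rate(ask_model, question: str, labels: list[str]) -> float:
    """ask_model: a callable mapping a prompt string to the model's reply."""
    picked_first = 0
    n = len(labels)
    for shift in range(n):
        ordering = labels[shift:] + labels[:shift]  # rotate the label list
        prompt = f"{question}\nOptions: " + ", ".join(ordering)
        if ordering[0].lower() in ask_model(prompt).lower():
            picked_first += 1
    return picked_first / n
```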
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
- Emergence of a phonological bias in ChatGPT [0.0]
I demonstrate that ChatGPT displays phonological biases that are a hallmark of human language processing.
ChatGPT tends to rely on consonants over vowels when identifying words.
This is observed across languages that differ in their relative distribution of consonants and vowels such as English and Spanish.
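A simple way to operationalize this consonant bias (an illustrative sketch, not the paper's method) is to reduce words to consonant or vowel skeletons and test which cue better supports word identification:

```python
# Build consonant-only and vowel-only skeletons of a word; a consonant bias
# predicts the first cue is the more useful one for recovering the word.
VOWELS = set("aeiou")

def consonant_skeleton(word: str) -> str:
    return "".join(ch if ch not in VOWELS else "_" for ch in word)

def vowel_skeleton(word: str) -> str:
    return "".join(ch if ch in VOWELS else "_" for ch in word)

print(consonant_skeleton("language"))  # l_ng__g_
print(vowel_skeleton("language"))      # _a__ua_e
```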
arXiv Detail & Related papers (2023-05-25T10:57:43Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability on the popular GLUE benchmark, comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
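For context, a zero-shot evaluation of this kind on one GLUE paraphrase task (MRPC) might look like the sketch below, using the Hugging Face `datasets` library; the prompt template and answer parsing are assumptions, and `ask_model` stands in for the ChatGPT call:

```python
# Zero-shot paraphrase detection on GLUE/MRPC (label 1 = paraphrase, 0 = not).
from datasets import load_dataset

def evaluate_mrpc(ask_model, n: int = 100) -> float:
    data = load_dataset("glue", "mrpc", split="validation").select(range(n))
    correct = 0
    for ex in data:
        prompt = (
            "Do these two sentences mean the same thing? Answer yes or no.\n"
            f"1: {ex['sentence1']}\n2: {ex['sentence2']}"
        )
        pred = 1 if "yes" in ask_model(prompt).lower() else 0
        correct += pred == ex["label"]
    return correct / n
```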
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average across 10 reasoning categories spanning logical, non-textual, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures [83.53361353172261]
We present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance.
We introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra.
We empirically show that language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks.
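As one plausible instance of such a measure (an illustrative choice, not necessarily one of the paper's), the sorted singular-value spectra of two monolingual embedding matrices can be rescaled and compared directly:

```python
# Compare the truncated, rescaled singular-value spectra of two embedding
# spaces; a smaller distance suggests more isomorphic geometry.
import numpy as np

def spectral_distance(X: np.ndarray, Y: np.ndarray, k: int = 100) -> float:
    """X, Y: (n_words, dim) embedding matrices for two languages."""
    sx = np.linalg.svd(X, compute_uv=False)[:k]
    sy = np.linalg.svd(Y, compute_uv=False)[:k]
    sx, sy = sx / sx.sum(), sy / sy.sum()  # scale out vocabulary-size effects
    return float(np.linalg.norm(sx - sy))

rng = np.random.default_rng(0)
d = spectral_distance(rng.normal(size=(500, 64)), rng.normal(size=(500, 64)))
print(f"distance between two random spaces: {d:.4f}")
```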
arXiv Detail & Related papers (2020-01-30T00:09:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.