Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
- URL: http://arxiv.org/abs/2412.11385v1
- Date: Mon, 16 Dec 2024 02:27:59 GMT
- Title: Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
- Authors: Tom S. Juzek, Zina B. Ward
- Abstract summary: It is widely assumed that scientists' use of large language models (LLMs) is responsible for the rapid changes underway in scientific English.
We develop a formal, transferable method to characterize these linguistic changes.
We find 21 focal words whose increased occurrence in scientific abstracts is likely the result of LLM usage.
We assess whether reinforcement learning from human feedback contributes to the overuse of focal words.
- Abstract: Scientific English is currently undergoing rapid change, with words like "delve," "intricate," and "underscore" appearing far more frequently than just a few years ago. It is widely assumed that scientists' use of large language models (LLMs) is responsible for such trends. We develop a formal, transferable method to characterize these linguistic changes. Application of our method yields 21 focal words whose increased occurrence in scientific abstracts is likely the result of LLM usage. We then pose "the puzzle of lexical overrepresentation": WHY are such words overused by LLMs? We fail to find evidence that lexical overrepresentation is caused by model architecture, algorithm choices, or training data. To assess whether reinforcement learning from human feedback (RLHF) contributes to the overuse of focal words, we undertake comparative model testing and conduct an exploratory online study. While the model testing is consistent with RLHF playing a role, our experimental results suggest that participants may be reacting differently to "delve" than to other focal words. With LLMs quickly becoming a driver of global language change, investigating these potential sources of lexical overrepresentation is important. We note that while insights into the workings of LLMs are within reach, a lack of transparency surrounding model development remains an obstacle to such research.
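The abstract does not spell out the detection method here, but the core idea of identifying focal words can be illustrated by comparing word frequencies in abstracts written before and after LLM adoption. The sketch below is a minimal illustration under assumed thresholds (`min_ratio`, `min_freq`) and a toy corpus; it is not the paper's actual procedure.

```python
from collections import Counter
import re

def word_frequencies(abstracts):
    """Relative frequency of each word across a list of abstracts."""
    counts = Counter()
    total = 0
    for text in abstracts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(tokens)
        total += len(tokens)
    return {w: c / total for w, c in counts.items()}

def focal_word_candidates(pre_llm_abstracts, post_llm_abstracts,
                          min_ratio=5.0, min_freq=1e-5):
    """Words whose relative frequency rose sharply after LLM adoption.

    `min_ratio` and `min_freq` are illustrative thresholds, not the
    paper's actual criteria.
    """
    pre = word_frequencies(pre_llm_abstracts)
    post = word_frequencies(post_llm_abstracts)
    candidates = {}
    for word, freq in post.items():
        baseline = pre.get(word, min_freq)  # smooth words unseen pre-LLM
        if freq >= min_freq and freq / baseline >= min_ratio:
            candidates[word] = freq / baseline
    return dict(sorted(candidates.items(), key=lambda kv: -kv[1]))

# Toy usage: "delve" spikes in the post-LLM sample.
pre = ["we study the effect of x on y", "we examine recent trends"]
post = ["we delve into the intricate effect of x",
        "we delve into y to underscore z"]
print(focal_word_candidates(pre, post, min_ratio=2.0, min_freq=0.01))
```

In practice, raw frequency ratios like these would need smoothing, significance testing, and controls for topic drift, which is presumably where the paper's "formal, transferable method" does the real work.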
Related papers
- Why do LLaVA Vision-Language Models Reply to Images in English?
We uncover a surprising multilingual bias occurring in a popular class of multimodal vision-language models (VLMs).
Including an image in the query to a LLaVA-style VLM significantly increases the likelihood of the model returning an English response, regardless of the language of the query.
arXiv Detail & Related papers (2024-07-02T15:01:55Z)
- Delving into ChatGPT usage in academic writing through excess vocabulary
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance.
We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words.
We show that LLMs have had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
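As a rough illustration of the excess-vocabulary idea, one can extrapolate a word's pre-LLM frequency trend forward and measure how far its observed post-LLM frequency overshoots it. The linear two-point extrapolation and the toy numbers below are assumptions for illustration, not the paper's exact counterfactual.

```python
def excess_frequency(yearly_freq, eval_year=2024, baseline_years=(2021, 2022)):
    """Excess of a word's observed frequency over a simple linear
    extrapolation of its pre-LLM trend.

    `yearly_freq` maps year -> relative frequency of the word in that
    year's abstracts. The two-point linear extrapolation is an
    illustrative stand-in for the paper's counterfactual.
    """
    y1, y2 = baseline_years
    slope = yearly_freq[y2] - yearly_freq[y1]
    expected = yearly_freq[y2] + slope * (eval_year - y2)
    observed = yearly_freq[eval_year]
    return observed - max(expected, 0.0)

# Toy usage: a nearly flat pre-2023 trend followed by a post-LLM jump.
delve = {2021: 1.0e-5, 2022: 1.1e-5, 2024: 9.0e-5}
print(f"excess frequency of 'delve': {excess_frequency(delve):.2e}")
```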
arXiv Detail & Related papers (2024-06-11T07:16:34Z)
- Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies
This study explores the sycophantic tendencies of Large Language Models (LLMs).
LLMs tend to provide answers that match what users want to hear, even if they are not entirely correct.
arXiv Detail & Related papers (2024-06-06T08:03:05Z)
- Analyzing the Role of Semantic Representations in the Era of Large Language Models
We investigate the role of semantic representations in the era of large language models (LLMs).
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT.
We find it difficult to predict for which input examples AMR helps or hurts, though errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts in FLUB are mainly tricky, humorous, and misleading texts collected from the real-world internet.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
We evaluate the lexical richness of text generated by conversational Large Language Models (LLMs) and how it depends on model parameters.
The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model.
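One simple way to quantify lexical richness is the type-token ratio (distinct words over total words); the paper may well use more refined measures, so treat this sketch as illustrative only.

```python
import re

def type_token_ratio(text):
    """Distinct words divided by total words: a simple measure of
    lexical richness. TTR is length-sensitive, so compare texts of
    similar size."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Toy usage: repetitive output scores lower than varied output.
print(type_token_ratio("the cat sat on the mat and the cat sat"))    # 0.6
print(type_token_ratio("a quick brown fox jumps over the lazy dog"))  # 1.0
```

Because TTR falls as texts grow longer, comparisons across model settings (for example, different presence penalties or assigned roles) should hold the sample length roughly constant.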
arXiv Detail & Related papers (2024-02-11T13:41:17Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Are Large Language Models Really Robust to Word-Level Perturbations?
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations better reveal a language model's proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
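To make "word-level perturbations" concrete, here is a minimal sketch of a perturbation generator based on adjacent-character swaps, a typo common in everyday typing. The operation and probability are assumptions for illustration, not the paper's actual perturbation set.

```python
import random

def perturb(sentence, p=0.2, seed=0):
    """Randomly swap adjacent characters inside words (a common typo)
    with probability `p` per word; a simple word-level perturbation."""
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        if len(w) > 3 and rng.random() < p:
            i = rng.randrange(len(w) - 1)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap chars i, i+1
        out.append(w)
    return " ".join(out)

# Toy usage: compare a model's answers on the clean and perturbed prompt.
print(perturb("please summarize the following paragraph carefully", p=0.5))
```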
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models
Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text.
In this work, we investigate how well popular LLMs capture the magnitudes of numbers from a behavioral lens.
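A behavioral probe in this spirit might measure comparison accuracy as a function of numeric distance, the classic "distance effect" from human cognition. The `query_model` callable below is a hypothetical stand-in for an actual LLM call.

```python
from collections import defaultdict

def distance_effect(query_model, pairs):
    """Group number-comparison accuracy by |a - b|. A human-like
    distance effect means higher accuracy at larger distances."""
    correct, total = defaultdict(int), defaultdict(int)
    for a, b in pairs:
        dist = abs(a - b)
        answer = query_model(f"Which is larger, {a} or {b}? Reply with just the number.")
        correct[dist] += int(answer.strip() == str(max(a, b)))
        total[dist] += 1
    return {d: correct[d] / total[d] for d in sorted(total)}

# Toy usage with a stub "model" that always answers "9".
print(distance_effect(lambda prompt: "9", [(1, 9), (8, 9), (3, 4)]))
```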
arXiv Detail & Related papers (2023-05-18T07:50:44Z)
- Probing Across Time: What Does RoBERTa Know and When?
We show that linguistic knowledge is acquired fast, stably, and robustly across domains, while factual and commonsense knowledge is acquired more slowly and is more domain-sensitive.
We believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.
arXiv Detail & Related papers (2021-04-16T04:26:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.