Measuring Bullshit in the Language Games played by ChatGPT
- URL: http://arxiv.org/abs/2411.15129v1
- Date: Fri, 22 Nov 2024 18:55:21 GMT
- Title: Measuring Bullshit in the Language Games played by ChatGPT
- Authors: Alessandro Trevisan, Harry Giddens, Sarah Dillon, Alan F. Blackwell
- Abstract summary: Generative large language models (LLMs) create text without direct correspondence to truth value.
LLMs resemble the uses of language described in Frankfurt's popular monograph On Bullshit.
We show that a statistical model of the language of bullshit can reliably relate the Frankfurtian artificial bullshit of ChatGPT to the political and workplace functions of bullshit.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative large language models (LLMs), which create text without direct correspondence to truth value, are widely understood to resemble the uses of language described in Frankfurt's popular monograph On Bullshit. In this paper, we offer a rigorous investigation of this topic, identifying how the phenomenon has arisen and how it might be analysed, and we elaborate on this argument to propose that LLM-based chatbots play the 'language game of bullshit'. We use statistical text analysis to investigate the features of this Wittgensteinian language game, based on a dataset constructed to contrast the language of 1,000 scientific publications with typical pseudo-scientific text generated by ChatGPT. We then explore whether the same language features can be detected in two well-known contexts of social dysfunction: George Orwell's critique of politics and language, and David Graeber's characterisation of bullshit jobs. Using simple hypothesis-testing methods, we demonstrate that a statistical model of the language of bullshit can reliably relate the Frankfurtian artificial bullshit of ChatGPT to the political and workplace functions of bullshit as observed in natural human language.
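The "simple hypothesis-testing methods" in the abstract suggest a two-sample comparison of language features between the scientific and ChatGPT-generated corpora. A minimal sketch of that setup follows, assuming a Mann-Whitney U test over an illustrative vague-word feature; the authors' actual feature set and statistical model are not specified here.

```python
# Minimal sketch of the paper's hypothesis-testing setup (assumed details):
# compare a lexical feature between genuine scientific text and
# ChatGPT-generated pseudo-scientific text. The marker list is illustrative,
# not the authors' actual model of the language of bullshit.
import re
from scipy.stats import mannwhitneyu

# Hypothetical vague-word markers; the paper's real features are not given here.
VAGUE_MARKERS = {"various", "numerous", "significant", "crucial", "comprehensive"}

def marker_rate(text: str) -> float:
    """Fraction of tokens that are vague, bullshit-like markers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in VAGUE_MARKERS for t in tokens) / len(tokens) if tokens else 0.0

def test_corpora(scientific: list[str], generated: list[str]) -> None:
    sci = [marker_rate(t) for t in scientific]
    gen = [marker_rate(t) for t in generated]
    # One-sided test: is the marker rate higher in the generated text?
    stat, p = mannwhitneyu(gen, sci, alternative="greater")
    print(f"U = {stat:.1f}, p = {p:.4g}")

# Toy documents standing in for the 1,000-paper corpus and its generated contrast set:
test_corpora(
    scientific=["We measured the reaction rate at 300 K.",
                "The sample was annealed for two hours.",
                "Error bars denote one standard deviation."],
    generated=["Numerous crucial factors play a significant role in various domains.",
               "This comprehensive approach addresses numerous significant challenges.",
               "Various crucial insights emerge across numerous contexts."],
)
```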
Related papers
- ChatGPT for President! Presupposed content in politicians versus GPT-generated texts [0.0]
This study examines ChatGPT-4's capability to replicate linguistic strategies used in political discourse.
Using a corpus-based pragmatic analysis, this study assesses how well ChatGPT can mimic these persuasive strategies.
arXiv Detail & Related papers (2025-03-03T07:48:04Z)
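A corpus-based pragmatic analysis of presupposed content, as in the entry above, can start from counts of classic presupposition triggers. The sketch below assumes an illustrative textbook trigger list, not the study's actual annotation scheme.

```python
# Sketch of a corpus-based count of presupposition triggers, in the spirit of
# the study above. The trigger inventory is a textbook illustration, not the
# authors' actual scheme.
import re
from collections import Counter

# Classic triggers (iteratives, change-of-state verbs, factives, definite
# descriptions); illustrative subset only.
TRIGGERS = ["again", "still", "stop", "stopped", "continue", "regret",
            "realize", "know that", "the fact that"]

def count_triggers(text: str) -> Counter:
    low = text.lower()
    return Counter({t: len(re.findall(r"\b" + re.escape(t) + r"\b", low))
                    for t in TRIGGERS})

speech = "We will make this country great again. The fact that they failed is known."
print(count_triggers(speech).most_common(3))
```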
- OffensiveLang: A Community Based Implicit Offensive Language Dataset [5.813922783967869]
Hate speech and offensive language exist in both explicit and implicit forms.
OffensiveLang is a community-based implicit offensive language dataset.
We present a prompt-based approach that effectively generates implicit offensive language.
arXiv Detail & Related papers (2024-03-04T20:34:58Z)
- Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model [23.60677380868016]
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
Here, we conduct the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages.
We find that ChatGPT massively underperforms purpose-built systems, particularly in English.
arXiv Detail & Related papers (2023-10-23T17:21:03Z)
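A wug-style morphology probe like the one in the entry above can be scripted as prompt-and-check pairs over nonce words. In this sketch, `ask_model` is a hypothetical placeholder for a real ChatGPT API call, and the stimuli are illustrative, not the paper's.

```python
# Sketch of a wug-style morphology probe, as in the study above. `ask_model`
# is a hypothetical stand-in for an actual chat-completion API call; the nonce
# words and gold forms are illustrative, not the paper's stimuli.
NONCE_PAST_TENSE = [
    ("wug", "wugged"),       # regular past tense expected
    ("blick", "blicked"),
    ("spling", "splinged"),  # regular inflection expected even for irregular-looking stems
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError("Plug in a real chat-completion call here.")

def wug_accuracy(pairs: list[tuple[str, str]]) -> float:
    correct = 0
    for stem, gold in pairs:
        prompt = f"Yesterday, they decided to {stem}. So yesterday they ___. One word:"
        answer = ask_model(prompt).strip().lower().rstrip(".")
        correct += answer == gold
    return correct / len(pairs)
```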
- What has ChatGPT read? The origins of archaeological citations used by a generative artificial intelligence application [0.0]
This paper tested what archaeological literature appears to have been included in ChatGPT's training phase.
While ChatGPT offered seemingly pertinent references, a large percentage proved to be fictitious.
All of the references provided by ChatGPT that were found to be genuine had also been cited on Wikipedia pages.
arXiv Detail & Related papers (2023-08-07T05:06:35Z)
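Verifying whether model-supplied references are genuine, as the entry above does for archaeological citations, can be approximated by querying the public CrossRef API. The endpoint below is real; the fuzzy-matching threshold is a naive assumption, not the paper's procedure.

```python
# Sketch of checking whether a ChatGPT-supplied reference is genuine by
# querying the public CrossRef API. The similarity threshold is a naive,
# illustrative heuristic, not the paper's verification method.
from difflib import SequenceMatcher
import requests

def looks_genuine(citation: str, threshold: float = 0.6) -> bool:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return False
    found_title = " ".join(items[0].get("title", [""]))
    score = SequenceMatcher(None, citation.lower(), found_title.lower()).ratio()
    return score >= threshold

# Hypothetical citation string for illustration:
print(looks_genuine("Smith 2019 Ceramic typologies of the Upper Nile"))
```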
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the observed text.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
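The decomposition idea in the entry above, producing propositions inferentially related to an observed text, reduces to a prompt plus light parsing. In the sketch below, the prompt wording and the `complete` placeholder are assumptions, not the paper's setup.

```python
# Sketch of the decomposition idea above: ask an LLM for propositions that a
# text implies. The prompt wording and the `complete` placeholder are
# assumptions; the paper's actual prompts and model are not reproduced here.
PROMPT = """List, one per line, short propositions that a reader would
reasonably infer from the following text, including implicit content:

Text: {text}
Propositions:"""

def complete(prompt: str) -> str:
    raise NotImplementedError("Plug in a real LLM completion call here.")

def decompose(text: str) -> list[str]:
    raw = complete(PROMPT.format(text=text))
    # Strip list bullets and blank lines from the model's output.
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]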
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) are among the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification [0.0]
ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end.
We compare ChatGPT with a multilingual XLM-RoBERTa language model that was fine-tuned on datasets manually annotated with genres.
Results show that ChatGPT outperforms the fine-tuned model when applied to a dataset that neither model had seen before.
arXiv Detail & Related papers (2023-03-07T14:59:33Z)
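Genre classification with a fine-tuned XLM-RoBERTa, the baseline in the entry above, follows the standard Hugging Face sequence-classification pattern. In this sketch the checkpoint path and genre labels are hypothetical placeholders, not the study's actual model.

```python
# Minimal sketch of genre classification with a fine-tuned XLM-RoBERTa, as in
# the comparison above. The checkpoint path and label names are hypothetical;
# substitute a model actually fine-tuned on genre-annotated data.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

CHECKPOINT = "path/to/xlm-roberta-finetuned-genres"  # hypothetical checkpoint
GENRES = ["news", "legal", "promotional", "instructional"]  # illustrative labels

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)

def predict_genre(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return GENRES[int(logits.argmax(dim=-1))]

print(predict_genre("Mix the flour and water, then knead for ten minutes."))
```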
- Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning: inducing natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
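A DEER-style rule-fact pair, as in the entry above, can be represented as a set of facts plus a gold rule, with induction posed as a prompt. The field names and prompt wording below are assumptions about the setup, not the dataset's actual schema.

```python
# Sketch of representing DEER-style rule-fact pairs and building an induction
# prompt from the facts. Field names and wording are assumptions, not the
# dataset's actual schema.
from dataclasses import dataclass

@dataclass
class RuleFactPair:
    facts: list[str]  # natural-language observations
    rule: str         # gold natural-language rule to be induced

example = RuleFactPair(
    facts=["Robins can fly.", "Sparrows can fly.", "Eagles can fly."],
    rule="If an animal is a bird, then it can usually fly.",
)

def induction_prompt(pair: RuleFactPair) -> str:
    listed = "\n".join(f"- {f}" for f in pair.facts)
    return f"Facts:\n{listed}\nState one general rule that explains these facts:"

print(induction_prompt(example))
```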
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the offensiveness of outputs from popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- The Rediscovery Hypothesis: Language Models Need to Meet Linguistics [8.293055016429863]
We study whether linguistic knowledge is a necessary condition for good performance of modern language models.
We show that language models that are significantly compressed but perform well on their pretraining objectives retain good scores when probed for linguistic structures.
This result supports the rediscovery hypothesis and leads to the second contribution of our paper: an information-theoretic framework that relates the language modeling objective to linguistic information.
arXiv Detail & Related papers (2021-03-02T15:57:39Z)
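Probing for linguistic structure, as in the entry above, is commonly done with a linear classifier over frozen model representations. The sketch below assumes that setup; `get_embeddings` is a hypothetical stand-in for any (possibly compressed) language model encoder.

```python
# Minimal sketch of a linguistic probe in the spirit of the entry above: a
# linear classifier reads part-of-speech tags off frozen model embeddings.
# `get_embeddings` is a hypothetical stand-in for a (compressed) LM encoder.
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_embeddings(tokens: list[str]) -> np.ndarray:
    raise NotImplementedError("Return one frozen LM vector per token here.")

def probe_accuracy(train_toks, train_tags, test_toks, test_tags) -> float:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(get_embeddings(train_toks), train_tags)
    # High accuracy from a simple linear probe suggests the linguistic
    # information survives in the representations (the rediscovery claim).
    return clf.score(get_embeddings(test_toks), test_tags)
```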