Who is More Bayesian: Humans or ChatGPT?

URL: http://arxiv.org/abs/2504.10636v1
Date: Mon, 14 Apr 2025 18:37:54 GMT
Title: Who is More Bayesian: Humans or ChatGPT?
Authors: Tianshi Mu, Pranjal Rawat, John Rust, Chengjun Zhang, Qixuan Zhong
Abstract summary: We reanalyze choices of human subjects gathered from laboratory experiments conducted by El-Gamal and Grether and by Holt and Smith. We confirm that while overall, Bayes Rule represents the single best model for predicting human choices, subjects are heterogeneous. We show that ChatGPT is also subject to biases that result in suboptimal decisions.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We compare the performance of human and artificially intelligent (AI) decision makers in simple binary classification tasks where the optimal decision rule is given by Bayes Rule. We reanalyze choices of human subjects gathered from laboratory experiments conducted by El-Gamal and Grether and by Holt and Smith. We confirm that, while overall Bayes Rule represents the single best model for predicting human choices, subjects are heterogeneous, and a significant share of them make suboptimal choices that reflect judgement biases described by Kahneman and Tversky, including the "representativeness heuristic" (excessive weight on the evidence from the sample relative to the prior) and "conservatism" (excessive weight on the prior relative to the sample). We compare the performance of AI subjects gathered from recent versions of large language models (LLMs), including several versions of ChatGPT. These general-purpose generative AI chatbots are not specifically trained to do well in narrow decision-making tasks; instead, they are trained as "language predictors" using a large corpus of textual data from the web. We show that ChatGPT is also subject to biases that result in suboptimal decisions. However, we document a rapid evolution in the performance of ChatGPT, from sub-human performance for early versions (ChatGPT 3.5) to superhuman and nearly perfect Bayesian classifications in the latest versions (ChatGPT 4o).
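To make the task concrete, here is a minimal sketch of a two-urn binary classification problem of the kind the abstract describes. It computes the exact Bayes Rule posterior and compares it with a weighted log-odds rule in which the prior and the sample evidence get separate weights. The urn compositions (`P_A`, `P_B`), the six-draw sample, and the weights `w_prior` / `w_sample` are illustrative assumptions, not the experimental parameters or the estimation model used in the paper. Setting both weights to 1 reproduces Bayes Rule. Setting `w_sample > 1` mimics the representativeness heuristic, and setting `w_prior > 1` mimics conservatism.

```python
from math import log

# Illustrative (assumed) urn compositions: the probability that a single draw
# is a "success" ball if the sample comes from urn A or from urn B.
P_A = 2 / 3
P_B = 1 / 2


def log_likelihood_ratio(successes: int, draws: int) -> float:
    """log P(sample | A) - log P(sample | B) for draws with replacement.

    The binomial coefficient is the same under both urns, so it cancels.
    """
    failures = draws - successes
    return successes * log(P_A / P_B) + failures * log((1 - P_A) / (1 - P_B))


def posterior_a(prior_a: float, successes: int, draws: int) -> float:
    """Exact Bayes Rule posterior probability that the sample came from urn A."""
    failures = draws - successes
    like_a = P_A**successes * (1 - P_A) ** failures
    like_b = P_B**successes * (1 - P_B) ** failures
    return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)


def choose_urn(prior_a: float, successes: int, draws: int,
               w_prior: float = 1.0, w_sample: float = 1.0) -> str:
    """Pick the urn with the larger weighted log posterior odds.

    w_prior = w_sample = 1 is Bayes Rule; w_sample > 1 overweights the sample
    (representativeness), w_prior > 1 overweights the prior (conservatism).
    A score of exactly 0 is resolved in favor of urn B.
    """
    prior_log_odds = log(prior_a / (1 - prior_a))
    score = w_prior * prior_log_odds + w_sample * log_likelihood_ratio(successes, draws)
    return "A" if score > 0 else "B"


if __name__ == "__main__":
    print(round(posterior_a(1 / 3, 4, 6), 3))      # 0.413
    print(choose_urn(1 / 3, 4, 6))                 # B (Bayesian choice)
    print(choose_urn(1 / 3, 4, 6, w_sample=3.0))   # A (overweights the sample)
```

With a prior of 1/3 on urn A and 4 successes in 6 draws, the posterior for A is about 0.41, so the Bayesian choice is B. Tripling the weight on the sample flips the choice to A, which is the kind of representativeness-driven error the abstract refers to.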

Related papers

Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding
We model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate the best language models across 250K+ annotations from nearly 700 annotators in 100K+ texts spanning social networks, news and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25% of human annotators, performing much better than average balanced accuracy.
Date: 2025-08-19T13:05:48Z
Using ChatGPT to Score Essays and Short-Form Constructed Responses
Investigation focused on various prediction models, including linear regression, random forest, gradient boost, and boost. ChatGPT's performance was evaluated against human raters using quadratic weighted kappa (QWK) metrics. Study concludes that ChatGPT can complement human scoring but requires additional development to be reliable for high-stakes assessments.
Date: 2024-08-18T16:51:28Z
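The study above compares ChatGPT with human raters using quadratic weighted kappa (QWK). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard formula, kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij) with w_ij = (i - j)^2 / (k - 1)^2, where O is the observed confusion matrix of the two raters, E is the matrix expected if the raters scored independently, and k is the number of rating categories. The function name and the integer rating scale are assumptions for illustration. The study's own scoring pipeline is not described in the summary.

```python
def quadratic_weighted_kappa(rater_a: list[int], rater_b: list[int],
                             min_rating: int, max_rating: int) -> float:
    """Quadratic weighted kappa between two raters on an integer scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty lists of equal length")
    if max_rating <= min_rating:
        raise ValueError("the rating scale needs at least two categories")
    if any(not min_rating <= r <= max_rating for r in (*rater_a, *rater_b)):
        raise ValueError("ratings must lie in [min_rating, max_rating]")
    k = max_rating - min_rating + 1
    n = len(rater_a)

    # Observed confusion matrix: counts of (score from A, score from B) pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2      # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n    # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters give one identical score")
    return 1.0 - weighted_observed / weighted_expected
```

For example, two raters who agree on three of four essays and differ by one point on the fourth (`[1, 2, 3, 4]` vs `[1, 2, 3, 3]` on a 1 to 4 scale) get kappa = 0.875. Perfect agreement gives 1, and perfectly reversed ratings on a 1 to 3 scale give -1. The quadratic weights penalize large disagreements much more than near misses, which is why QWK is common for ordinal essay scores.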
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
We propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback. Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference. Finally, we determine which answer better fits human preferences according to the criticism.
Date: 2024-06-17T03:51:46Z
Primacy Effect of ChatGPT
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
Date: 2023-10-20T00:37:28Z
Adding guardrails to advanced chatbots
The launch of ChatGPT in November 2022 has ushered in a new era of AI. There are already concerns that humans may be replaced by chatbots for a variety of jobs. These biases may cause significant harm and/or inequity toward different subpopulations.
Date: 2023-06-13T02:23:04Z
Large Language Models are not Fair Evaluators
We find that the quality ranking of candidate responses can be easily hacked by altering their order of appearance in the context. This manipulation allows us to skew the evaluation result, making one model appear considerably superior to the other. We propose a framework with three simple yet effective strategies to mitigate this issue.
Date: 2023-05-29T07:41:03Z
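The order sensitivity described above can be checked, and a constant position bonus cancelled, by querying the judge in both presentation orders and averaging the scores. This is a minimal sketch of that swap-and-average idea under assumed interfaces: `Judge`, `position_balanced_scores`, and `toy_biased_judge` are hypothetical names, and the toy judge simply adds a fixed bonus to whichever response is shown first. It is not presented as the paper's framework, whose three strategies the summary does not spell out.

```python
from typing import Callable, Tuple

# Hypothetical judge interface: given two responses in presentation order,
# return a score for the first and a score for the second.
Judge = Callable[[str, str], Tuple[float, float]]


def position_balanced_scores(judge: Judge, resp_1: str, resp_2: str) -> Tuple[float, float]:
    """Score both responses in both orders and average, cancelling a fixed position bonus."""
    s1_when_first, s2_when_second = judge(resp_1, resp_2)
    s2_when_first, s1_when_second = judge(resp_2, resp_1)
    return ((s1_when_first + s1_when_second) / 2,
            (s2_when_first + s2_when_second) / 2)


def toy_biased_judge(first: str, second: str) -> Tuple[float, float]:
    """Toy judge (assumption): equal underlying quality, +2 bonus for the first slot."""
    quality = {"response X": 7.0, "response Y": 7.0}
    return quality[first] + 2.0, quality[second]


if __name__ == "__main__":
    print(toy_biased_judge("response X", "response Y"))  # (9.0, 7.0): X looks better
    print(position_balanced_scores(toy_biased_judge, "response X", "response Y"))  # (8.0, 8.0)
```

If the averaged scores differ substantially from the single-order scores, the ranking was driven partly by position rather than by content. Averaging removes only a bias that is roughly additive and the same in both orders; a judge whose bias depends on the specific pair would need more than two queries.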
AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays
ChatGPT and similar generative AI models have attracted hundreds of millions of users. This study compares human-written versus ChatGPT-generated argumentative student essays.
Date: 2023-04-24T12:58:28Z
ChatGPT: Jack of all trades, master of none
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT). We examined ChatGPT's capabilities on 25 diverse analytical NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses.
Date: 2023-02-21T15:20:37Z
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
Bias testing is currently cumbersome because the test sentences are generated from a limited set of manual templates or require expensive crowd-sourcing. We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes. We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
Date: 2023-02-14T22:07:57Z
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community. It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
Date: 2023-02-08T09:44:51Z
ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text
We study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text. We employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text.
Date: 2023-01-30T08:06:08Z

This list is automatically generated from the titles and abstracts of the papers on this site.