Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4
mirroring math anxiety in high-school students
- URL: http://arxiv.org/abs/2305.18320v1
- Date: Mon, 22 May 2023 15:06:51 GMT
- Title: Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4
mirroring math anxiety in high-school students
- Authors: Katherine Abramski, Salvatore Citraro, Luigi Lombardi, Giulio
Rossetti, and Massimo Stella
- Abstract summary: We investigate perceptions of math and STEM fields provided by cutting-edge language models, namely GPT-3, ChatGPT, and GPT-4.
Our findings indicate that LLMs have an overall negative perception of math and STEM fields, with math being perceived most negatively.
We observe that newer versions (i.e. GPT-4) produce richer, more complex perceptions as well as less negative perceptions compared to older versions and N=159 high-school students.
- Score: 0.3131740922192114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are becoming increasingly integrated into our lives.
Hence, it is important to understand the biases present in their outputs in
order to avoid perpetuating harmful stereotypes, which originate in our own
flawed ways of thinking. This challenge requires developing new benchmarks and
methods for quantifying affective and semantic bias, keeping in mind that LLMs
act as psycho-social mirrors that reflect the views and tendencies that are
prevalent in society. One such tendency that has harmful negative effects is
the global phenomenon of anxiety toward math and STEM subjects. Here, we
investigate perceptions of math and STEM fields provided by cutting-edge
language models, namely GPT-3, ChatGPT, and GPT-4, by applying an approach
from network science and cognitive psychology. Specifically, we use behavioral
forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM
disciplines in relation to other concepts. We use data obtained by probing the
three LLMs in a language generation task that has previously been applied to
humans. Our findings indicate that LLMs have an overall negative perception of
math and STEM fields, with math being perceived most negatively. We observe
significant differences across the three LLMs. We observe that newer versions
(i.e. GPT-4) produce richer, more complex perceptions as well as less negative
perceptions compared to older versions and N=159 high-school students. These
findings suggest that advances in the architecture of LLMs may lead to
increasingly less biased models that could even perhaps someday aid in reducing
harmful stereotypes in society rather than perpetuating them.
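The BFMN approach described above links each cue word to the free associations a respondent produces and attaches a valence rating to every word, so that a concept's framing can be read off from the valence of its network neighborhood. The following is a minimal sketch of that idea using hypothetical association data and valence ratings (the real study probed GPT-3, ChatGPT, GPT-4, and N=159 students with a shared language generation task; the words and scores below are illustrative only):

```python
# Minimal sketch of a behavioral forma mentis network (BFMN).
# All data here is hypothetical; the real study elicited associations
# and valence judgments from LLMs and high-school students.

from collections import defaultdict

# Hypothetical free-association responses (cue -> associated words).
associations = {
    "math": ["anxiety", "numbers", "logic", "boring"],
    "science": ["discovery", "experiment", "difficult"],
    "art": ["creativity", "beauty"],
}

# Hypothetical valence ratings: +1 positive, 0 neutral, -1 negative.
valence = {
    "math": -1, "anxiety": -1, "numbers": 0, "logic": 1, "boring": -1,
    "science": 1, "discovery": 1, "experiment": 0, "difficult": -1,
    "art": 1, "creativity": 1, "beauty": 1,
}

# Build an undirected network: an edge links each cue to its responses.
network = defaultdict(set)
for cue, responses in associations.items():
    for word in responses:
        network[cue].add(word)
        network[word].add(cue)

def neighborhood_valence(concept):
    """Mean valence of a concept's neighbors: how the network frames it."""
    neighbors = network[concept]
    return sum(valence[w] for w in neighbors) / len(neighbors)

print(f"math: {neighborhood_valence('math'):+.2f}")  # negative framing
print(f"art:  {neighborhood_valence('art'):+.2f}")   # positive framing
```

Comparing such neighborhood valences across models and against the student sample is how the study quantifies whether a model frames math more or less negatively than humans do.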
Related papers
- Math anxiety and associative knowledge structure are entwined in psychology students but not in Large Language Models like GPT-3.5 and GPT-4o [10.71149623650681]
This study employs a framework based on behavioural forma mentis networks to explore individual and group differences in the perception and association of concepts related to math and anxiety.
Experiments 1, 2, and 3 employ individual-level network features to predict psychometric scores for math anxiety.
Experiment 4 focuses on group-level perceptions extracted from human students, GPT-3.5 and GPT-4o's networks.
arXiv Detail & Related papers (2025-11-03T13:25:11Z)
- On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
We show that there is a significant gap between the modeling of languages and thoughts.
We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
arXiv Detail & Related papers (2025-05-19T09:31:52Z)
- Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs [62.875934732547435]
Current multimodal large language models (MLLMs) often underperform on mathematical problem-solving tasks that require fine-grained visual understanding.
In this paper, we evaluate the visual grounding capabilities of state-of-the-art MLLMs and reveal a significant negative correlation between visual grounding accuracy and problem-solving performance.
We propose a novel approach, SVE-Math, featuring a geometric-grounded vision encoder and a feature router that dynamically adjusts the contribution of hierarchical visual feature maps.
arXiv Detail & Related papers (2025-01-11T04:08:44Z)
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology [13.964263002704582]
We show that, even with the use of Chains of Thought prompts, mainstream LLMs have a high error rate when solving modified CRT problems.
Specifically, the average accuracy rate dropped by up to 50% compared to the original questions.
This finding challenges the belief that LLMs have genuine mathematical reasoning abilities comparable to humans.
arXiv Detail & Related papers (2024-10-19T05:01:56Z)
- A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition [0.6138671548064355]
Large Language Models (LLMs) are known for their remarkable ability to generate 'knowledge'.
However, there is a large gap between LLMs' and humans' capabilities for understanding abstract concepts and reasoning.
We discuss these issues in a larger philosophical context of human knowledge acquisition and the Turing test.
arXiv Detail & Related papers (2024-08-13T03:25:49Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One key and frequently observed piece of evidence is that when math questions are slightly changed, LLMs can behave incorrectly.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models [11.218531873222398]
Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in the training data.
We propose a theoretically grounded framework called StereoMap to gain insights into their perceptions of how demographic groups have been viewed by society.
arXiv Detail & Related papers (2023-10-20T17:22:30Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models -- and Disappeared in GPT-4 [0.0]
We show that large language models (LLMs) exhibit behavior that resembles human-like intuition.
We also probe how robust this inclination toward intuitive-like decision-making is.
arXiv Detail & Related papers (2023-06-13T08:43:13Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Thinking Fast and Slow in Large Language Models [0.08057006406834465]
Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life.
In this study, we show that LLMs like GPT-3 exhibit behavior that resembles human-like intuition - and the cognitive errors that come with it.
arXiv Detail & Related papers (2022-12-10T05:07:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.