Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4
mirroring math anxiety in high-school students
- URL: http://arxiv.org/abs/2305.18320v1
- Date: Mon, 22 May 2023 15:06:51 GMT
- Title: Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4
mirroring math anxiety in high-school students
- Authors: Katherine Abramski, Salvatore Citraro, Luigi Lombardi, Giulio
Rossetti, and Massimo Stella
- Abstract summary: We investigate perceptions of math and STEM fields provided by cutting-edge language models, namely GPT-3, ChatGPT, and GPT-4.
Our findings indicate that LLMs have an overall negative perception of math and STEM fields, with math being perceived most negatively.
We observe that newer versions (i.e. GPT-4) produce richer, more complex perceptions as well as less negative perceptions compared to older versions and N=159 high-school students.
- Score: 0.3131740922192114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are becoming increasingly integrated into our lives.
Hence, it is important to understand the biases present in their outputs in
order to avoid perpetuating harmful stereotypes, which originate in our own
flawed ways of thinking. This challenge requires developing new benchmarks and
methods for quantifying affective and semantic bias, keeping in mind that LLMs
act as psycho-social mirrors that reflect the views and tendencies that are
prevalent in society. One such tendency that has harmful negative effects is
the global phenomenon of anxiety toward math and STEM subjects. Here, we
investigate perceptions of math and STEM fields provided by cutting-edge
language models, namely GPT-3, ChatGPT, and GPT-4, by applying an approach
from network science and cognitive psychology. Specifically, we use behavioral
forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM
disciplines in relation to other concepts. We use data obtained by probing the
three LLMs in a language generation task that has previously been applied to
humans. Our findings indicate that LLMs have an overall negative perception of
math and STEM fields, with math being perceived most negatively. We observe
significant differences across the three LLMs. We observe that newer versions
(i.e. GPT-4) produce richer, more complex perceptions as well as less negative
perceptions compared to older versions and N=159 high-school students. These
findings suggest that advances in the architecture of LLMs may lead to
increasingly less biased models that could even perhaps someday aid in reducing
harmful stereotypes in society rather than perpetuating them.
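The BFMN approach described above links each cue word to the free associations a respondent produces and attaches a valence rating to every word, so that a concept's framing can be read off from the valence of its network neighborhood. The following is a minimal sketch of that idea using hypothetical association data and valence ratings (the real study probed GPT-3, ChatGPT, GPT-4, and N=159 students with a shared language generation task; the words and scores below are illustrative only):

```python
# Minimal sketch of a behavioral forma mentis network (BFMN).
# All data here is hypothetical; the real study elicited associations
# and valence judgments from LLMs and high-school students.

from collections import defaultdict

# Hypothetical free-association responses (cue -> associated words).
associations = {
    "math": ["anxiety", "numbers", "logic", "boring"],
    "science": ["discovery", "experiment", "difficult"],
    "art": ["creativity", "beauty"],
}

# Hypothetical valence ratings: +1 positive, 0 neutral, -1 negative.
valence = {
    "math": -1, "anxiety": -1, "numbers": 0, "logic": 1, "boring": -1,
    "science": 1, "discovery": 1, "experiment": 0, "difficult": -1,
    "art": 1, "creativity": 1, "beauty": 1,
}

# Build an undirected network: an edge links each cue to its responses.
network = defaultdict(set)
for cue, responses in associations.items():
    for word in responses:
        network[cue].add(word)
        network[word].add(cue)

def neighborhood_valence(concept):
    """Mean valence of a concept's neighbors: how the network frames it."""
    neighbors = network[concept]
    return sum(valence[w] for w in neighbors) / len(neighbors)

print(f"math: {neighborhood_valence('math'):+.2f}")  # negative framing
print(f"art:  {neighborhood_valence('art'):+.2f}")   # positive framing
```

Comparing such neighborhood valences across models and against the student sample is how the study quantifies whether a model frames math more or less negatively than humans do.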
Related papers
- Math anxiety and associative knowledge structure are entwined in psychology students but not in Large Language Models like GPT-3.5 and GPT-4o [10.71149623650681]
This study employs a framework based on behavioural forma mentis networks to explore individual and group differences in the perception and association of concepts related to math and anxiety.
Experiments 1, 2, and 3 employ individual-level network features to predict psychometric scores for math anxiety.
Experiment 4 focuses on group-level perceptions extracted from human students, GPT-3.5 and GPT-4o's networks.
arXiv Detail & Related papers (2025-11-03T13:25:11Z)
- On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
We show that there is a significant gap between the modeling of languages and thoughts.
We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
arXiv Detail & Related papers (2025-05-19T09:31:52Z)
- Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs [62.875934732547435]
Current multimodal large language models (MLLMs) often underperform on mathematical problem-solving tasks that require fine-grained visual understanding.
In this paper, we evaluate the visual grounding capabilities of state-of-the-art MLLMs and reveal a significant negative correlation between visual grounding accuracy and problem-solving performance.
We propose a novel approach, SVE-Math, featuring a geometric-grounded vision encoder and a feature router that dynamically adjusts the contribution of hierarchical visual feature maps.
arXiv Detail & Related papers (2025-01-11T04:08:44Z)
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology [13.964263002704582]
We show that, even with the use of Chains of Thought prompts, mainstream LLMs have a high error rate when solving modified CRT problems.
Specifically, the average accuracy rate dropped by up to 50% compared to the original questions.
This finding challenges the belief that LLMs have genuine mathematical reasoning abilities comparable to humans.
arXiv Detail & Related papers (2024-10-19T05:01:56Z)
- A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition [0.6138671548064355]
Large Language Models (LLMs) are known for their remarkable ability to generate 'knowledge'.
However, there is a large gap between LLMs' and humans' capabilities for understanding abstract concepts and reasoning.
We discuss these issues in a larger philosophical context of human knowledge acquisition and the Turing test.
arXiv Detail & Related papers (2024-08-13T03:25:49Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One key and frequently observed piece of evidence is that when math questions are slightly changed, LLMs can behave incorrectly.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models [11.218531873222398]
Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in the training data.
We propose a theoretically grounded framework called StereoMap to gain insights into their perceptions of how demographic groups have been viewed by society.
arXiv Detail & Related papers (2023-10-20T17:22:30Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models -- and Disappeared in GPT-4 [0.0]
We show that large language models (LLMs) exhibit behavior that resembles human-like intuition.
We also probe how robust this inclination toward intuitive-like decision-making is.
arXiv Detail & Related papers (2023-06-13T08:43:13Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Thinking Fast and Slow in Large Language Models [0.08057006406834465]
Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life.
In this study, we show that LLMs like GPT-3 exhibit behavior that resembles human-like intuition - and the cognitive errors that come with it.
arXiv Detail & Related papers (2022-12-10T05:07:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.