Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning
Ability and Human-like Biases
- URL: http://arxiv.org/abs/2306.12567v1
- Date: Wed, 21 Jun 2023 21:04:11 GMT
- Title: Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning
Ability and Human-like Biases
- Authors: Risako Ando, Takanobu Morishita, Hirohiko Abe, Koji Mineshima,
Mitsuhiro Okada
- Abstract summary: This paper investigates whether current large language models exhibit biases in logical reasoning similar to those observed in humans.
We focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction.
We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects.
- Score: 8.583432139919616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates whether current large language models exhibit biases
in logical reasoning, similar to humans. Specifically, we focus on syllogistic
reasoning, a well-studied form of inference in the cognitive science of human
deduction. To facilitate our analysis, we introduce a dataset called NeuBAROCO,
originally designed for psychological experiments that assess human logical
abilities in syllogistic reasoning. The dataset consists of syllogistic
inferences in both English and Japanese. We examine three types of biases
observed in human syllogistic reasoning: belief biases, conversion errors, and
atmosphere effects. Our findings demonstrate that current large language models
struggle more with problems involving these three types of biases.
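The three biases can be made concrete with a small brute-force validity checker. This is an illustrative sketch, not part of the paper's method; the function names and the small-universe bound are our own assumptions. A syllogism is valid exactly when its conclusion holds in every model of its premises, and for classical syllogisms a universe of just a few elements is enough to expose any counterexample:

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of a small universe, as frozensets."""
    return [frozenset(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]

# Aristotelian quantifiers as set relations.
ALL  = lambda a, b: a <= b          # All A are B
SOME = lambda a, b: bool(a & b)     # Some A are B

def valid(premises, conclusion, n=3):
    """A syllogism is valid iff no assignment of sets to S (subject),
    M (middle), and P (predicate) satisfies the premises while
    falsifying the conclusion."""
    for S, M, P in product(subsets(range(n)), repeat=3):
        if all(p(S, M, P) for p in premises) and not conclusion(S, M, P):
            return False
    return True

# Barbara (valid): All M are P; All S are M; therefore All S are P.
print(valid([lambda S, M, P: ALL(M, P), lambda S, M, P: ALL(S, M)],
            lambda S, M, P: ALL(S, P)))            # True

# Undistributed middle (invalid): All P are M; All S are M; /∴ All S are P.
print(valid([lambda S, M, P: ALL(P, M), lambda S, M, P: ALL(S, M)],
            lambda S, M, P: ALL(S, P)))            # False
```

The second call shows the form that a conversion error, reading "All P are M" as if it implied "All M are P", would wrongly license as Barbara; belief bias and the atmosphere effect likewise favor invalid conclusions that are believable or that match the mood of the premises.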
Related papers
- Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities.
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset [5.695579108997392]
This paper explores the question of how accurately current large language models can perform logical reasoning in natural language.
We present a syllogism dataset called NeuBAROCO, which consists of syllogistic reasoning problems in English and Japanese.
Our experiments with leading large language models indicate that these models exhibit reasoning biases similar to humans, along with other error tendencies.
arXiv Detail & Related papers (2024-08-08T12:10:50Z)
- Cognitive bias in large language models: Cautious optimism meets anti-Panglossian meliorism [0.0]
Traditional discussions of bias in large language models focus on a conception of bias closely tied to unfairness.
Recent work raises the novel possibility of assessing the outputs of large language models for a range of cognitive biases.
I draw out philosophical implications of this discussion for the rationality of human cognitive biases as well as the role of unrepresentative data in driving model biases.
arXiv Detail & Related papers (2023-11-18T01:58:23Z)
- UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability of language models to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
- A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models [39.77445889769015]
We show that, within the PaLM2 family of language models, larger models are more logical than smaller ones.
Even the largest models make systematic errors, some of which mirror human reasoning biases.
Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.
arXiv Detail & Related papers (2023-11-01T11:13:06Z)
- Using Artificial Populations to Study Psychological Phenomena in Neural Models [0.0]
Investigation of cognitive behavior in language models must be conducted in an appropriate population for the results to be meaningful.
We leverage work in uncertainty estimation in a novel approach to efficiently construct experimental populations.
We provide theoretical grounding in the uncertainty estimation literature and motivation from current cognitive work regarding language models.
arXiv Detail & Related papers (2023-08-15T20:47:51Z)
- Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first and comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
- Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [2.6304695993930594]
We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which they occur, and various ways in which these biases could be quantified and mitigated.
Considering wide applicability of textual affective computing based downstream tasks in real-world systems such as business, healthcare, education, etc., we give a special emphasis on investigating bias in the context of affect (emotion) i.e., Affective Bias.
We present a summary of various bias evaluation corpora that help to aid future research and discuss challenges in the research on bias in pre-trained language models.
arXiv Detail & Related papers (2022-04-21T18:51:19Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.