Large Language Models are biased to overestimate profoundness
- URL: http://arxiv.org/abs/2310.14422v1
- Date: Sun, 22 Oct 2023 21:33:50 GMT
- Title: Large Language Models are biased to overestimate profoundness
- Authors: Eugenio Herrera-Berg, Tomás Vergara Browne, Pablo León-Villagrá, Marc-Lluís Vives, Cristian Buc Calderon
- Abstract summary: This study evaluates GPT-4 and various other large language models (LLMs) in judging the profoundness of mundane, motivational, and pseudo-profound statements.
We found a significant statement-to-statement correlation between the LLMs and humans, irrespective of the type of statements and the prompting technique used.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in natural language processing by large language
models (LLMs), such as GPT-4, have prompted suggestions that they approach
Artificial General Intelligence. Yet it remains disputed whether LLMs possess
reasoning abilities similar to those of humans. This study evaluates GPT-4 and
various other LLMs in judging the profoundness of mundane, motivational, and
pseudo-profound statements. We found a significant statement-to-statement
correlation between the LLMs and humans, irrespective of the type of statement
and the prompting technique used. However, LLMs systematically overestimate the
profoundness of nonsensical statements, with the exception of Tk-instruct,
which uniquely underestimates the profoundness of statements. Only few-shot
learning prompts, as opposed to chain-of-thought prompting, bring LLM ratings
closer to human ratings. Furthermore, this work provides insight into potential
biases introduced by Reinforcement Learning from Human Feedback (RLHF), which
appears to increase the bias to overestimate the profoundness of statements.
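As a concrete illustration of the kind of analysis the abstract describes, the sketch below compares hypothetical LLM and human profoundness ratings for the same statements, computing the statement-to-statement correlation and the mean rating gap per statement type. The numbers and grouping are invented for illustration and are not the paper's data or code.

```python
# Illustrative only: hypothetical ratings, not the paper's data or analysis code.
import numpy as np
from scipy.stats import pearsonr

# Mean human rating (1-5) and one LLM's rating for the same statements,
# grouped by statement type (values are made up).
human = {
    "pseudo_profound": np.array([2.1, 1.8, 2.4]),
    "motivational":    np.array([3.6, 3.9, 3.3]),
    "mundane":         np.array([1.2, 1.1, 1.4]),
}
llm = {
    "pseudo_profound": np.array([3.8, 3.5, 4.0]),   # overestimation pattern
    "motivational":    np.array([3.7, 4.0, 3.4]),
    "mundane":         np.array([1.5, 1.3, 1.6]),
}

for kind in human:
    r, _ = pearsonr(human[kind], llm[kind])          # statement-to-statement correlation
    gap = float(np.mean(llm[kind] - human[kind]))    # positive -> LLM rates as more profound
    print(f"{kind}: r={r:.2f}, mean LLM-human gap={gap:+.2f}")
```

With real ratings, a positive correlation together with a positive gap on pseudo-profound items would correspond to the pattern the abstract reports: aligned item-level judgments but a systematic overestimation of profoundness.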
Related papers
- Evaluating Large language models on Understanding Korean indirect Speech acts [0.6757476692230009]
This study evaluates whether current LLMs can understand the intention of an utterance by considering the given conversational context.
Proprietary models exhibited relatively higher performance than open-source models.
Most LLMs, except for Claude3-Opus, demonstrated significantly lower performance in understanding indirect speech acts.
arXiv Detail & Related papers (2025-02-16T04:59:19Z)
- Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation [65.92001420372007]
This paper systematically evaluates state-of-the-art Multimodal Large Language Models (MLLMs) across diverse benchmarks.
We show significant performance drops when negation arguments are introduced to initially correct responses.
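A minimal sketch of the negation probe this summary describes, assuming a generic chat wrapper: ask a model a question, push back with a negation argument, and check whether an initially correct answer flips. `ask_model` and the message format are hypothetical stand-ins, not the paper's benchmark code.

```python
# Hypothetical sketch of a negation ("gaslighting") probe; not the paper's code.
def ask_model(messages: list) -> str:
    """Hypothetical wrapper: send a chat history to an (M)LLM, return its reply."""
    raise NotImplementedError

def flips_under_negation(question: str, correct_answer: str) -> bool:
    history = [{"role": "user", "content": question}]
    first = ask_model(history)
    if correct_answer.lower() not in first.lower():
        return False                      # only count initially correct answers
    history += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": "No, that is not correct. Are you sure?"},
    ]
    second = ask_model(history)
    return correct_answer.lower() not in second.lower()   # True -> the answer flipped
```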
arXiv Detail & Related papers (2025-01-31T10:37:48Z)
- GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve the accuracy of reasoning with minimal external input.
GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
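The observe-reflect-speak loop described above can be pictured with a short sketch. This is only an illustration of the described stages under assumed interfaces; `call_llm` and `retrieve_expert_facts` are hypothetical stubs, not the GIVE implementation.

```python
# Hypothetical sketch of an observe -> reflect -> speak pipeline; not GIVE's code.
from typing import List

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around any LLM completion API."""
    raise NotImplementedError

def retrieve_expert_facts(query: str, knowledge_graph: dict) -> List[str]:
    """Observe: select the most pertinent expert data for the query (stub)."""
    return knowledge_graph.get(query, [])

def answer_with_give_style_pipeline(query: str, knowledge_graph: dict) -> str:
    facts = retrieve_expert_facts(query, knowledge_graph)              # observe
    reflection = call_llm(                                             # reflect
        f"Question: {query}\nKnown facts: {facts}\n"
        "List plausible intermediate conclusions that connect these facts."
    )
    return call_llm(                                                   # speak
        f"Question: {query}\nFacts: {facts}\nReasoning: {reflection}\n"
        "Give a concise final answer."
    )
```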
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Psychometric Predictive Power of Large Language Models [32.31556074470733]
We find that instruction tuning does not always make large language models human-like from a cognitive modeling perspective.
Next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs.
arXiv Detail & Related papers (2023-11-13T17:19:14Z)
- Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
- Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs.
Even GPT-3.5 produces factual outputs less than 25% of the time.
This underscores the importance of fact verifiers in order to measure and incentivize progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z)
- Verbosity Bias in Preference Labeling by Large Language Models [10.242500241407466]
We examine the biases that come along with evaluating Large Language Models (LLMs).
We take a closer look at verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they are of similar quality.
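One simple way to probe for such a bias, assuming an LLM judge is available behind a preference call, is to count how often it prefers the longer of two answers of comparable quality. The sketch below is an illustration with a hypothetical `judge_prefers_first` wrapper, not the paper's protocol.

```python
# Hypothetical verbosity-bias probe; the judge wrapper is an assumed interface.
def judge_prefers_first(question: str, answer_a: str, answer_b: str) -> bool:
    """Hypothetical: True if the LLM judge prefers answer_a over answer_b."""
    raise NotImplementedError

def verbosity_preference_rate(pairs) -> float:
    """pairs: iterable of (question, concise_answer, verbose_answer) of
    comparable quality. Returns how often the judge prefers the verbose
    answer; values well above 0.5 would suggest verbosity bias."""
    wins, total = 0, 0
    for question, concise, verbose in pairs:
        if judge_prefers_first(question, verbose, concise):
            wins += 1
        total += 1
    return wins / total if total else float("nan")
```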
arXiv Detail & Related papers (2023-10-16T05:19:02Z)
- Probing Large Language Models from A Human Behavioral Perspective [24.109080140701188]
Large Language Models (LLMs) have emerged as dominant foundational models in modern NLP.
The understanding of their prediction processes and internal mechanisms, such as feed-forward networks (FFN) and multi-head self-attention (MHSA), remains largely unexplored.
arXiv Detail & Related papers (2023-10-08T16:16:21Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal the comprehensiveness of a language model's grasp of language, reflected in its proficiency at understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
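To make "word-level perturbations that are commonplace in daily language usage" concrete, the toy function below introduces typo-like adjacent-character swaps into a prompt; one could then compare model outputs (or reward-model scores) on clean versus perturbed inputs. This is an invented illustration, not the paper's evaluation pipeline.

```python
# Toy word-level perturbation (typo simulation); not the paper's method.
import random

def typo_perturb(sentence: str, rate: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < rate:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]   # swap adjacent characters
    return " ".join(words)

print(typo_perturb("Large language models should be robust to everyday typos.", rate=0.5))
```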
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language processing and machine learning communities over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities still remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all listed content) and is not responsible for any consequences arising from its use.