The Confidence-Competence Gap in Large Language Models: A Cognitive
Study
- URL: http://arxiv.org/abs/2309.16145v1
- Date: Thu, 28 Sep 2023 03:50:09 GMT
- Title: The Confidence-Competence Gap in Large Language Models: A Cognitive
Study
- Authors: Aniket Kumar Singh, Suman Devkota, Bishal Lamichhane, Uttam Dhakal,
Chandra Dhakal
- Abstract summary: Large Language Models (LLMs) have attracted widespread attention for their performance across diverse domains.
We probe these models with diverse sets of questionnaires and real-world scenarios.
Our findings reveal intriguing instances where models demonstrate high confidence even when they answer incorrectly.
- Score: 3.757390057317548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have attracted widespread attention for
their performance across diverse domains. Our study examines LLMs' cognitive
abilities and confidence dynamics, focusing on the alignment between their
self-assessed confidence and their actual performance. We probe these models
with diverse sets of questionnaires and real-world scenarios and measure how
they express confidence in their responses. Our findings reveal intriguing
instances where models report high confidence even when they answer
incorrectly, reminiscent of the Dunning-Kruger effect observed in human
psychology. In contrast, there are cases where models express low confidence on
correct answers, revealing potential underestimation biases. These results
underscore the need for a deeper understanding of LLMs' cognitive processes. By
examining the nuances of their self-assessment mechanisms, this investigation
offers insights that can help advance the capabilities and broaden the
potential applications of these models.
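The listing does not include code, but the core measurement can be sketched in Python: prompt a model for both an answer and a self-reported confidence score, then compare stated confidence with observed accuracy per confidence band. In the sketch below, query_model, the prompt format, and the parsing rules are illustrative assumptions, not the authors' protocol.
```python
# Minimal sketch (not the authors' code): elicit a verbalized confidence from an
# LLM and compare it with correctness per confidence bucket.
from collections import defaultdict

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; replace with your API of choice.
    return "Answer: B\nConfidence: 90"

def parse_response(text: str):
    """Extract the answer letter and the self-reported confidence (0-100)."""
    answer, confidence = None, 0.0
    for line in text.splitlines():
        if line.lower().startswith("answer:"):
            answer = line.split(":", 1)[1].strip()
        elif line.lower().startswith("confidence:"):
            confidence = float(line.split(":", 1)[1].strip())
    return answer, confidence

def confidence_competence_gap(questions):
    """Bucket items by stated confidence and report accuracy per bucket.

    A bucket whose mean confidence far exceeds its accuracy shows the
    overconfidence pattern the paper likens to the Dunning-Kruger effect;
    low confidence on mostly-correct buckets suggests underestimation.
    """
    buckets = defaultdict(list)  # confidence decile -> list of 0/1 correctness
    for q in questions:
        prompt = (f"{q['question']}\nOptions: {q['options']}\n"
                  "Reply with 'Answer: <letter>' and 'Confidence: <0-100>'.")
        answer, conf = parse_response(query_model(prompt))
        buckets[int(conf // 10) * 10].append(int(answer == q["correct"]))
    return {b: (sum(v) / len(v), len(v)) for b, v in sorted(buckets.items())}

if __name__ == "__main__":
    demo = [{"question": "2 + 2 = ?", "options": "A) 3  B) 4  C) 5", "correct": "B"}]
    print(confidence_competence_gap(demo))  # stub model -> {90: (1.0, 1)}
```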
Related papers
- Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? [26.69630281310365]
Large language models (LLMs) have been found to produce hallucinations when the question exceeds their internal knowledge boundaries.
Existing research on LLMs' perception of their knowledge boundaries typically uses either the probability of the generated tokens or the verbalized confidence as the model's confidence in its response.
arXiv Detail & Related papers (2024-08-19T08:01:11Z)
- Self-Cognition in Large Language Models: An Exploratory Study [77.47074736857726]
This paper presents a pioneering study of self-cognition in Large Language Models (LLMs).
We first construct a pool of self-cognition instruction prompts to evaluate whether an LLM exhibits self-cognition.
We observe a positive correlation between model size, training data quality, and self-cognition level.
arXiv Detail & Related papers (2024-07-01T17:52:05Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality [12.162460438332152]
We investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires.
Our goal is to evaluate the consistency between LLMs' professed personality inclinations and their actual "behavior".
We propose hypotheses for the observed results based on psychological theories and metrics.
arXiv Detail & Related papers (2024-02-22T16:32:08Z)
- Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models [23.42725642076256]
Large Language Models (LLMs) have catalyzed an increasing interest in their self-correction capabilities.
This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs.
We develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence"
arXiv Detail & Related papers (2024-02-19T21:38:02Z)
- Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances the factual accuracy of Llama-family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
- The Calibration Gap between Model and Human Confidence in Large Language Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct.
Recent work has focused on the quality of internal LLM confidence assessments.
This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model (a simple calibration-error sketch appears after this list).
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
- Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z)
- Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning.
This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation.
We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency (a minimal sketch of such a pipeline appears after this list).
arXiv Detail & Related papers (2023-06-22T17:31:44Z)
- Do Large Language Models Know What They Don't Know? [74.65014158544011]
Large language models (LLMs) have a wealth of knowledge that allows them to excel in various Natural Language Processing (NLP) tasks.
Despite their vast knowledge, LLMs are still limited by the amount of information they can accommodate and comprehend.
This study aims to evaluate LLMs' self-knowledge by assessing their ability to identify unanswerable or unknowable questions.
arXiv Detail & Related papers (2023-05-29T15:30:13Z)
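As referenced in the "The Calibration Gap between Model and Human Confidence" entry above, a standard way to quantify miscalibration is the expected calibration error (ECE): bin predictions by stated confidence and compare each bin's average confidence with its accuracy. The sketch below applies the same measure to model-internal confidences and to human confidence ratings about the model; the toy inputs are illustrative, not the paper's data.
```python
# Expected calibration error (ECE) sketch: |avg confidence - accuracy| per bin,
# weighted by bin size. Computing it for model confidences and for human
# confidence-in-the-model gives one view of the calibration gap between the two.

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: floats in [0, 1]; correct: 0/1 outcomes of the same length."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(h for _, h in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

if __name__ == "__main__":
    correct = [1, 0, 1, 1, 0, 1]
    model_conf = [0.95, 0.90, 0.80, 0.70, 0.85, 0.60]  # model's own confidence
    human_conf = [0.70, 0.40, 0.75, 0.65, 0.50, 0.55]  # human confidence in the model
    print("model ECE:", round(expected_calibration_error(model_conf, correct), 3))
    print("human ECE:", round(expected_calibration_error(human_conf, correct), 3))
```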
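As referenced in the "Can LLMs Express Their Uncertainty?" entry above, a black-box confidence pipeline combines a prompting strategy that elicits a verbalized confidence, a sampling step that draws several responses, and an aggregation step that scores their consistency. The sketch below is one minimal reading of that recipe; sample_model and the 50/50 aggregation rule are assumptions, not the paper's implementation.
```python
# Minimal black-box uncertainty sketch: sample several answers, then aggregate
# agreement (consistency) together with the model's verbalized confidence.
from collections import Counter
import random

def sample_model(prompt: str):
    # Hypothetical stand-in for a temperature-sampled LLM call;
    # returns (answer, verbalized confidence in [0, 1]).
    return random.choice([("B", 0.9), ("B", 0.8), ("C", 0.6)])

def black_box_confidence(prompt: str, k: int = 5) -> dict:
    samples = [sample_model(prompt) for _ in range(k)]
    answers = [a for a, _ in samples]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    consistency = top_count / k                        # sampling-based signal
    verbalized = sum(c for a, c in samples if a == top_answer) / top_count
    return {
        "answer": top_answer,
        "consistency": consistency,
        "verbalized": verbalized,                      # prompt-elicited signal
        "combined": 0.5 * consistency + 0.5 * verbalized,  # one simple aggregation
    }

if __name__ == "__main__":
    print(black_box_confidence("Which option is correct? A/B/C"))
```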
This list is automatically generated from the titles and abstracts of the papers on this site.