Related papers: Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making

Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making

URL: http://arxiv.org/abs/2507.10124v1
Date: Mon, 14 Jul 2025 10:09:46 GMT
Title: Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Authors: Thomas T. Hills,
Abstract summary: Metacognitive prompts are designed to bring latent knowledge into awareness during decision making.<n>"Could you be wrong?" prompts the LLM to identify its own biases and produce cogent metacognitive reflection.<n>This work argues that human psychology offers a new avenue for prompt engineering.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Identifying bias in LLMs is ongoing. Because they are still in development, what is true today may be false tomorrow. We therefore need general strategies for debiasing that will outlive current models. Strategies developed for debiasing human decision making offer one promising approach as they incorporate an LLM-style prompt intervention designed to bring latent knowledge into awareness during decision making. LLMs trained on vast amounts of information contain information about potential biases, counter-arguments, and contradictory evidence, but that information may only be brought to bear if prompted. Metacognitive prompts developed in the human decision making literature are designed to achieve this, and as I demonstrate here, they show promise with LLMs. The prompt I focus on here is "could you be wrong?" Following an LLM response, this prompt leads LLMs to produce additional information, including why they answered as they did, errors, biases, contradictory evidence, and alternatives, none of which were apparent in their initial response. Indeed, this metaknowledge often reveals that how LLMs and users interpret prompts are not aligned. Here I demonstrate this prompt using a set of questions taken from recent articles about LLM biases, including implicit discriminatory biases and failures of metacognition. "Could you be wrong" prompts the LLM to identify its own biases and produce cogent metacognitive reflection. I also present another example involving convincing but incomplete information, which is readily corrected by the metacognitive prompt. In sum, this work argues that human psychology offers a new avenue for prompt engineering, leveraging a long history of effective prompt-based improvements to human decision making.

Related papers

An Empirical Analysis of LLMs for Countering Misinformation [4.832131829290864]
Large Language Models (LLMs) can amplify online misinformation, but show promise in tackling misinformation.<n>We empirically study the capabilities of three LLMs -- ChatGPT, Gemini, and Claude -- in countering political misinformation.<n>Our findings suggest that models struggle to ground their responses in real news sources, and tend to prefer citing left-leaning sources.
arXiv Detail & Related papers (2025-02-28T07:12:03Z)
Understanding the Dark Side of LLMs' Intrinsic Self-Correction [55.51468462722138]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts solely based on their inherent capability.<n>Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts.<n>We identify intrinsic self-correction can cause LLMs to waver both intermedia and final answers and lead to prompt bias on simple factual questions.
arXiv Detail & Related papers (2024-12-19T15:39:31Z)
Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.<n>This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.<n>We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
LLM-Generated Black-box Explanations Can Be Adversarially Helpful [16.49758711633611]
Large Language Models (LLMs) help us solve and understand complex problems by acting as digital assistants. Our research uncovers a hidden risk tied to this approach, which we call *adversarial helpfulness*. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions.
arXiv Detail & Related papers (2024-05-10T20:23:46Z)
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model. We employ a proxy model which has far fewer parameters, and take its answers as answers. Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z)
When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation [66.01754585188739]
Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge. Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations. We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence.
arXiv Detail & Related papers (2024-02-18T04:57:19Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism [0.0]
Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities. These models are not flawless and often produce responses that contain errors or misinformation. We propose a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors.
arXiv Detail & Related papers (2023-11-02T07:20:49Z)
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method [36.24876571343749]
Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. Recent literature reveals that LLMs generate nonfactual responses intermittently. We propose a novel self-detection method to detect which questions that a LLM does not know that are prone to generate nonfactual results.
arXiv Detail & Related papers (2023-10-27T06:22:14Z)
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation [80.126717170151]
This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. We introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information.
arXiv Detail & Related papers (2023-10-02T16:27:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.