Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning
- URL: http://arxiv.org/abs/2502.20620v1
- Date: Fri, 28 Feb 2025 00:57:45 GMT
- Title: Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning
- Authors: Ayana Niwa, Masahiro Kaneko, Kentaro Inui
- Abstract summary: We propose a method to rectify the belief space by suppressing spurious beliefs while simultaneously enhancing true ones. Our approach first identifies the beliefs that lead to incorrect or correct answers by prompting the model to generate textual explanations. We then apply unlearning to suppress the identified spurious beliefs and enhance the true ones, effectively rectifying the model's belief space.
- Score: 36.74368293113009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can exhibit advanced reasoning yet still generate incorrect answers. We hypothesize that such errors frequently stem from spurious beliefs: propositions that the model internally holds as true but that are in fact incorrect. To address this, we propose a method to rectify the belief space by suppressing these spurious beliefs while simultaneously enhancing true ones, thereby enabling more reliable inferences. Our approach first identifies the beliefs that lead to incorrect or correct answers by prompting the model to generate textual explanations, using our Forward-Backward Beam Search (FBBS). We then apply unlearning to suppress the identified spurious beliefs and enhance the true ones, effectively rectifying the model's belief space. Empirical results on multiple QA datasets and LLMs show that our method corrects previously misanswered questions without harming overall model performance. Furthermore, our approach yields improved generalization on unseen data, suggesting that rectifying a model's belief space is a promising direction for mitigating errors and enhancing overall reliability.
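To make the two-stage idea in the abstract concrete, the sketch below pairs an identified spurious belief with a corresponding true one and fine-tunes the model so that the former becomes less likely and the latter more likely. It is a minimal illustration only: the example beliefs, the `gpt2` placeholder model, and the gradient-ascent-style unlearning loss are assumptions, since the paper's Forward-Backward Beam Search (FBBS) and exact unlearning objective are not detailed here.

```python
# Minimal sketch (not the authors' released code) of the two-stage idea:
# identify belief statements tied to wrong/right answers, then "unlearn" the
# spurious ones while reinforcing the true ones. The loss design is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

def nll(text: str) -> torch.Tensor:
    """Token-level negative log-likelihood of a belief statement under the model."""
    enc = tok(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    return out.loss

# Step 1 (assumed interface): belief statements extracted from the model's own
# explanations for incorrectly vs. correctly answered questions.
spurious_beliefs = ["Penguins can fly because they are birds."]
true_beliefs = ["Penguins are flightless birds."]

# Step 2: rectify the belief space. Raise the loss on spurious beliefs
# (gradient ascent / unlearning) and lower it on true beliefs (enhancement).
alpha, beta = 1.0, 1.0  # hypothetical weighting coefficients
for spurious, true in zip(spurious_beliefs, true_beliefs):
    loss = -alpha * nll(spurious) + beta * nll(true)
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In practice such a loss would be averaged over many belief pairs and the suppression term down-weighted or clipped, so that unlearning spurious beliefs does not degrade unrelated capabilities, consistent with the abstract's claim that overall performance is preserved.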
Related papers
- CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs [15.170312674645535]
CRAVE is a Conflicting Reasoning Approach for explainable claim VErification.
It can verify complex claims based on the conflicting rationales produced by large language models.
CRAVE achieves much better performance than state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T07:20:31Z)
- Graph-based Confidence Calibration for Large Language Models [22.394717844099684]
We propose a novel method to develop a well-calibrated confidence estimation model.
We use a weighted graph to represent the consistency among the large language models' responses to a question.
We then train a graph neural network to estimate the probability of correct responses.
arXiv Detail & Related papers (2024-11-03T20:36:44Z)
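As a rough illustration of the approach summarized above (not the cited paper's implementation), the sketch below builds a weighted graph over several sampled responses, with edge weights given by a placeholder token-overlap consistency score, and runs a tiny graph-convolution network that outputs a per-response probability of being correct. The consistency measure, the random embedding features, and the network architecture are all assumptions.

```python
# Rough sketch: a consistency-weighted response graph plus a small GNN scorer.
import torch
import torch.nn as nn

def consistency(a: str, b: str) -> float:
    """Placeholder pairwise consistency: token-overlap Jaccard similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

class TinyGCN(nn.Module):
    """One graph-convolution step followed by a per-node correctness head."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        h = torch.relu(self.lin((adj @ x) / deg))        # aggregate weighted neighbours
        return torch.sigmoid(self.head(h)).squeeze(-1)   # P(correct) per response

# Toy usage: three sampled responses with stand-in embeddings.
responses = ["Paris is the capital of France.",
             "The capital of France is Paris.",
             "Lyon is the capital of France."]
adj = torch.tensor([[consistency(a, b) for b in responses] for a in responses])
feats = torch.randn(len(responses), 16)    # stand-in for response embeddings
print(TinyGCN(16)(feats, adj))             # trained with correctness labels in practice
```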
- LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers.
We show that LACIE models the listener, considering not only whether an answer is right, but also whether it will be accepted by a listener.
We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
arXiv Detail & Related papers (2024-05-31T17:16:38Z)
- Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis [127.85293480405082]
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
Existing alignment methods usually direct LLMs toward favorable outcomes by utilizing human-annotated, flawless instruction-response pairs.
This study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them.
arXiv Detail & Related papers (2023-10-16T14:59:10Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- A Belief Model for Conflicting and Uncertain Evidence -- Connecting Dempster-Shafer Theory and the Topology of Evidence [8.295493796476766]
We propose a new model for measuring degrees of beliefs based on possibly inconsistent, incomplete, and uncertain evidence.
We show that computing degrees of belief with this model is #P-complete in general.
arXiv Detail & Related papers (2023-06-06T09:30:48Z)
- Language Models with Rationality [57.37201135072838]
Large language models (LLMs) are proficient at question-answering (QA).
It is not always clear how (or even if) an answer follows from their latent "beliefs".
arXiv Detail & Related papers (2023-05-23T17:04:25Z)
- Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs [76.6325846350907]
Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state.
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful.
arXiv Detail & Related papers (2021-11-26T18:33:59Z)