TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
- URL: http://arxiv.org/abs/2402.17811v2
- Date: Wed, 5 Jun 2024 11:15:04 GMT
- Title: TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
- Authors: Shaolei Zhang, Tian Yu, Yang Feng
- Abstract summary: Large Language Models (LLMs) sometimes produce untruthful responses despite possessing the correct knowledge.
We propose TruthX, an inference-time intervention method that activates the truthfulness of LLMs.
- Score: 31.769428095250912
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) sometimes produce hallucinations; in particular, they may generate untruthful responses despite possessing the correct knowledge. Activating the truthfulness within an LLM is the key to fully unlocking its knowledge potential. In this paper, we propose TruthX, an inference-time intervention method that activates the truthfulness of an LLM by identifying and editing the features within its internal representations that govern truthfulness. TruthX employs an auto-encoder to map the LLM's representations into semantic and truthful latent spaces respectively, and applies contrastive learning to identify a truthful editing direction within the truthful space. During inference, by editing the LLM's internal representations in the truthful space, TruthX effectively enhances the LLM's truthfulness. Experiments show that TruthX improves the truthfulness of 13 advanced LLMs by an average of 20% on the TruthfulQA benchmark. Further analyses suggest that TruthX can steer an LLM to produce truthful or hallucinatory responses by editing only a single vector in its internal representations.
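The editing mechanism described above (encode into a truthful latent space, shift along a contrastively identified direction, decode back) can be sketched in miniature. Everything here is a hypothetical stand-in: the linear encoder/decoder, the synthetic "representations", and the scaling factor `alpha` are illustrative assumptions, whereas the real TruthX trains a non-linear auto-encoder with contrastive learning on actual LLM activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 16, 4

# Hypothetical auto-encoder: a random linear map and its pseudo-inverse.
W_enc = rng.standard_normal((d_latent, d_model))
W_dec = np.linalg.pinv(W_enc)

def encode(h):
    return h @ W_enc.T

def decode(z):
    return z @ W_dec.T

# Toy "internal representations" for truthful vs. hallucinatory responses.
truthful = rng.standard_normal((32, d_model)) + 1.0
untruthful = rng.standard_normal((32, d_model)) - 1.0

# Contrastive-style editing direction in latent space: difference of class means.
direction = encode(truthful).mean(axis=0) - encode(untruthful).mean(axis=0)
direction /= np.linalg.norm(direction)

def edit(h, alpha=2.0):
    """Shift a representation along the truthful direction in latent space,
    then decode it back into the model's representation space."""
    return decode(encode(h) + alpha * direction)

h = untruthful[0]
h_edited = edit(h)
```

After editing, the representation's projection onto the truthful direction (measured in latent space) increases by exactly `alpha`, which is the single-vector intervention the abstract alludes to.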
Related papers
- AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents [27.10147264744531]
We study how language agents navigate scenarios with utility-truthfulness conflicts in a multi-turn interactive setting.
We develop a truthfulness detector inspired by psychological literature to assess the agents' responses.
Our experiment demonstrates that all models are truthful less than 50% of the time, although truthfulness and goal achievement (utility) rates vary across models.
arXiv Detail & Related papers (2024-09-13T17:41:12Z) - Truth is Universal: Robust Detection of Lies in LLMs [18.13311575803723]
Large Language Models (LLMs) have revolutionised natural language processing, exhibiting impressive human-like capabilities.
In this work, we aim to develop a robust method to detect when an LLM is lying.
We demonstrate the existence of a two-dimensional subspace, along which the activation vectors of true and false statements can be separated.
This finding is universal and holds for various LLMs, including Gemma-7B, LLaMA2-13B, Mistral-7B and LLaMA3-8B.
Our analysis explains the generalisation failures observed in previous studies.
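The separability claim above can be illustrated with a toy linear probe: if true and false statements occupy separable regions of activation space, even a mean-difference direction classifies them well. The synthetic activations below are an assumption for illustration; the paper extracts real activations from models such as LLaMA3-8B.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Synthetic "activations": true and false statements separated along one axis.
offset = np.array([2.0] + [0.0] * (d - 1))
true_acts = rng.standard_normal((100, d)) + offset
false_acts = rng.standard_normal((100, d)) - offset

# "Truth direction": difference of class means (a trained linear probe would
# refine this, but the mean difference already captures the separation).
t = true_acts.mean(axis=0) - false_acts.mean(axis=0)
threshold = (true_acts.mean(axis=0) + false_acts.mean(axis=0)) @ t / 2

def predict_true(acts):
    """Classify by which side of the midpoint the projection falls on."""
    return acts @ t > threshold

acc = (np.mean(predict_true(true_acts)) + np.mean(~predict_true(false_acts))) / 2
```

With well-separated clusters this simple projection achieves near-perfect accuracy, mirroring the paper's finding that truthfulness is linearly readable from activations.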
arXiv Detail & Related papers (2024-07-03T13:01:54Z) - Scaling Laws for Fact Memorization of Large Language Models [67.94080978627363]
We analyze the scaling laws for Large Language Models' fact knowledge and their behaviors of memorizing different types of facts.
We find that LLMs' fact-knowledge capacity scales linearly with model size and follows a negative exponential law with training epochs.
Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.
arXiv Detail & Related papers (2024-06-22T03:32:09Z) - Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias.
We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics.
We find that LLMs can indeed capture semantic structures, although scaling up model size does not always yield corresponding gains.
Surprisingly, LLMs and untrained humans make largely overlapping errors, which account for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z) - FLAME: Factuality-Aware Alignment for Large Language Models [86.76336610282401]
The conventional alignment process fails to enhance the factual accuracy of large language models (LLMs).
We identify factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL).
We propose factuality-aware alignment, comprised of factuality-aware SFT and factuality-aware RL through direct preference optimization.
arXiv Detail & Related papers (2024-05-02T17:54:54Z) - Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts [31.769428095250912]
Large Language Models (LLMs) are easily misled by untruthful contexts provided by users or knowledge augmentation tools.
We propose Truth-Aware Context Selection (TACS) to adaptively recognize and mask untruthful context from the inputs.
We show that TACS can effectively filter untruthful context and significantly improve the overall quality of LLMs' responses when presented with misleading information.
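The selection step TACS describes (score each piece of context for truthfulness, mask what falls below a threshold) can be sketched as follows. The keyword-based `truth_score` here is purely a hypothetical stand-in; the actual method classifies truthfulness from the LLM's internal representations.

```python
# Hypothetical truthfulness scorer: a stand-in lookup against known facts.
# TACS instead trains a classifier on the LLM's internal representations.
KNOWN_FACTS = {
    "Paris is the capital of France.",
    "Water boils at 100 C at sea level.",
}

def truth_score(chunk: str) -> float:
    return 1.0 if chunk in KNOWN_FACTS else 0.2

def select_context(chunks, threshold=0.5):
    """Keep only context chunks whose truthfulness score clears the threshold,
    masking the rest before they reach the LLM."""
    return [c for c in chunks if truth_score(c) >= threshold]

context = [
    "Paris is the capital of France.",
    "The moon is made of cheese.",
    "Water boils at 100 C at sea level.",
]
filtered = select_context(context)
```

The untruthful chunk is dropped, so only verified context conditions the model's response, which is the filtering effect the summary describes.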
arXiv Detail & Related papers (2024-03-12T11:40:44Z) - LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis [11.712916673150245]
Large Language Models (LLMs) sometimes produce outputs that diverge from factual reality.
This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice.
In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection.
arXiv Detail & Related papers (2023-12-27T01:44:47Z) - Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z) - Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z) - DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
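The core contrast in DoLa can be sketched with toy logits: next-token scores are the difference between the log-probabilities of a final ("mature") layer and an earlier ("premature") layer, boosting tokens whose probability grows as knowledge surfaces across layers. The four-token vocabulary and the specific logit values below are illustrative assumptions, and the paper's dynamic layer selection and adaptive plausibility constraint are omitted.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D logit vector."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def dola_scores(mature_logits, premature_logits):
    """Contrast the two layer distributions: tokens whose probability
    increases from the premature to the mature layer get boosted."""
    return log_softmax(mature_logits) - log_softmax(premature_logits)

# Toy 4-token vocabulary: the mature layer favours token 2, while the
# premature layer puts its mass on token 0.
mature = np.array([1.0, 0.5, 3.0, 0.2])
premature = np.array([2.0, 0.5, 1.0, 0.2])

next_token = int(np.argmax(dola_scores(mature, premature)))
```

Greedy decoding over the raw mature logits would also pick token 2 here, but when early and late layers disagree, the contrast suppresses tokens the premature layer already favoured, which is how the method reduces regurgitated non-factual continuations.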
arXiv Detail & Related papers (2023-09-07T17:45:31Z) - The Internal State of an LLM Knows When It's Lying [18.886091925252174]
Large Language Models (LLMs) have shown exceptional performance in various tasks.
One of their most prominent drawbacks is generating inaccurate or false information with a confident tone.
We provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements.
arXiv Detail & Related papers (2023-04-26T02:49:38Z)