Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning
- URL: http://arxiv.org/abs/2312.17484v2
- Date: Sun, 14 Jan 2024 07:55:12 GMT
- Title: Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning
- Authors: Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu
- Abstract summary: We introduce Truth Forest, a method that enhances truthfulness in large language models (LLMs).
We also introduce Random Peek, a systematic technique that considers an extended range of positions within the sequence.
- Score: 18.92421817900689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great success of large language models (LLMs) in various tasks,
they are prone to generating hallucinations. We introduce Truth Forest, a method
that enhances truthfulness in LLMs by uncovering hidden truth representations
using multi-dimensional orthogonal probes. Specifically, it creates multiple
orthogonal bases for modeling truth by incorporating orthogonal constraints
into the probes. Moreover, we introduce Random Peek, a systematic technique
considering an extended range of positions within the sequence, reducing the
gap between discerning and generating truth features in LLMs. By employing this
approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on
TruthfulQA. Likewise, significant improvements are observed in fine-tuned
models. We conducted a thorough analysis of truth features using probes. Our
visualization results show that orthogonal probes capture complementary
truth-related features, forming well-defined clusters that reveal the inherent
structure of the dataset.
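The two mechanisms in the abstract are concrete enough for a short sketch. The PyTorch fragment below is a minimal illustration, not the authors' code: the probe count, penalty weight, layer choice, and random activations are assumptions standing in for real LLM hidden states and labeled data.

```python
# Sketch of orthogonal truth probes with Random Peek (hypothetical shapes
# and hyperparameters; activations stand in for real LLM hidden states).
import torch
import torch.nn.functional as F

hidden_dim, num_probes, seq_len, batch = 4096, 4, 128, 32

# K linear probes; each row of W is one probe direction over hidden states.
W = torch.nn.Parameter(0.01 * torch.randn(num_probes, hidden_dim))
b = torch.nn.Parameter(torch.zeros(num_probes))
opt = torch.optim.Adam([W, b], lr=1e-3)

# Stand-ins for per-token LLM activations and truthfulness labels.
acts = torch.randn(batch, seq_len, hidden_dim)
labels = torch.randint(0, 2, (batch,)).float()   # 1 = truthful, 0 = not

for step in range(100):
    # Random Peek: probe a random position per sequence, not just the last
    # token, to shrink the gap between discerning and generating truth.
    pos = torch.randint(0, seq_len, (batch,))
    h = acts[torch.arange(batch), pos]                 # (batch, hidden_dim)

    logits = h @ W.t() + b                             # (batch, num_probes)
    target = labels.unsqueeze(1).expand(-1, num_probes)
    bce = F.binary_cross_entropy_with_logits(logits, target)

    # Orthogonality constraint: drive the probes' Gram matrix toward the
    # identity so each probe captures a complementary truth feature.
    U = F.normalize(W, dim=1)
    ortho = (U @ U.t() - torch.eye(num_probes)).pow(2).sum()

    loss = bce + 0.1 * ortho
    opt.zero_grad()
    loss.backward()
    opt.step()
```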
Related papers
- On the Universal Truthfulness Hyperplane Inside LLMs [27.007142483859162]
We investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model.
Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios.
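A minimal sketch of the setup this summary implies: pool activations from several datasets and fit a single linear hyperplane. Dataset names, shapes, and the logistic-regression choice are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: fit one "truthfulness hyperplane" on hidden states pooled across
# diverse datasets (random features stand in for real LLM activations).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
datasets = {name: (rng.normal(size=(200, 4096)),            # activations
                   rng.integers(0, 2, 200))                 # correct/incorrect
            for name in ["trivia", "science", "history"]}   # diversity matters

X = np.concatenate([x for x, _ in datasets.values()])
y = np.concatenate([t for _, t in datasets.values()])

probe = LogisticRegression(max_iter=1000).fit(X, y)
# probe.coef_ is the candidate universal hyperplane's normal vector.
```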
arXiv Detail & Related papers (2024-07-11T15:07:26Z)
- Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression [19.69104070561701]
Large language models (LLMs) can generate long-form and coherent text, yet they often hallucinate facts.
We propose LITO, a Learnable Intervention method for Truthfulness Optimization.
Experiments on multiple LLMs and question-answering datasets demonstrate that LITO improves truthfulness while preserving task accuracy.
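One way to picture a learnable intervention of this kind, purely as a sketch: generate candidates at several intervention strengths along a truth direction, score them with a learned selector, and fall back to an uncertainty expression when no candidate scores well. The steering vector, strengths, and selector below are hypothetical stand-ins, not LITO's actual architecture.

```python
# Sketch of a learnable-intervention loop in the spirit of LITO
# (all components are hypothetical stand-ins, not the authors' code).
import torch

hidden_dim = 4096
truth_dir = torch.randn(hidden_dim)          # probe-derived steering direction
strengths = [0.0, 0.5, 1.0, 2.0]             # candidate intervention levels
selector = torch.nn.Linear(hidden_dim, 1)    # learned scorer over final states

def generate_with_shift(h, alpha):
    # Stand-in for generating while adding alpha * truth_dir to hidden states.
    return h + alpha * truth_dir

h = torch.randn(hidden_dim)                  # stand-in final hidden state
candidates = [generate_with_shift(h, a) for a in strengths]
scores = torch.stack([selector(c).squeeze() for c in candidates])

best = int(scores.argmax())
if torch.sigmoid(scores[best]) < 0.5:
    print("I'm not sure.")                   # uncertainty expression
else:
    print(f"answer from intervention strength {strengths[best]}")
```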
arXiv Detail & Related papers (2024-05-01T03:50:09Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
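A compact way to see the LID idea: estimate the intrinsic dimension of the activation neighborhood around a generation. The sketch below uses the Levina-Bickel maximum-likelihood estimator on k-nearest-neighbor distances; the estimator choice, k, and random activation bank are my assumptions, not necessarily the paper's setup.

```python
# Sketch: MLE estimate of local intrinsic dimension around one activation;
# the hypothesis is that LID separates truthful from untruthful generations.
import numpy as np

def local_intrinsic_dim(x, neighbors, k=20):
    # Distances from the query activation to its k nearest neighbors.
    d = np.sort(np.linalg.norm(neighbors - x, axis=1))[:k]
    # Levina-Bickel MLE: inverse mean log-ratio of the k-th distance
    # to each closer neighbor distance.
    return (k - 1) / np.sum(np.log(d[k - 1] / d[:k - 1]))

rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 4096))   # stand-in: bank of model activations
query = rng.normal(size=4096)          # stand-in: activation of a generation
print(local_intrinsic_dim(query, pool))
```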
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- GRATH: Gradual Self-Truthifying for Large Language Models [63.502835648056305]
GRAdual self-truTHifying (GRATH) is a novel post-processing method to enhance the truthfulness of large language models (LLMs).
GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner.
GRATH achieves state-of-the-art performance on TruthfulQA, with 54.71% MC1 accuracy and 69.10% MC2 accuracy, surpassing even 70B-scale LLMs.
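The loop the summary describes can be sketched in a few lines, assuming hypothetical helpers for pair generation and the DPO step (neither is the authors' API):

```python
# Sketch of a gradual self-truthifying loop: self-generate pairwise
# truthfulness data, align with DPO, iterate. All helpers are stand-ins.

def generate_pair(model, question):
    # Stand-in: prompt the model itself for a correct and an incorrect answer.
    return {"prompt": question,
            "chosen": model(question, truthful=True),
            "rejected": model(question, truthful=False)}

def dpo_update(model, pairs):
    # Stand-in for one round of Direct Preference Optimization on the pairs.
    return model

def grath(model, questions, rounds=3):
    for _ in range(rounds):
        pairs = [generate_pair(model, q) for q in questions]
        model = dpo_update(model, pairs)   # data is refreshed each round
    return model                           # from the newly updated model

# Toy run with a dummy callable standing in for a real LLM.
dummy = lambda q, truthful=True: "a correct answer" if truthful else "a wrong answer"
tuned = grath(dummy, ["What is the capital of France?"])
```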
arXiv Detail & Related papers (2024-01-22T19:00:08Z)
- FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We also introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
- The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets [6.732432949368421]
Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods.
Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on its internal activations.
We present evidence that at sufficient scale, LLMs linearly represent the truth or falsehood of factual statements.
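One simple probe consistent with such linear structure is a difference-of-class-means ("mass-mean") direction; the sketch below uses random activations as stand-ins for representations of real true/false statements.

```python
# Sketch: a mass-mean linear truth direction over statement activations
# (random data stands in for real hidden states of true/false statements).
import numpy as np

rng = np.random.default_rng(0)
true_acts = rng.normal(loc=0.1, size=(500, 4096))    # true-statement activations
false_acts = rng.normal(loc=-0.1, size=(500, 4096))  # false-statement activations

direction = true_acts.mean(axis=0) - false_acts.mean(axis=0)

def truth_score(activation):
    # Project onto the truth direction; positive suggests "true".
    return float(activation @ direction)

print(truth_score(true_acts[0]), truth_score(false_acts[0]))
```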
arXiv Detail & Related papers (2023-10-10T17:54:39Z)
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
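The contrast is easy to sketch: read the same output head at a premature and the final layer, and score tokens by the gap in log-probabilities. The fragment below fixes the premature layer and uses random hidden states, whereas DoLa selects the premature layer dynamically inside a real model.

```python
# Sketch of DoLa-style contrastive decoding with stand-in hidden states.
import torch
import torch.nn.functional as F

vocab, hidden_dim = 32000, 4096
head = torch.nn.Linear(hidden_dim, vocab, bias=False)  # shared unembedding

h_premature = torch.randn(hidden_dim)   # stand-in: early-layer hidden state
h_mature = torch.randn(hidden_dim)      # stand-in: final-layer hidden state

log_p_mature = F.log_softmax(head(h_mature), dim=-1)
log_p_premature = F.log_softmax(head(h_premature), dim=-1)

# Keep only tokens the mature layer finds plausible (adaptive constraint),
# then contrast: tokens whose probability grew across depth are favored.
mask = log_p_mature >= log_p_mature.max() + torch.log(torch.tensor(0.1))
scores = torch.where(mask, log_p_mature - log_p_premature,
                     torch.tensor(float("-inf")))
next_token = int(scores.argmax())
```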
arXiv Detail & Related papers (2023-09-07T17:45:31Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated a remarkable inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
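A toy version of the Aug-GAM instantiation, with a hashing function standing in for the LLM embedder: embeddings are used only at fitting time, then distilled into one scalar per n-gram so inference is a transparent additive lookup with no LLM calls. Names and shapes here are illustrative assumptions.

```python
# Sketch of the Aug-GAM idea: LLM embeddings at fit time, additive lookup
# at inference time (a hashing function stands in for a real LLM encoder).
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(ngram, dim=64):                 # stand-in for an LLM embedding
    rng = np.random.default_rng(abs(hash(ngram)) % (2**32))
    return rng.normal(size=dim)

texts = ["great movie", "terrible plot", "great plot", "terrible movie"]
labels = [1, 0, 1, 0]

# Fitting: a text's feature vector is the sum of its unigram embeddings.
X = np.stack([sum(embed(w) for w in t.split()) for t in texts])
clf = LogisticRegression().fit(X, labels)

# Distillation to a GAM: cache each n-gram's scalar contribution.
vocab = {w for t in texts for w in t.split()}
contrib = {w: float(embed(w) @ clf.coef_[0]) for w in vocab}

def predict(text):                        # inference: no LLM, fully additive
    return sum(contrib.get(w, 0.0) for w in text.split()) + float(clf.intercept_[0])

print(predict("great movie"))
```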
arXiv Detail & Related papers (2022-09-23T18:36:01Z)