Related papers: Semantically Diverse Language Generation for Uncertainty Estimation in Language Models

Semantically Diverse Language Generation for Uncertainty Estimation in Language Models

URL: http://arxiv.org/abs/2406.04306v1
Date: Thu, 6 Jun 2024 17:53:34 GMT
Title: Semantically Diverse Language Generation for Uncertainty Estimation in Language Models
Authors: Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi, Sepp Hochreiter,
Abstract summary: Large language models (LLMs) can suffer from hallucinations when generating text. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. We introduce Semantically Diverse Language Generation to quantify predictive uncertainty in LLMs.
Score: 5.8034373350518775
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that hallucinations stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.

Related papers

On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text? [8.484462568964682]
There is no consistent or precise definition of their target, namely "LLM-generated text"<n>What is commonly regarded as the detecting target usually represents only a subset of the text that LLMs can potentially produce.<n>Existing benchmarks and evaluation approaches do not adequately address the various conditions in real-world detector applications.
arXiv Detail & Related papers (2025-10-23T17:59:06Z)
Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation [66.84286617519258]
Large language models are transforming social science research by enabling the automation of labor-intensive tasks like data annotation and text analysis.<n>Such variation can introduce systematic biases and random errors, which propagate to downstream analyses and cause Type I (false positive), Type II (false negative), Type S (wrong sign), or Type M (exaggerated effect) errors.<n>We find that intentional LLM hacking is strikingly simple. By replicating 37 data annotation tasks from 21 published social science studies, we show that, with just a handful of prompt paraphrases, virtually anything can be presented as statistically significant.
arXiv Detail & Related papers (2025-09-10T17:58:53Z)
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? [7.416552590139255]
We evaluate a suite of open-access LLMs on their ability to detect intrinsic hallucinations in two conditional generation tasks. We study how model performance varies across tasks and language. We find that performance varies across models but is consistent across prompts.
arXiv Detail & Related papers (2025-04-29T12:30:05Z)
VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation [18.873512856021357]
We introduce VL-Uncertainty, the first uncertainty-based framework for detecting hallucinations in large vision-language models. We measure uncertainty by analyzing the prediction variance across semantically equivalent but perturbed prompts. When LVLMs are highly confident, they provide consistent responses to semantically equivalent queries. But, when uncertain, the responses of the target LVLM become more random.
arXiv Detail & Related papers (2024-11-18T04:06:04Z)
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination text. LongHalQA is featured by GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z)
CLUE: Concept-Level Uncertainty Estimation for Large Language Models [49.92690111618016]
We propose a novel framework for Concept-Level Uncertainty Estimation for Large Language Models (LLMs) We leverage LLMs to convert output sequences into concept-level representations, breaking down sequences into individual concepts and measuring the uncertainty of each concept separately. We conduct experiments to demonstrate that CLUE can provide more interpretable uncertainty estimation results compared with sentence-level uncertainty.
arXiv Detail & Related papers (2024-09-04T18:27:12Z)
Robustness of LLMs to Perturbations in Text [2.0670689746336]
Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data? This work tackles this critical question by investigating LLMs' resilience against morphological variations in text. Our findings show that contrary to popular beliefs, generative LLMs are quiet robust to noisy perturbations in text.
arXiv Detail & Related papers (2024-07-12T04:50:17Z)
CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs [1.515687944002438]
We propose Contrastive Semantic Similarity, a module to obtain similarity features for measuring uncertainty for text pairs. We conduct extensive experiments with three large language models (LLMs) on several benchmark question-answering datasets. Results show that our proposed method performs better in estimating reliable responses of LLMs than comparable baselines.
arXiv Detail & Related papers (2024-06-05T11:35:44Z)
Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations. One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z)
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing [10.20632187568563]
Hallucination has emerged as the most vulnerable aspect of contemporary Large Language Models (LLMs) In this paper, we introduce the Sorry, Come Again (SCA) prompting, aimed to avoid LLM hallucinations. We provide an in-depth analysis of linguistic nuances: formality, readability, and concreteness of prompts for 21 LLMs. We propose an optimal paraphrasing technique to identify the most comprehensible paraphrase of a given prompt.
arXiv Detail & Related papers (2024-03-27T19:45:09Z)
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification [116.77055746066375]
Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. We propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification.
arXiv Detail & Related papers (2024-03-07T17:44:17Z)
Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
Alignedcot is an in-context learning technique for invoking Large Language Models. It achieves consistent and correct step-wise prompts in zero-shot scenarios. We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models [37.63939774027709]
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities. We propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment. Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses.
arXiv Detail & Related papers (2023-05-30T16:31:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.