Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
- URL: http://arxiv.org/abs/2505.17656v2
- Date: Thu, 29 May 2025 06:51:44 GMT
- Title: Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
- Authors: Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng
- Abstract summary: This work formally defines self-consistent errors and evaluates mainstream detection methods on them. All four types of detection methods significantly struggle to detect self-consistent errors. Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method.
- Score: 61.12688072239607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness. However, existing detection methods often overlook a critical problem we term self-consistent errors, where LLMs repeatedly generate the same incorrect response across multiple stochastic samples. This work formally defines self-consistent errors and evaluates mainstream detection methods on them. Our investigation reveals two key findings: (1) Unlike inconsistent errors, whose frequency diminishes significantly as LLM scale increases, the frequency of self-consistent errors remains stable or even increases. (2) All four types of detection methods significantly struggle to detect self-consistent errors. These findings reveal critical limitations in current detection methods and underscore the need for improved methods. Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method that fuses hidden state evidence from an external verifier LLM. Our method significantly enhances performance on self-consistent errors across three LLM families.
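The abstract describes the cross-model probe only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation: concatenate hidden states from the generator LLM and an external verifier LLM and train a linear probe to flag incorrect responses. The model names, layer choice, and logistic-regression probe are all illustrative assumptions.

```python
# Sketch of a cross-model probe for error detection (assumptions noted inline).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

GENERATOR = "meta-llama/Llama-3.1-8B"   # model whose errors we want to detect (assumed choice)
VERIFIER = "Qwen/Qwen2.5-7B"            # external verifier LLM (assumed choice)


def last_token_state(model_name, texts, layer=-1):
    """Return the hidden state of the final token at a chosen layer for each text."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
    feats = []
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").to(model.device)
            out = model(**ids, output_hidden_states=True)
            feats.append(out.hidden_states[layer][0, -1].float().cpu())
    return torch.stack(feats)


def train_cross_model_probe(qa_texts, labels):
    """Fuse generator and verifier hidden states, then fit a linear correctness probe."""
    h_gen = last_token_state(GENERATOR, qa_texts)
    h_ver = last_token_state(VERIFIER, qa_texts)
    fused = torch.cat([h_gen, h_ver], dim=-1).numpy()  # simple concatenation as the fusion step
    return LogisticRegression(max_iter=1000).fit(fused, labels)
```

The key idea the sketch captures is that a self-consistent error in the generator is often not shared by the verifier, so the verifier's hidden states supply a complementary signal the single-model probe lacks.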
Related papers
- Probabilistic Soundness Guarantees in LLM Reasoning Chains [39.228405100824695]
Autoregressive Reasoning Entailment Stability (ARES) is a novel probabilistic framework that prevents error propagation by judging each claim based only on previously-assessed sound premises. ARES achieves state-of-the-art performance across four benchmarks and demonstrates superior robustness on very long synthetic reasoning chains.
arXiv Detail & Related papers (2025-07-17T09:40:56Z) - Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs [0.0]
Self-correction is an important capability for large language models (LLMs). While LLMs can identify errors in user input, they exhibit a systematic 'Self-Correction Blind Spot'. Testing 14 models, we find an average 64.5% blind spot rate. Remarkably, simply appending "Wait" reduces blind spots by 89.3%, suggesting that the capability exists but requires activation.
arXiv Detail & Related papers (2025-07-03T16:41:30Z) - Seeing What's Not There: Spurious Correlation in Multimodal LLMs [47.651861502104715]
We introduce SpurLens, a pipeline that automatically identifies spurious visual cues without human supervision. Our findings reveal that spurious correlations cause two major failure modes in Multimodal Large Language Models (MLLMs). By exposing the persistence of spurious correlations, our study calls for more rigorous evaluation methods and mitigation strategies to enhance the reliability of MLLMs.
arXiv Detail & Related papers (2025-03-11T20:53:00Z) - The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It [23.803612556616685]
We present a mechanistic analysis of error detection in large language models (LLMs). Through circuit analysis, we identify the computational subgraphs responsible for detecting arithmetic errors across four smaller-sized LLMs. Our findings reveal that all models heavily rely on consistency heads: attention heads that assess surface-level alignment of numerical values in arithmetic solutions.
arXiv Detail & Related papers (2025-02-17T13:00:44Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z) - Anomaly Detection of Tabular Data Using LLMs [54.470648484612866]
We show that pre-trained large language models (LLMs) are zero-shot batch-level anomaly detectors.
We propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies.
arXiv Detail & Related papers (2024-06-24T04:17:03Z) - Evaluation and Improvement of Fault Detection for Large Language Models [30.760472387136954]
This paper investigates the effectiveness of existing fault detection methods for large language models (LLMs).
We propose MuCS, a prompt Mutation-based prediction Confidence Smoothing framework, to boost the fault detection capability of existing methods.
arXiv Detail & Related papers (2024-04-14T07:06:12Z) - Evaluating LLMs at Detecting Errors in LLM Responses [30.645694514606507]
This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs.
We use ReaLMistake to evaluate error detectors based on 12 Large Language Models.
arXiv Detail & Related papers (2024-04-04T17:19:47Z) - A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z) - SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [55.76083560152823]
SelfCheck is a general-purpose zero-shot verification schema for recognizing errors in step-by-step reasoning.
We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
arXiv Detail & Related papers (2023-08-01T10:31:36Z) - LM vs LM: Detecting Factual Errors via Cross Examination [22.50837561382647]
We propose a factuality evaluation framework for language models (LMs).
Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates.
We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks.
arXiv Detail & Related papers (2023-05-22T17:42:14Z)