SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
- URL: http://arxiv.org/abs/2510.04398v1
- Date: Sun, 05 Oct 2025 23:44:54 GMT
- Title: SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
- Authors: Buyun Liang, Liangzu Peng, Jinqi Luo, Darshan Thaker, Kwan Ho Ryan Chan, René Vidal
- Abstract summary: Large Language Models (LLMs) are increasingly deployed in high-risk domains. LLMs often produce hallucinations, raising serious concerns about their reliability. We propose Semantically Equivalent and Coherent Attacks (SECA) to elicit hallucinations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in high-risk domains. However, state-of-the-art LLMs often produce hallucinations, raising serious concerns about their reliability. Prior work has explored adversarial attacks for hallucination elicitation in LLMs, but it often produces unrealistic prompts, either by inserting gibberish tokens or by altering the original meaning. As a result, these approaches offer limited insight into how hallucinations may occur in practice. While adversarial attacks in computer vision often involve realistic modifications to input images, the problem of finding realistic adversarial prompts for eliciting LLM hallucinations has remained largely underexplored. To address this gap, we propose Semantically Equivalent and Coherent Attacks (SECA) to elicit hallucinations via realistic modifications to the prompt that preserve its meaning while maintaining semantic coherence. Our contributions are threefold: (i) we formulate finding realistic attacks for hallucination elicitation as a constrained optimization problem over the input prompt space under semantic equivalence and coherence constraints; (ii) we introduce a constraint-preserving zeroth-order method to effectively search for adversarial yet feasible prompts; and (iii) we demonstrate through experiments on open-ended multiple-choice question answering tasks that SECA achieves higher attack success rates while incurring almost no constraint violations compared to existing methods. SECA highlights the sensitivity of both open-source and commercial gradient-inaccessible LLMs to realistic and plausible prompt variations. Code is available at https://github.com/Buyun-Liang/SECA.
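The abstract frames attack generation as a constrained optimization over the prompt space, solved with a constraint-preserving zeroth-order search: propose candidate rewrites, discard any that violate the semantic-equivalence or coherence constraints, and keep the feasible candidate that best elicits hallucination. A minimal sketch of that loop follows; every function body here is a hypothetical toy stand-in (the real method would query an LLM paraphraser, a judge model, and the target model), not the paper's implementation.

```python
import random

def paraphrase_candidates(prompt, k=4):
    """Toy stand-in for an LLM-based paraphraser: emit k rewordings."""
    templates = ["{}", "Please answer: {}", "Question: {}", "Briefly, {}"]
    return [t.format(prompt) for t in random.sample(templates, k)]

def is_semantically_equivalent(original, candidate):
    """Toy equivalence constraint; SECA would use a judge model here."""
    return original in candidate

def is_coherent(candidate):
    """Toy coherence constraint (e.g. a fluency/perplexity threshold)."""
    return len(candidate.split()) < 50

def hallucination_score(candidate):
    """Toy attack objective; SECA estimates this from the target
    model's responses only (zeroth-order: no gradients required)."""
    return len(candidate)

def seca_search(prompt, iterations=10, seed=0):
    """Constraint-preserving zeroth-order search (illustrative sketch)."""
    random.seed(seed)
    best, best_score = prompt, hallucination_score(prompt)
    for _ in range(iterations):
        for cand in paraphrase_candidates(prompt):
            # Constraint-preserving step: infeasible candidates are
            # rejected outright, so the search never leaves the
            # semantically equivalent and coherent region.
            if not (is_semantically_equivalent(prompt, cand)
                    and is_coherent(cand)):
                continue
            score = hallucination_score(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

The key design point this sketch illustrates is that feasibility is enforced by rejection inside the loop rather than by a penalty term, so every returned prompt satisfies both constraints by construction, which is why the paper reports almost no constraint violations.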
Related papers
- HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing [6.021803204524807]
Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities. LVLMs are prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information. We propose Hallucination Insensitivity Model Editing (HIME), a layer-adaptive weight editing approach that selectively modifies latent features to suppress hallucinations.
arXiv Detail & Related papers (2026-02-21T04:16:17Z) - Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors [8.089908150148554]
Multimodal large language models (MLLMs) have achieved remarkable success across diverse vision-language tasks. MLLMs are highly susceptible to hallucinations, producing content that is fluent but inconsistent with visual evidence. We propose a training-free, self-supervised method for hallucination mitigation.
arXiv Detail & Related papers (2025-09-26T07:24:28Z) - SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs [52.03164192840023]
Large Vision-Language Models (LVLMs) still suffer from hallucinations, i.e., generating content inconsistent with input or established world knowledge. We propose an automated data construction pipeline that produces scalable, controllable, and diverse evaluation data. We construct SHALE, a benchmark designed to assess both faithfulness and factuality hallucinations.
arXiv Detail & Related papers (2025-08-13T07:58:01Z) - MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them [52.764019220214344]
Hallucinations pose critical risks for large language model (LLM)-based agents. We present MIRAGE-Bench, the first unified benchmark for eliciting and evaluating hallucinations in interactive environments.
arXiv Detail & Related papers (2025-07-28T17:38:29Z) - Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation [38.43656456659151]
Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks. They still suffer from hallucinations, generating text inconsistent with visual input and posing significant risks in real-world applications. We propose Steering LVLMs via SAE Latent Directions (SSL), a plug-and-play method based on SAE-derived latent directions to mitigate hallucinations in LVLMs.
arXiv Detail & Related papers (2025-05-22T02:45:45Z) - Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models [3.9464481148889354]
We propose a novel decoding mechanism, Decoding with Inter-layer Consistency via Layer Aggregation (DCLA). Our approach constructs a dynamic semantic reference by aggregating representations from previous layers, and corrects semantically deviated layers to enforce inter-layer consistency. Experiments on hallucination benchmarks such as MME and POPE demonstrate that DCLA effectively reduces hallucinations while enhancing the reliability and performance of LVLMs.
arXiv Detail & Related papers (2025-05-18T10:15:42Z) - Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models [0.0]
Hallucinations in large language models (LLMs) present a growing challenge across real-world applications. We propose a prompt-based framework to systematically trigger and quantify hallucination.
arXiv Detail & Related papers (2025-05-01T14:33:47Z) - HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination." This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z) - Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning [151.4060202671114]
Multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing vision-language tasks. This paper introduces a novel bottom-up reasoning framework to address hallucinations in MLLMs. Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge.
arXiv Detail & Related papers (2024-12-15T09:10:46Z) - Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [124.90671698586249]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks. LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.