ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
- URL: http://arxiv.org/abs/2410.14748v2
- Date: Tue, 22 Oct 2024 07:19:40 GMT
- Title: ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
- Authors: Kishan Maharaj, Vitobha Munigala, Srikanth G. Tamilselvam, Prince Kumar, Sayandeep Sen, Palani Kodeswaran, Abhijit Mishra, Pushpak Bhattacharyya
- Abstract summary: Large language models (LLMs) are prone to hallucination: outputs that stray from intended meanings.
We introduce a first-of-its-kind dataset with $\sim$10K samples, curated specifically for hallucination detection in code summarization.
- Score: 29.561699707926056
- Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarization. However, LLMs are prone to hallucination: outputs that stray from intended meanings. Detecting hallucinations in code summarization is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset with $\sim$10K samples, curated specifically for hallucination detection in code summarization. We further propose a novel Entity Tracing Framework (ETF) that a) utilizes static program analysis to identify code entities from the program and b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the effectiveness of the framework, leading to a 0.73 F1 score. This approach provides an interpretable method for detecting hallucinations by grounding entities, allowing us to evaluate summary accuracy.
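To make the two ETF steps concrete, here is a minimal sketch that pairs static entity extraction (step a) with a placeholder LLM verification hook (step b). It assumes Python source code and a hypothetical `llm_verify` callable supplied by the caller; it is an illustration of the idea, not the authors' implementation.

```python
# Sketch of the two ETF steps: (a) static analysis to collect code entities,
# (b) an LLM check that each entity mentioned in the summary is described
# consistently with the code. `llm_verify` is a hypothetical callable.
import ast

def extract_entities(source: str) -> dict[str, list[str]]:
    """Collect function, class, and assigned-variable names from Python source."""
    tree = ast.parse(source)
    entities: dict[str, list[str]] = {"functions": [], "classes": [], "variables": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            entities["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            entities["classes"].append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    entities["variables"].append(target.id)
    return entities

def flag_hallucinated_entities(source: str, summary: str, llm_verify) -> list[tuple[str, str]]:
    """Return (kind, name) pairs whose description in the summary cannot be grounded.

    `llm_verify(name, source, summary)` should return True when the summary's
    claim about `name` is supported by the code; any LLM backend can be plugged in.
    """
    flagged = []
    for kind, names in extract_entities(source).items():
        for name in names:
            if name in summary and not llm_verify(name, source, summary):
                flagged.append((kind, name))
    return flagged
```

Entities flagged this way correspond to summary claims that cannot be grounded in the code, which is the interpretability argument made in the abstract.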
Related papers
- Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models [58.952782707682815]
COFT is a novel method that highlights key texts at different levels of granularity, thereby avoiding getting lost in lengthy contexts.
Experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, with gains of over $30\%$ in the F1 score metric.
arXiv Detail & Related papers (2024-10-19T13:59:48Z) - MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation [50.73561815838431]
Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena.
We propose a novel dynamic correction decoding method for MLLMs (DeCo).
We evaluate DeCo on widely-used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines.
arXiv Detail & Related papers (2024-10-15T16:57:44Z) - Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code [20.736888384234273]
We introduce Collu-Bench, a benchmark for predicting code hallucinations of large language models (LLMs).
Collu-Bench includes 13,234 code hallucination instances collected from five datasets and 11 diverse LLMs, ranging from open-source models to commercial ones.
We conduct experiments to predict hallucinations on Collu-Bench using both traditional machine learning techniques and neural networks, achieving 22.03--33.15% accuracy.
arXiv Detail & Related papers (2024-10-13T20:41:47Z) - LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts.
LongHalQA features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z) - CodeMirage: Hallucinations in Code Generated by Large Language Models [6.063525456640463]
Large Language Models (LLMs) have shown promising potential in program generation and no-code automation.
However, LLMs are prone to generating hallucinations, i.e., text that sounds plausible but is incorrect.
We propose CodeMirage, the first benchmark dataset for code hallucinations.
arXiv Detail & Related papers (2024-08-14T22:53:07Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they can generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [73.66920648926161]
We introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification.
We present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations.
We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations.
arXiv Detail & Related papers (2024-04-30T23:56:38Z) - Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding [36.81476620057058]
Large Vision-Language Models (LVLMs) are susceptible to object hallucinations.
Current approaches often rely on the model's token likelihoods or other internal information.
We introduce our CLIP-Guided Decoding approach to reduce object hallucination at decoding time.
arXiv Detail & Related papers (2024-02-23T12:57:16Z) - Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations [3.9566468090516067]
Large language models (LLMs) can generate fluent natural language texts when given relevant documents as background context.
However, LLMs are prone to generating hallucinations that are not supported by the provided sources.
We propose a hierarchical framework to detect and mitigate such ungrounded hallucination.
arXiv Detail & Related papers (2023-10-06T00:10:46Z)
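To ground the two detection mechanisms above that are closest to this paper's setting, here are two hedged sketches. The first illustrates execution-based verification in the spirit of the CodeHalu entry: run a generated program against known input/output pairs and treat mismatches as potential hallucinations. The test harness, timeout, and pass-rate metric are assumptions made for illustration, not the paper's algorithm.

```python
# Minimal sketch of execution-based verification (in the spirit of CodeHalu).
# A generated program is run against reference input/output pairs; deviations
# are treated as signals of possible code hallucination.
import subprocess

def execution_pass_rate(program_path: str,
                        test_cases: list[tuple[str, str]],
                        timeout: float = 5.0) -> float:
    """Return the fraction of test cases whose stdout matches the expected output."""
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", program_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            # A hang counts as a failure rather than aborting the whole check.
            pass
    return passed / len(test_cases)
```

The second sketch is a flat, sentence-level grounding check with an off-the-shelf NLI model, loosely in the spirit of the chain-of-NLI entry above; the actual framework is hierarchical, and the checkpoint name and entailment threshold used here are assumptions.

```python
# Generic sentence-level grounding check with a public NLI checkpoint.
# A summary sentence is flagged unless the model judges it entailed by the source.
from transformers import pipeline

# Built once at module level; "roberta-large-mnli" is a standard public NLI model.
_nli = pipeline("text-classification", model="roberta-large-mnli")

def ungrounded_sentences(source: str,
                         summary_sentences: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Return the summary sentences that are not confidently entailed by the source."""
    flagged = []
    for sentence in summary_sentences:
        prediction = _nli({"text": source, "text_pair": sentence})[0]
        if not (prediction["label"] == "ENTAILMENT" and prediction["score"] >= threshold):
            flagged.append(sentence)
    return flagged
```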