ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
- URL: http://arxiv.org/abs/2410.14748v2
- Date: Tue, 22 Oct 2024 07:19:40 GMT
- Title: ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
- Authors: Kishan Maharaj, Vitobha Munigala, Srikanth G. Tamilselvam, Prince Kumar, Sayandeep Sen, Palani Kodeswaran, Abhijit Mishra, Pushpak Bhattacharyya
- Abstract summary: Large language models (LLMs) are prone to hallucination: outputs that stray from intended meanings.
We introduce a first-of-its-kind dataset with $\sim$10K samples, curated specifically for hallucination detection in code summarization.
- Score: 29.561699707926056
- Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarization. However, LLMs are prone to hallucination: outputs that stray from intended meanings. Detecting hallucinations in code summarization is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset with $\sim$10K samples, curated specifically for hallucination detection in code summarization. We further propose a novel Entity Tracing Framework (ETF) that a) utilizes static program analysis to identify code entities from the program and b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the effectiveness of the framework, leading to a 0.73 F1 score. This approach provides an interpretable method for detecting hallucinations by grounding entities, allowing us to evaluate summary accuracy.
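To make the two ETF steps concrete, here is a minimal sketch that pairs static entity extraction (step a) with a placeholder LLM verification hook (step b). It assumes Python source code and a hypothetical `llm_verify` callable supplied by the caller; it is an illustration of the idea, not the authors' implementation.

```python
# Sketch of the two ETF steps: (a) static analysis to collect code entities,
# (b) an LLM check that each entity mentioned in the summary is described
# consistently with the code. `llm_verify` is a hypothetical callable.
import ast

def extract_entities(source: str) -> dict[str, list[str]]:
    """Collect function, class, and assigned-variable names from Python source."""
    tree = ast.parse(source)
    entities: dict[str, list[str]] = {"functions": [], "classes": [], "variables": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            entities["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            entities["classes"].append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    entities["variables"].append(target.id)
    return entities

def flag_hallucinated_entities(source: str, summary: str, llm_verify) -> list[tuple[str, str]]:
    """Return (kind, name) pairs whose description in the summary cannot be grounded.

    `llm_verify(name, source, summary)` should return True when the summary's
    claim about `name` is supported by the code; any LLM backend can be plugged in.
    """
    flagged = []
    for kind, names in extract_entities(source).items():
        for name in names:
            if name in summary and not llm_verify(name, source, summary):
                flagged.append((kind, name))
    return flagged
```

Entities flagged this way correspond to summary claims that cannot be grounded in the code, which is the interpretability argument made in the abstract.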
Related papers
- Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models [58.952782707682815]
COFT is a novel method that highlights key texts at different levels of granularity, thereby avoiding getting lost in lengthy contexts.
Experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, with gains of over $30\%$ in the F1 score metric.
arXiv Detail & Related papers (2024-10-19T13:59:48Z) - MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation [50.73561815838431]
Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena.
We propose a novel dynamic correction decoding method for MLLMs (DeCo).
We evaluate DeCo on widely-used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines.
arXiv Detail & Related papers (2024-10-15T16:57:44Z) - Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code [20.736888384234273]
We introduce Collu-Bench, a benchmark for predicting code hallucinations of large language models (LLMs).
Collu-Bench includes 13,234 code hallucination instances collected from five datasets and 11 diverse LLMs, ranging from open-source models to commercial ones.
We conduct experiments to predict hallucinations on Collu-Bench using both traditional machine learning techniques and neural networks, achieving 22.03--33.15% accuracy.
arXiv Detail & Related papers (2024-10-13T20:41:47Z) - LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts.
LongHalQA features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z) - CodeMirage: Hallucinations in Code Generated by Large Language Models [6.063525456640463]
Large Language Models (LLMs) have shown promising potential in program generation and no-code automation.
However, LLMs are prone to generating hallucinations, i.e., text that sounds plausible but is incorrect.
We propose CodeMirage, the first benchmark dataset for code hallucinations.
arXiv Detail & Related papers (2024-08-14T22:53:07Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they can generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [73.66920648926161]
We introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification.
We present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations.
We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations.
arXiv Detail & Related papers (2024-04-30T23:56:38Z) - Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding [36.81476620057058]
Large Vision-Language Models (LVLMs) are susceptible to object hallucinations.
Current approaches often rely on the model's token likelihoods or other internal information.
We introduce our CLIP-Guided Decoding approach to reduce object hallucination at decoding time.
arXiv Detail & Related papers (2024-02-23T12:57:16Z) - Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations [3.9566468090516067]
Large language models (LLMs) can generate fluent natural language texts when given relevant documents as background context.
However, LLMs are prone to generating hallucinations that are not supported by the provided sources.
We propose a hierarchical framework to detect and mitigate such ungrounded hallucination.
arXiv Detail & Related papers (2023-10-06T00:10:46Z)
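To ground the two detection mechanisms above that are closest to this paper's setting, here are two hedged sketches. The first illustrates execution-based verification in the spirit of the CodeHalu entry: run a generated program against known input/output pairs and treat mismatches as potential hallucinations. The test harness, timeout, and pass-rate metric are assumptions made for illustration, not the paper's algorithm.

```python
# Minimal sketch of execution-based verification (in the spirit of CodeHalu).
# A generated program is run against reference input/output pairs; deviations
# are treated as signals of possible code hallucination.
import subprocess

def execution_pass_rate(program_path: str,
                        test_cases: list[tuple[str, str]],
                        timeout: float = 5.0) -> float:
    """Return the fraction of test cases whose stdout matches the expected output."""
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", program_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            # A hang counts as a failure rather than aborting the whole check.
            pass
    return passed / len(test_cases)
```

The second sketch is a flat, sentence-level grounding check with an off-the-shelf NLI model, loosely in the spirit of the chain-of-NLI entry above; the actual framework is hierarchical, and the checkpoint name and entailment threshold used here are assumptions.

```python
# Generic sentence-level grounding check with a public NLI checkpoint.
# A summary sentence is flagged unless the model judges it entailed by the source.
from transformers import pipeline

# Built once at module level; "roberta-large-mnli" is a standard public NLI model.
_nli = pipeline("text-classification", model="roberta-large-mnli")

def ungrounded_sentences(source: str,
                         summary_sentences: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Return the summary sentences that are not confidently entailed by the source."""
    flagged = []
    for sentence in summary_sentences:
        prediction = _nli({"text": source, "text_pair": sentence})[0]
        if not (prediction["label"] == "ENTAILMENT" and prediction["score"] >= threshold):
            flagged.append(sentence)
    return flagged
```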