De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding
- URL: http://arxiv.org/abs/2401.01701v3
- Date: Wed, 19 Jun 2024 07:54:15 GMT
- Title: De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding
- Authors: Aryaz Eghbali, Michael Pradel,
- Abstract summary: Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks.
LLMs are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs.
This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references.
- Score: 18.129031749321058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that predictions by LLMs often resemble the desired code, but they fail to correctly refer to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the model's initial predictions and adds these references into the prompt. Unlike retrieval-augmented generation (RAG), our approach uses the initial prediction(s) by the model to iteratively retrieve increasingly suitable API references. Our evaluation applies the approach to two tasks: predicting API usages in Python and generating tests in JavaScript. We show that De-Hallucinator consistently improves the generated code across five LLMs. In particular, the approach improves the edit distance by 23.3-50.6% and the recall of correctly predicted API usages by 23.9-61.0% for code completion, and improves the number of fixed tests that initially failed because of hallucinations by 63.2%, resulting in a 15.5% increase in statement coverage for test generation.
Related papers
- ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solution.
We show that ExploraCoder significantly improves performance for models lacking prior API knowledge, achieving an absolute increase of 11.24% over niave RAG approaches and 14.07% over pretraining methods in pass@10.
arXiv Detail & Related papers (2024-12-06T19:00:15Z) - MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation [50.73561815838431]
Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena.
We propose a novel dynamic correction decoding method for MLLMs (DeCo)
We evaluate DeCo on widely-used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines.
arXiv Detail & Related papers (2024-10-15T16:57:44Z) - LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion [13.633501449498402]
Decomposing API usage is a problem in large language models (LLMs)-based code completion.
This study involved seven advanced LLMs, 145 API mappings from eight popular Python libraries, and 28,125 completion prompts.
We propose two lightweight fixing approaches, REPLACEAPI and INSERTPROMPT, which can serve as baseline approaches for future research.
arXiv Detail & Related papers (2024-06-14T08:44:10Z) - CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [73.66920648926161]
We introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification.
We present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations.
We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations.
arXiv Detail & Related papers (2024-04-30T23:56:38Z) - (Why) Is My Prompt Getting Worse? Rethinking Regression Testing for
Evolving LLM APIs [8.403074015356594]
Large Language Models (LLMs) are increasingly integrated into software applications.
LLMs are often updated silently and scheduled to be deprecated.
This can cause performance regression and affect prompt design choices.
arXiv Detail & Related papers (2023-11-18T17:11:12Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - On the Effectiveness of Pretrained Models for API Learning [8.788509467038743]
Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc.
Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner.
Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences.
arXiv Detail & Related papers (2022-04-05T20:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.