PackMonitor: Enabling Zero Package Hallucinations Through Decoding-Time Monitoring
- URL: http://arxiv.org/abs/2602.20717v1
- Date: Tue, 24 Feb 2026 09:26:11 GMT
- Title: PackMonitor: Enabling Zero Package Hallucinations Through Decoding-Time Monitoring
- Authors: Xiting Liu, Yuetong Liu, Yitong Zhang, Jia Li, Shi-Min Hu
- Abstract summary: We argue that package hallucinations are theoretically preventable based on the key insight that package validity is decidable through finite and enumerable authoritative package lists. We propose PackMonitor, the first approach capable of fundamentally eliminating package hallucinations by continuously monitoring the model's decoding process and intervening when necessary.
- Score: 14.864903095382937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) are increasingly integrated into software development workflows, their trustworthiness has become a critical concern. However, in dependency recommendation scenarios, the reliability of LLMs is undermined by widespread package hallucinations, where models recommend packages that do not exist. Recent studies have proposed a range of approaches to mitigate this issue. Nevertheless, existing approaches typically only reduce hallucination rates rather than eliminate them, leaving persistent software security risks. In this work, we argue that package hallucinations are theoretically preventable, based on the key insight that package validity is decidable through finite and enumerable authoritative package lists. Building on this, we propose PackMonitor, the first approach capable of fundamentally eliminating package hallucinations by continuously monitoring the model's decoding process and intervening when necessary. To implement this in practice, PackMonitor addresses three key challenges: (1) determining when to trigger intervention, via a Context-Aware Parser that continuously monitors model outputs and activates intervention only during installation command generation; (2) resolving how to intervene, by employing a Package-Name Intervenor that strictly limits the decoding space to an authoritative package list; and (3) ensuring monitoring efficiency, through a DFA-Caching Mechanism that enables scalability to millions of packages with negligible overhead. Extensive experiments on five widely used LLMs demonstrate that PackMonitor is a training-free, plug-and-play solution that consistently reduces package hallucination rates to zero while maintaining low-latency inference and preserving original model capabilities.
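To make the decoding-time intervention concrete, below is a minimal, self-contained Python sketch of the general idea in the abstract: detect an install-command context, then constrain next-token logits to a prefix trie built over an authoritative package list. The `INSTALL_TRIGGER` pattern, the `PackageTrie` class, the toy token ids, and the `monitored_logits` helper are illustrative assumptions, not the paper's actual Context-Aware Parser, Package-Name Intervenor, or DFA-Caching Mechanism.

```python
import math
import re
from typing import Dict, List, Sequence, Set

# Hypothetical trigger: only intervene while completing an install command.
INSTALL_TRIGGER = re.compile(r"\bpip3?\s+install\s+\S*$")


class PackageTrie:
    """Prefix automaton over the token sequences of valid package names."""

    def __init__(self) -> None:
        self._edges: List[Dict[int, int]] = [{}]  # node -> {token_id: next node}
        self._terminal: Set[int] = set()          # nodes that end a full name

    def insert(self, token_ids: Sequence[int]) -> None:
        node = 0
        for tok in token_ids:
            nxt = self._edges[node].get(tok)
            if nxt is None:
                self._edges.append({})
                nxt = len(self._edges) - 1
                self._edges[node][tok] = nxt
            node = nxt
        self._terminal.add(node)

    def allowed_next(self, prefix: Sequence[int]) -> Set[int]:
        """Token ids that keep the partial package name on a valid path."""
        node = 0
        for tok in prefix:
            node = self._edges[node].get(tok)
            if node is None:
                return set()
        return set(self._edges[node].keys())


def constrain_logits(logits: List[float], allowed: Set[int]) -> List[float]:
    """Mask every token outside the authoritative set before sampling."""
    return [v if i in allowed else -math.inf for i, v in enumerate(logits)]


def monitored_logits(context: str, package_prefix: Sequence[int],
                     logits: List[float], trie: PackageTrie) -> List[float]:
    """Constrain decoding only when the context is an install command."""
    if INSTALL_TRIGGER.search(context):
        return constrain_logits(logits, trie.allowed_next(package_prefix))
    return logits  # outside install commands, decoding is left untouched


if __name__ == "__main__":
    trie = PackageTrie()
    trie.insert([7, 3])     # toy token ids standing in for "numpy"
    trie.insert([7, 5, 9])  # toy token ids standing in for "numba"
    raw = [0.1] * 10
    out = monitored_logits("pip install nu", [7], raw, trie)
    print(out)              # only token ids 3 and 5 stay finite
```

The abstract describes the analogous structure as a cached DFA so that membership checks stay cheap even with millions of packages; the trie above is just the simplest stand-in for such an automaton.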
Related papers
- Secure or Suspect? Investigating Package Hallucinations of Shell Command in Original and Quantized LLMs [7.21976012124109]
We conduct the first systematic empirical study of the impact of quantization on package hallucination and vulnerability risks in Go packages. Our results show that quantization substantially increases the package hallucination rate (PHR), with 4-bit models exhibiting the most severe degradation. Our analysis of hallucinated outputs reveals that most fabricated packages resemble realistic URL-based Go module paths.
arXiv Detail & Related papers (2025-12-09T03:47:31Z)
- DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z)
- Beyond Linear Probes: Dynamic Safety Monitoring for Language Models [67.15793594651609]
Traditional safety monitors require the same amount of compute for every query. We introduce Truncated Polynomial Classifiers (TPCs), a natural extension of linear probes for dynamic activation monitoring. Our key insight is that TPCs can be trained and evaluated progressively, term-by-term (a minimal early-exit sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-09-30T13:32:59Z)
- DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents [52.92354372596197]
Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. Their interaction with external sources also introduces the risk of prompt injection attacks, where malicious inputs can mislead the agent's behavior. We propose a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints.
arXiv Detail & Related papers (2025-06-13T05:01:09Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- Backdoor Cleaning without External Guidance in MLLM Fine-tuning [76.82121084745785]
Believe Your Eyes (BYE) is a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples. It achieves near-zero attack success rates while maintaining clean-task performance.
arXiv Detail & Related papers (2025-05-22T17:11:58Z)
- Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities [11.868859925111561]
Large Language Models (LLMs) have become an essential tool in the programmer's toolkit. Their tendency to hallucinate code can be used by malicious actors to introduce vulnerabilities to broad swathes of the software supply chain.
arXiv Detail & Related papers (2025-01-31T10:26:18Z)
- Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [56.66677293607114]
We propose Code-as-Monitor (CaM) for both open-set reactive and proactive failure detection. To enhance the accuracy and efficiency of monitoring, we introduce constraint elements that abstract constraint-related entities. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances.
arXiv Detail & Related papers (2024-12-05T18:58:27Z)
- We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs [3.515912713354746]
Package hallucinations arise from fact-conflicting errors when generating code using Large Language Models. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages. We show that the average percentage of hallucinated packages is at least 5.2% for commercial models and 21.7% for open-source models.
arXiv Detail & Related papers (2024-06-12T03:29:06Z)
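As referenced in the Beyond Linear Probes entry above, the "progressively, term-by-term" evaluation of TPCs can be illustrated with a small early-exit sketch. The weight layout (one weight vector per polynomial order), the elementwise powers of the activation, and the hi/lo decision thresholds below are assumptions made purely for illustration, not that paper's actual TPC construction.

```python
import numpy as np

def progressive_probe_score(x, term_weights, hi=4.0, lo=-4.0):
    """Accumulate one polynomial term at a time and exit early once the
    running score clears either decision threshold."""
    score = 0.0
    feat = np.ones_like(x)
    for k, w in enumerate(term_weights, start=1):
        feat = feat * x                   # elementwise x**k, built incrementally
        score += float(w @ feat)          # contribution of the k-th order term
        if score >= hi or score <= lo:    # already decisive: skip higher terms
            return score, k
    return score, len(term_weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=16)                             # stand-in activation vector
    weights = [rng.normal(size=16) for _ in range(4)]   # one weight vector per order
    print(progressive_probe_score(x, weights))          # (score, terms actually used)
```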