MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning
- URL: http://arxiv.org/abs/2507.10898v1
- Date: Tue, 15 Jul 2025 01:25:04 GMT
- Title: MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning
- Authors: Jugal Gajjar, Kamalasankari Subramaniakuppusamy, Noha El Kachach
- Abstract summary: MalCodeAI is a language-agnostic pipeline for autonomous code security analysis and remediation. It combines code decomposition and semantic reasoning using fine-tuned Qwen2.5-Coder-3B-Instruct models. MalCodeAI supports red-hat-style exploit tracing, CVSS-based risk scoring, and zero-shot generalization to detect complex, zero-day vulnerabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The growing complexity of cyber threats and the limitations of traditional vulnerability detection tools necessitate novel approaches for securing software systems. We introduce MalCodeAI, a language-agnostic, multi-stage AI pipeline for autonomous code security analysis and remediation. MalCodeAI combines code decomposition and semantic reasoning using fine-tuned Qwen2.5-Coder-3B-Instruct models, optimized through Low-Rank Adaptation (LoRA) within the MLX framework, and delivers scalable, accurate results across 14 programming languages. In Phase 1, the model achieved a validation loss as low as 0.397 for functional decomposition and summarization of code segments after 200 iterations with 6 trainable layers and a learning rate of 2 x 10^(-5). In Phase 2, for vulnerability detection and remediation, it achieved a best validation loss of 0.199 using the same number of iterations and trainable layers but with an increased learning rate of 4 x 10^(-5), effectively identifying security flaws and suggesting actionable fixes. MalCodeAI supports red-hat-style exploit tracing, CVSS-based risk scoring, and zero-shot generalization to detect complex, zero-day vulnerabilities. In a qualitative evaluation involving 15 developers, the system received high scores in usefulness (mean 8.06/10), interpretability (mean 7.40/10), and readability of outputs (mean 7.53/10), confirming its practical value in real-world development workflows. This work marks a significant advancement toward intelligent, explainable, and developer-centric software security solutions.
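As context for the setup described above, Low-Rank Adaptation freezes the pretrained weights and trains only a small low-rank correction per adapted layer. The following sketch is a minimal, framework-agnostic illustration in PyTorch rather than the paper's MLX implementation; the layer size, rank, and scaling factor are assumptions, and only the learning rate echoes the reported Phase 1 setting.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Illustrative: adapt one 2048-wide projection and train it with the Phase 1 learning rate.
layer = LoRALinear(nn.Linear(2048, 2048))
optimizer = torch.optim.AdamW([layer.A, layer.B], lr=2e-5)
```

Because only the small A and B matrices receive gradients, restricting adaptation to a handful of layers (the paper reports 6 trainable layers) keeps the trainable parameter count and memory footprint low.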
Related papers
- Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection [0.0]
Three industry-standard rule-based static code-analysis tools (Sonar, CodeQL and Snyk Code) and three state-of-the-art large language models hosted on the GitHub Models platform (GPT-4.1, Mistral Large and DeepSeek V3) were evaluated.
Using a curated suite of ten real-world C# projects that embed 63 vulnerabilities, we measure classical accuracy (precision, recall, F-score), analysis latency, granularity and the developer effort required to vet true positives.
We recommend a hybrid pipeline: employ language models early in development for broad, context-aware detection and ...
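For reference, the classical accuracy figures reported by such benchmarks reduce to simple ratios over true positives, false positives, and false negatives. The sketch below is a generic helper with made-up counts, not the benchmark's evaluation code.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 for a vulnerability-detection run."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for one tool run against the 63 embedded vulnerabilities.
print(detection_metrics(tp=40, fp=25, fn=23))
```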
arXiv Detail & Related papers (2025-08-06T13:48:38Z)
- LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models [2.891351178680099]
This paper presents a novel framework integrating Code Property Graphs (CPG) with Large Language Models (LLM) for robust vulnerability detection.
Our approach's ability to provide a more concise and accurate representation of code snippets enables the analysis of larger code segments.
Empirical evaluation demonstrates LLMxCPG's effectiveness across verified datasets, achieving 15-40% improvements in F1-score over state-of-the-art baselines.
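The general idea behind CPG-guided analysis, sketched below with a toy graph rather than the paper's actual pipeline, is to slice only the statements that flow into a sensitive sink and hand that concise slice to the LLM. The graph encoding, slicing routine, and prompt wording here are all assumptions for illustration.

```python
# Toy code property graph: nodes are statements, edges are data-flow relations.
cpg = {
    "nodes": {
        1: 'user = request.args["name"]',
        2: 'query = "SELECT * FROM users WHERE name = \'" + user + "\'"',
        3: "cursor.execute(query)",
    },
    "edges": [(1, 2), (2, 3)],  # data flows from the request parameter into execute()
}

def slice_to_sink(graph: dict, sink: int) -> list[str]:
    """Collect every statement with a data-flow path into the sink (a backward slice)."""
    keep, frontier = {sink}, [sink]
    while frontier:
        node = frontier.pop()
        for src, dst in graph["edges"]:
            if dst == node and src not in keep:
                keep.add(src)
                frontier.append(src)
    return [graph["nodes"][n] for n in sorted(keep)]

# Only the concise, relevant slice (not the whole file) is sent to the model.
prompt = "Is the following code slice vulnerable?\n" + "\n".join(slice_to_sink(cpg, sink=3))
print(prompt)
```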
arXiv Detail & Related papers (2025-07-22T13:36:33Z)
- A Mixture of Linear Corrections Generates Secure Code [20.94236753015922]
Large language models (LLMs) have become proficient at sophisticated code-generation tasks, yet remain ineffective at reliably detecting or avoiding code vulnerabilities.
We find that current LLMs encode precise internal representations that distinguish vulnerable from secure code.
We develop an inference-time steering technique that subtly modulates the model's token-generation probabilities through a mixture of corrections.
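A rough sketch of what inference-time steering of this kind can look like is given below: a hidden state is nudged by a gated mixture of correction directions. The correction vectors and gating here are random placeholders, not the learned corrections from the paper.

```python
import torch

def apply_corrections(hidden: torch.Tensor,
                      corrections: torch.Tensor,
                      gate_logits: torch.Tensor,
                      strength: float = 0.1) -> torch.Tensor:
    """Shift a hidden state by a convex mixture of correction directions."""
    weights = torch.softmax(gate_logits, dim=-1)  # how much each correction contributes
    mixed = weights @ corrections                 # (d,) combined correction direction
    return hidden + strength * mixed              # small, targeted nudge toward "secure" behavior

d, k = 768, 4                                     # toy hidden size and number of corrections
hidden = torch.randn(d)
corrections = torch.randn(k, d)                   # placeholders for learned "secure-code" directions
steered = apply_corrections(hidden, corrections, gate_logits=torch.randn(k))
```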
arXiv Detail & Related papers (2025-07-13T06:27:33Z)
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.
Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.
We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.
We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
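One plausible way to turn program-analysis feedback into a reinforcement signal, not necessarily the paper's REAL formulation, is a weighted mix of test pass rate and analyzer cleanliness; the weights and the analyzer interface in the sketch below are assumptions.

```python
def quality_reward(tests_passed: int, tests_total: int,
                   analysis_findings: int, max_findings: int = 10,
                   w_functional: float = 0.7, w_quality: float = 0.3) -> float:
    """Scalar reward mixing functional correctness with static-analysis cleanliness."""
    functional = tests_passed / tests_total if tests_total else 0.0
    cleanliness = 1.0 - min(analysis_findings, max_findings) / max_findings
    return w_functional * functional + w_quality * cleanliness

# Hypothetical rollout: every unit test passes but the analyzer still flags two issues.
print(quality_reward(tests_passed=12, tests_total=12, analysis_findings=2))  # 0.94
```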
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.
Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.
Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
- Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets [8.977790462534152]
We propose DePA, a novel line-level detection and cleansing method tailored to the structural properties of code.
DePA significantly outperforms existing methods, achieving a 0.14-0.19 improvement in detection F1-score and a 44-65% increase in poisoned segment localization precision.
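As a rough illustration of line-level dead-code detection, the sketch below flags only the simplest case: statements that can never execute because they follow a return or raise in the same block. DePA's actual scoring is more sophisticated, and the payload name in the sample is invented.

```python
import ast

def unreachable_lines(source: str) -> list[int]:
    """Flag statements that appear after a return/raise in the same block (trivially dead code)."""
    dead = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        seen_exit = False
        for stmt in body:
            if seen_exit:
                dead.append(stmt.lineno)
            if isinstance(stmt, (ast.Return, ast.Raise)):
                seen_exit = True
    return sorted(dead)

# Hypothetical poisoned sample: the payload line sits after the return and never runs.
sample = "def f(x):\n    return x\n    y = trigger_payload()\n"
print(unreachable_lines(sample))  # [3]
```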
arXiv Detail & Related papers (2025-02-27T16:30:00Z)
- StagedVulBERT: Multi-Granular Vulnerability Detection with a Novel Pre-trained Code Model [13.67394549308693]
This study introduces StagedVulBERT, a novel vulnerability detection framework.
Its CodeBERT-HLS component is designed to capture semantics at both the token and statement levels simultaneously.
In coarse-grained vulnerability detection, StagedVulBERT achieves an F1 score of 92.26%, marking a 6.58% improvement over the best-performing methods.
arXiv Detail & Related papers (2024-10-08T07:46:35Z)
- Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capabilities in distinguishing various types of shapes, surfaces, and sizes of cracks.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
- M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization.
Code models such as CodeBERT are easy to fine-tune, but they often struggle to learn vulnerability semantics from complex code.
This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z)
- DeVAIC: A Tool for Security Assessment of AI-generated Code [5.383910843560784]
DeVAIC (Detection of Vulnerabilities in AI-generated Code) is a tool to evaluate the security of AI-generated Python code.
arXiv Detail & Related papers (2024-04-11T08:27:23Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
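A minimal sketch of the ensembling step is shown below: made-up per-statement scores stand in for the outputs of the graph-based and sequence-based models, and simple averaging stands in for the fusion VELVET actually learns.

```python
import numpy as np

def ensemble_top_statement(graph_scores: np.ndarray, seq_scores: np.ndarray) -> int:
    """Average per-statement scores from two detectors and return the most suspicious index."""
    combined = (graph_scores + seq_scores) / 2.0
    return int(np.argmax(combined))

# Hypothetical per-statement vulnerability scores for a five-statement function.
graph_scores = np.array([0.1, 0.6, 0.2, 0.9, 0.3])  # graph-based model (global context)
seq_scores = np.array([0.2, 0.5, 0.1, 0.8, 0.2])    # sequence-based model (local context)
print(ensemble_top_statement(graph_scores, seq_scores))  # statement index 3 is ranked most suspicious
```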
arXiv Detail & Related papers (2021-12-20T22:45:27Z)