MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection
- URL: http://arxiv.org/abs/2510.04397v1
- Date: Sun, 05 Oct 2025 23:33:26 GMT
- Title: MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection
- Authors: Van Nguyen, Surya Nepal, Xingliang Yuan, Tingmin Wu, Fengchao Chen, Carsten Rudolph
- Abstract summary: MULVULN is a novel multilingual vulnerability detection approach that learns from source code across multiple languages. It achieves more robust and effective detection of vulnerabilities in real-world multilingual software systems. Notably, MULVULN achieves a substantially higher F1-score, with improvements ranging from 1.45% to 23.59% over the baseline methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software vulnerabilities (SVs) pose a critical threat to safety-critical systems, driving the adoption of AI-based approaches such as machine learning and deep learning for software vulnerability detection. Despite promising results, most existing methods are limited to a single programming language. This is problematic given the multilingual nature of modern software, which is often complex and written in multiple languages. Current approaches often struggle to capture both the shared and the language-specific knowledge of source code, which limits their performance on diverse programming languages and real-world codebases. To address this gap, we propose MULVULN, a novel multilingual vulnerability detection approach that learns from source code across multiple languages. MULVULN captures both the shared knowledge that generalizes across languages and the language-specific knowledge that reflects unique coding conventions. By integrating these aspects, it achieves more robust and effective detection of vulnerabilities in real-world multilingual software systems. Rigorous and extensive experiments on the real-world and diverse REEF dataset, consisting of 4,466 CVEs with 30,987 patches across seven programming languages, demonstrate the superiority of MULVULN over thirteen effective and state-of-the-art baselines. Notably, MULVULN achieves a substantially higher F1-score, with improvements ranging from 1.45% to 23.59% over the baseline methods.
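The shared-plus-language-specific idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual architecture: the class name, dimensions, random weights, and the tanh/logistic layers are all assumptions introduced for illustration. It only shows the general pattern of combining one projection shared across languages with a per-language adapter before a single classifier.

```python
import math
import random

random.seed(0)

def project(vec, rows):
    """Project a vector through a weight matrix (list of rows), tanh-activated."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in rows]

class SharedSpecificDetector:
    """Toy detector: one shared projection (cross-language knowledge) plus
    one adapter per language (language-specific conventions), concatenated
    and fed to a single linear classifier."""

    def __init__(self, dim, languages, hidden=8):
        rand = lambda r, c: [[random.gauss(0, 0.1) for _ in range(c)] for _ in range(r)]
        self.shared = rand(hidden, dim)                            # shared across all languages
        self.adapters = {l: rand(hidden, dim) for l in languages}  # one per language
        self.w = [random.gauss(0, 0.1) for _ in range(2 * hidden)] # classifier head

    def predict_proba(self, code_vec, lang):
        h = project(code_vec, self.shared) + project(code_vec, self.adapters[lang])
        logit = sum(wi * hi for wi, hi in zip(self.w, h))
        return 1.0 / (1.0 + math.exp(-logit))  # probability the function is vulnerable

det = SharedSpecificDetector(dim=6, languages=["c", "java", "python"])
vec = [random.gauss(0, 1) for _ in range(6)]  # stand-in for a code embedding
p = det.predict_proba(vec, "java")
```

Because the adapters differ while the shared projection is reused, the same code embedding scored under different languages yields different probabilities, which is the behavior the abstract attributes to combining shared and language-specific knowledge.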
Related papers
- Layer-Targeted Multilingual Knowledge Erasure in Large Language Models [15.409568435026015]
We identify intervention depth as the key factor determining multilingual generalization. We propose MUTE, a framework that uses Centered Kernel Alignment (CKA) and Linguistic Regions Development Score (LRDS) to identify intermediate, language-agnostic layers.
arXiv Detail & Related papers (2026-02-26T03:00:07Z) - Large Language Models for Multilingual Vulnerability Detection: How Far Are We? [13.269680075539135]
We evaluate the effectiveness of pre-trained language models (PLMs) and large language models (LLMs) for multilingual vulnerability detection. Using over 30,000 real-world vulnerability-fixing patches across seven programming languages, we assess model performance at both the function level and the line level. Our key findings indicate that GPT-4o, enhanced through instruction tuning and few-shot prompting, significantly outperforms all other evaluated models.
arXiv Detail & Related papers (2025-06-09T07:27:49Z) - Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation [48.07804537257056]
Multi-lingual RACG systems are valuable for migrating codebases across programming languages. We construct a dataset spanning 13 PLs with nearly 14k instances to explore the utility and robustness of multi-lingual RACG systems.
arXiv Detail & Related papers (2025-06-04T03:31:00Z) - MMATH: A Multilingual Benchmark for Mathematical Reasoning [94.05289799605957]
We introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math problems across 10 typologically diverse languages. We observe that even advanced models like DeepSeek R1 exhibit substantial performance disparities across languages and suffer from a critical off-target issue: generating responses in unintended languages. Our findings offer new insights and practical strategies for advancing the multilingual reasoning capabilities of large language models.
arXiv Detail & Related papers (2025-05-25T12:47:39Z) - A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection [13.269680075539135]
Recent advancements in large language models (LLMs) offer language-agnostic capabilities and enhanced semantic understanding. Our findings reveal that the PLM CodeT5P achieves the best performance in multilingual vulnerability detection.
arXiv Detail & Related papers (2025-05-12T09:19:31Z) - MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.77103365251923]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - MVD: A Multi-Lingual Software Vulnerability Detection Framework [1.0771072841012608]
We introduce MVD - an innovative multi-lingual vulnerability detection framework. This framework acquires the ability to detect vulnerabilities across multiple languages by concurrently learning from vulnerability data of various languages. Our framework significantly surpasses state-of-the-art methods in multi-lingual vulnerability detection by 83.7% to 193.6% in PR-AUC.
arXiv Detail & Related papers (2024-12-09T02:58:10Z) - Large Language Models for Secure Code Assessment: A Multi-Language Empirical Study [1.9116784879310031]
We show that GPT-4o achieves the highest vulnerability detection and CWE classification scores using a few-shot setting.
We develop a library called CODEGUARDIAN integrated with VSCode which enables developers to perform LLM-assisted real-time vulnerability analysis.
arXiv Detail & Related papers (2024-08-12T18:10:11Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs)
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z) - Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z) - Security Vulnerability Detection Using Deep Learning Natural Language Processing [1.4591078795663772]
We model software vulnerability detection as a natural language processing (NLP) problem with source code treated as texts.
For training and testing, we built a dataset of over 100,000 files in the C programming language with 123 types of vulnerabilities. Experiments achieve a best performance of over 93% accuracy in detecting security vulnerabilities.
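The code-as-text framing in this entry can be sketched minimally: treat source code like natural language, tokenize it, and count token frequencies as the usual first featurization step before any NLP classifier. The snippet, token pattern, and function names below are illustrative assumptions, not the paper's actual pipeline.

```python
import re
from collections import Counter

def tokenize(source):
    """Split source code into word-like identifiers and single punctuation
    symbols, the same way a simple NLP tokenizer would split text."""
    return re.findall(r"[A-Za-z_]\w*|\S", source)

def bag_of_tokens(source):
    """Bag-of-words features over code tokens."""
    return Counter(tokenize(source))

snippet = "strcpy(buf, input);"  # hypothetical C line of the kind often flagged as risky
features = bag_of_tokens(snippet)
```

A classifier trained on such counts (or on learned embeddings of the same token stream) then predicts whether a file is vulnerable, which is the NLP reduction this entry describes.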
arXiv Detail & Related papers (2021-05-06T01:28:21Z) - X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.