Transformer-Based Language Models for Software Vulnerability Detection:
Performance, Model's Security and Platforms
- URL: http://arxiv.org/abs/2204.03214v1
- Date: Thu, 7 Apr 2022 04:57:42 GMT
- Title: Transformer-Based Language Models for Software Vulnerability Detection:
Performance, Model's Security and Platforms
- Authors: Chandra Thapa and Seung Ick Jang and Muhammad Ejaz Ahmed and Seyit
Camtepe and Josef Pieprzyk and Surya Nepal
- Abstract summary: We study how well large transformer-based language models detect software vulnerabilities.
We check the models' security using Microsoft's Counterfit, a command-line tool.
We present our recommendations for choosing platforms on which to run these large models.
- Score: 21.943263073426646
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large transformer-based language models demonstrate excellent performance
in natural language processing. Given the closeness of natural languages to
high-level programming languages such as C/C++, this work studies how well
large transformer-based language models detect software vulnerabilities. Our
results show that these models perform well on software vulnerability
detection. This finding enables extending transformer-based language models to
vulnerability detection and leveraging their superior performance beyond the
natural language processing domain. In addition, we check the models' security
using Microsoft's Counterfit, a command-line tool for assessing model security,
and find that these models are vulnerable to adversarial examples. In this
regard, we present a simple countermeasure and its results. Experimenting with
large models is always challenging due to the required computing resources and
the platforms, libraries, and dependencies involved. Based on the experiences
and difficulties we faced during this work, we present our recommendations for
choosing platforms on which to run these large models. Moreover, the popular
platforms are surveyed thoroughly in this paper.
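A minimal sketch of the kind of pipeline the abstract describes: a pretrained transformer code model scores a C/C++ snippet as vulnerable or not, and a semantics-preserving identifier rename illustrates the sort of adversarial probing that a tool like Counterfit automates. The choice of CodeBERT, the binary label convention, and the rename perturbation are assumptions for illustration, not the paper's exact setup, and the Counterfit CLI itself is not reproduced here.

# Hedged sketch: transformer-based vulnerability scoring plus a naive
# robustness probe. Model choice, labels, and the perturbation are
# illustrative assumptions, not the paper's exact configuration.
import re
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # assumed stand-in checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The classification head is freshly initialized here; in the paper's setting
# it would be fine-tuned on a labeled vulnerability dataset first.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def vulnerability_score(code: str) -> float:
    """Return P(vulnerable) for a code snippet under the classifier."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # label 1 = "vulnerable" by convention here

def rename_identifier(code: str, old: str, new: str) -> str:
    """Semantics-preserving identifier rename, a simple adversarial-style edit."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = """
void copy_input(char *src) {
    char buf[16];
    strcpy(buf, src);   /* classic unbounded copy */
}
"""

original = vulnerability_score(snippet)
perturbed = vulnerability_score(rename_identifier(snippet, "buf", "tmp0"))
print(f"score(original)  = {original:.3f}")
print(f"score(perturbed) = {perturbed:.3f}")
# A large gap between the two scores across many samples would indicate the
# kind of adversarial fragility the paper reports; retraining on such perturbed
# samples (adversarial training) is one simple countermeasure.

The sketch only shows inference and probing; fine-tuning on a vulnerability dataset and the full Counterfit-driven attack suite are the parts the paper itself evaluates.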
Related papers
- Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks [4.459306403129608]
Large Language Models (LLMs) are increasingly becoming the preferred foundation platforms for many Natural Language Processing tasks.
Their generality opens them up to subversion by end users who may embed instructions into their requests that cause the model to behave in unauthorized and possibly unsafe ways.
We study these Prompt Injection Attacks (PIAs) on multiple families of LLMs on a Machine Translation task, focusing on the effects of model size on the attack success rates.
arXiv Detail & Related papers (2024-03-14T19:39:10Z)
- Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles [2.134057414078079]
As Large Language Models (LLMs) gain widespread use, ensuring their security and robustness is critical.
This paper presents a novel study focusing on the exploitation of such large language models through deceptive interactions.
Our results demonstrate a significant finding: these large language models are susceptible to deception and social engineering attacks.
arXiv Detail & Related papers (2023-11-24T23:57:44Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling [41.733860809136196]
We propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling.
The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme.
We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.
arXiv Detail & Related papers (2022-09-30T14:15:03Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.