Transformer-Based Language Models for Software Vulnerability Detection:
Performance, Model's Security and Platforms
- URL: http://arxiv.org/abs/2204.03214v1
- Date: Thu, 7 Apr 2022 04:57:42 GMT
- Title: Transformer-Based Language Models for Software Vulnerability Detection:
Performance, Model's Security and Platforms
- Authors: Chandra Thapa and Seung Ick Jang and Muhammad Ejaz Ahmed and Seyit
Camtepe and Josef Pieprzyk and Surya Nepal
- Abstract summary: We study how well large transformer-based language models detect software vulnerabilities.
We check the models' security using Microsoft's Counterfit, a command-line tool.
We present our recommendations for choosing platforms on which to run these large models.
- Score: 21.943263073426646
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large transformer-based language models demonstrate excellent performance
in natural language processing. Given the closeness of natural languages to
high-level programming languages such as C/C++, this work studies how well
large transformer-based language models detect software vulnerabilities. Our
results show that these models perform well on software vulnerability
detection. This finding enables extending transformer-based language models to
vulnerability detection and leveraging their superior performance beyond the
natural language processing domain. In addition, we check the models' security
using Microsoft's Counterfit, a command-line tool for assessing model security,
and find that these models are vulnerable to adversarial examples. In this
regard, we present a simple countermeasure and its results. Experimenting with
large models is always challenging due to the required computing resources and
the platforms, libraries, and dependencies involved. Based on the experiences
and difficulties we faced during this work, we present our recommendations for
choosing platforms on which to run these large models. Moreover, the popular
platforms are surveyed thoroughly in this paper.
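A minimal sketch of the kind of pipeline the abstract describes: a pretrained transformer code model scores a C/C++ snippet as vulnerable or not, and a semantics-preserving identifier rename illustrates the sort of adversarial probing that a tool like Counterfit automates. The choice of CodeBERT, the binary label convention, and the rename perturbation are assumptions for illustration, not the paper's exact setup, and the Counterfit CLI itself is not reproduced here.

# Hedged sketch: transformer-based vulnerability scoring plus a naive
# robustness probe. Model choice, labels, and the perturbation are
# illustrative assumptions, not the paper's exact configuration.
import re
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # assumed stand-in checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The classification head is freshly initialized here; in the paper's setting
# it would be fine-tuned on a labeled vulnerability dataset first.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def vulnerability_score(code: str) -> float:
    """Return P(vulnerable) for a code snippet under the classifier."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # label 1 = "vulnerable" by convention here

def rename_identifier(code: str, old: str, new: str) -> str:
    """Semantics-preserving identifier rename, a simple adversarial-style edit."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = """
void copy_input(char *src) {
    char buf[16];
    strcpy(buf, src);   /* classic unbounded copy */
}
"""

original = vulnerability_score(snippet)
perturbed = vulnerability_score(rename_identifier(snippet, "buf", "tmp0"))
print(f"score(original)  = {original:.3f}")
print(f"score(perturbed) = {perturbed:.3f}")
# A large gap between the two scores across many samples would indicate the
# kind of adversarial fragility the paper reports; retraining on such perturbed
# samples (adversarial training) is one simple countermeasure.

The sketch only shows inference and probing; fine-tuning on a vulnerability dataset and the full Counterfit-driven attack suite are the parts the paper itself evaluates.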
Related papers
- Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks [4.459306403129608]
Large Language Models (LLMs) are increasingly becoming the preferred foundation platforms for many Natural Language Processing tasks.
Their generality opens them up to subversion by end users who may embed instructions into their requests that cause the model to behave in unauthorized and possibly unsafe ways.
We study these Prompt Injection Attacks (PIAs) on multiple families of LLMs on a Machine Translation task, focusing on the effects of model size on the attack success rates.
arXiv Detail & Related papers (2024-03-14T19:39:10Z)
- Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles [2.134057414078079]
As Large Language Models (LLMs) gain widespread use, ensuring their security and robustness is critical.
This paper presents a novel study focusing on the exploitation of such large language models through deceptive interactions.
Our results demonstrate a significant finding: these large language models are susceptible to deception and social engineering attacks.
arXiv Detail & Related papers (2023-11-24T23:57:44Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling [41.733860809136196]
We propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling.
The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme.
We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.
arXiv Detail & Related papers (2022-09-30T14:15:03Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.